LoFAT — Last-Mile Fraud Detection Platform

GPS telemetry fraud detection for last-mile delivery — automated flagging of spoofing, ghost deliveries, and coordinated fraud
Tech stack: Python, AWS Lambda, Kinesis, SageMaker, DynamoDB, React Leaflet, SNS, CloudWatch, S3

Problem Statement

Amazon's last-mile delivery fleet operates on an hourly pay model. A subset of drivers exploit this structure by remaining "on shift" while systematically avoiding or fabricating deliveries. Prior to LoFAT, fraud detection relied on manual audits of delivery logs, tip-line reports, and periodic GPS spot-checks — a reactive process that caught less than 15% of fraudulent activity and required a dedicated 37-person investigation team.

LoFAT replaced this manual process with an automated pipeline that ingests GPS telemetry in real time, applies ML anomaly detection (an Isolation Forest followed by a Gradient Boosted Classifier), and surfaces flagged drivers to a streamlined investigation workflow. This reduced the dedicated investigation team from 37 to 0 headcount and extended monitoring to 100% of the active fleet, where manual audits had caught less than 15% of fraudulent activity.

Fraud Patterns (5 Types)

Roster Avoidance (Order Dodging)
Severity: HIGH

Driver is clocked in with valid GPS movement but systematically positions outside pickup zones to avoid order assignment.

Signal: active_hours > 6 AND orders_completed <= 1
Signal: Repeated movement AWAY from pickup clusters
Signal: 8+ assignment attempts with no pickup
Signal: Pattern repeats across consecutive shifts
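
The two numeric signals for this pattern are plain threshold rules. A minimal sketch, assuming a hypothetical per-shift rollup record (`ShiftSummary`, its field names, and the returned signal labels are illustrative and not taken from the production system):

```python
from dataclasses import dataclass


@dataclass
class ShiftSummary:
    """Hypothetical per-shift rollup; field names are assumptions."""
    active_hours: float
    orders_completed: int
    assignment_attempts_without_pickup: int


def roster_avoidance_signals(shift: ShiftSummary) -> list:
    """Return labels for the threshold signals this shift trips."""
    signals = []
    # Signal: active_hours > 6 AND orders_completed <= 1
    if shift.active_hours > 6 and shift.orders_completed <= 1:
        signals.append("low_output_long_shift")
    # Signal: 8+ assignment attempts with no pickup
    if shift.assignment_attempts_without_pickup >= 8:
        signals.append("repeated_missed_assignments")
    return signals
```

The movement-direction and cross-shift signals need trajectory and history data, so they are not covered by this sketch.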

GPS Spoofing
Severity: CRITICAL

Device transmits a fixed fake coordinate with micro-jitter while driver is physically stationary.

Signal: GPS variance < 50m over 4 hours
Signal: Speed spikes 0 -> 45 -> 0 mph in 30s
Signal: IP geolocation mismatches GPS coordinates
Signal: Device accelerometer shows zero movement
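
One way to evaluate the "GPS variance < 50m" signal is to convert the spread of the pings into meters. The sketch below is an assumption about how that could be done: it treats "variance" as the standard deviation of position, uses a flat-earth approximation (about 111,320 m per degree of latitude), and the function names are hypothetical. The caller is expected to pass only the pings from the 4-hour period being checked.

```python
import math
from statistics import pstdev

METERS_PER_DEG_LAT = 111_320.0  # approximation; adequate for a sub-km spread check


def position_spread_m(points):
    """Approximate std dev of position in meters for a list of (lat, lng)."""
    lats = [p[0] for p in points]
    lngs = [p[1] for p in points]
    mean_lat = sum(lats) / len(lats)
    # A degree of longitude shrinks with cos(latitude).
    m_per_deg_lng = METERS_PER_DEG_LAT * math.cos(math.radians(mean_lat))
    sd_north = pstdev(lats) * METERS_PER_DEG_LAT
    sd_east = pstdev(lngs) * m_per_deg_lng
    return math.hypot(sd_north, sd_east)


def looks_stationary(points, threshold_m=50.0):
    """True when the pings barely move; False if there is too little data."""
    return len(points) >= 2 and position_spread_m(points) < threshold_m
```

A low spread alone also matches a driver legitimately waiting at a hub, which is why the source pairs it with the accelerometer and IP signals.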

Ghost Delivery (Fake Completion)
Severity: CRITICAL

Driver marks delivery "completed" without traveling to customer address. GPS shows driver never came within 500m.

Signal: Nearest GPS point to address > 500m at completion
Signal: time_at_delivery_address = 0 seconds
Signal: No door photo or photo metadata location mismatch
Signal: Customer complaint rate > 40%
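
The 500m proximity signal reduces to a great-circle distance check. A minimal sketch using the haversine formula, where the function names and the (lat, lng) tuple layout are assumptions and `pings` must be non-empty:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_ghost_candidate(pings, address, max_distance_m=500.0):
    """True if no (lat, lng) ping came within max_distance_m of the address."""
    nearest = min(haversine_m(lat, lng, address[0], address[1])
                  for lat, lng in pings)
    return nearest > max_distance_m
```

This only covers the distance signal; the photo-metadata and complaint-rate signals would come from other data sources.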

Phantom Route (Teleportation)
Severity: HIGH

GPS shows driver "teleporting" — instant appearance 15+ miles away with no route trace between points.

Signal: Impossible distance between consecutive pings
Signal: Implied speed > 150 mph in urban area
Signal: Route reconstruction fails
Signal: Occurs during active delivery window (not a dropout)
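
The implied-speed signal can be computed from consecutive pings as distance divided by elapsed time. The following is a sketch under assumptions: pings are (epoch_seconds, lat, lng) tuples sorted by time, and `teleport_jumps` is a hypothetical name. It does not separate true teleports from connectivity dropouts, which the source handles using delivery-window context.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def teleport_jumps(pings, max_mph=150.0):
    """Return indices i where the hop from ping i-1 to ping i exceeds max_mph."""
    flagged = []
    for i in range(1, len(pings)):
        t0, lat0, lng0 = pings[i - 1]
        t1, lat1, lng1 = pings[i]
        dt_hours = (t1 - t0) / 3600.0
        if dt_hours <= 0:
            continue  # duplicate or out-of-order timestamp; avoid dividing by zero
        mph = haversine_miles(lat0, lng0, lat1, lng1) / dt_hours
        if mph > max_mph:
            flagged.append(i)
    return flagged
```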

Cluster Fraud (Coordinated Spoofing)
Severity: CRITICAL

3+ drivers show identical GPS coordinates at a non-hub location. Suggests shared spoofing script or organized fraud ring.

Signal: 3+ drivers within 50m for 30+ minutes
Signal: All show zero deliveries during overlap
Signal: Similar device fingerprints (OS, app version, IP subnet)
Signal: Location is not a registered hub or pickup zone
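
The co-location signal can be illustrated for a single point in time. This sketch is an assumption about one possible approach, not the production Kinesis Data Analytics query: it takes a snapshot of driver positions, uses an equirectangular distance approximation (reasonable at tens of meters), and groups drivers around each candidate center. The 30-minute persistence check and the hub exclusion are omitted.

```python
import math


def local_distance_m(a, b):
    """Equirectangular distance in meters between two (lat, lng) points."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dy = (b[0] - a[0]) * 111_320.0
    dx = (b[1] - a[1]) * 111_320.0 * math.cos(mean_lat)
    return math.hypot(dx, dy)


def colocated_groups(positions, radius_m=50.0, min_size=3):
    """positions: {driver_id: (lat, lng)} at one instant.

    Returns distinct sets of drivers lying within radius_m of some driver
    in the set (a star-shaped neighborhood, so two members can be up to
    2 * radius_m apart).
    """
    groups = []
    for center_pos in positions.values():
        members = frozenset(
            driver for driver, pos in positions.items()
            if local_distance_m(center_pos, pos) <= radius_m
        )
        if len(members) >= min_size and members not in groups:
            groups.append(members)
    return groups
```

The pairwise scan is quadratic in the number of drivers, so a real deployment would likely bucket positions by geohash or grid cell first.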

System Architecture

GPS pings arrive at 5-second intervals from driver devices into Amazon Kinesis Data Streams. A Lambda consumer processes each event batch, enriches it with shift context from DynamoDB, and extracts a feature vector over a 15-minute sliding window. The feature vector is scored by a SageMaker endpoint running an Isolation Forest model — anomalies above threshold 0.7 trigger a secondary Gradient Boosted Classifier that classifies the fraud pattern.
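
The Kinesis-to-Lambda hop described above can be sketched as follows. The base64-encoded `kinesis.data` field is the standard shape of a Kinesis-triggered Lambda event; the ping payload fields (`driverId`, `lat`, `lng`) and the function names are assumptions, and the DynamoDB enrichment and SageMaker scoring steps are left out.

```python
import base64
import json
from collections import defaultdict


def decode_pings(event):
    """Group decoded GPS pings by driverId from a Kinesis-triggered event."""
    by_driver = defaultdict(list)
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        ping = json.loads(payload)
        by_driver[ping["driverId"]].append(ping)
    return dict(by_driver)


def handler(event, context):
    """Lambda entry point: decode the batch and report what was received."""
    pings = decode_pings(event)
    # Shift-context enrichment and model scoring would happen here (omitted).
    return {
        "drivers": len(pings),
        "pings": sum(len(batch) for batch in pings.values()),
    }
```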

Kinesis Data Analytics runs real-time SQL aggregation for cluster fraud detection — identifying 3+ drivers within 50m for 30+ minutes. When fraud is confirmed, the Lambda function writes an alert to DynamoDB and publishes to SNS for push notifications to the React dashboard.

GPS trace archives are stored in S3 for historical analysis and model retraining. The React dashboard connects via API Gateway to query driver state, fetch GPS traces for map visualization, and invoke LLM-powered analysis through Anthropic Claude.

ML Approach

Primary: Isolation Forest (Unsupervised)

Trained on 30 days of historical GPS data from clean drivers only. Operates on 15-minute sliding windows with features including GPS variance, max speed delta, zone proximity, delivery rate, and cluster density. Anomalies scoring above 0.7 trigger secondary classification. Retrained weekly on a rolling 30-day window.
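
At one ping every 5 seconds, a 15-minute window holds 180 pings per driver. A minimal sketch of a time-based sliding window, assuming each ping is a dict with an epoch-seconds `timestamp` key and that pings arrive in time order (the class name and ping layout are hypothetical):

```python
from collections import deque

WINDOW_SECONDS = 15 * 60


class SlidingWindow:
    """Keeps only the pings from the most recent span_seconds."""

    def __init__(self, span_seconds=WINDOW_SECONDS):
        self.span = span_seconds
        self.pings = deque()

    def add(self, ping):
        """Append a ping, evict expired ones, and return the window size."""
        self.pings.append(ping)
        cutoff = ping["timestamp"] - self.span
        # Drop pings that are a full window (or more) older than the newest.
        while self.pings and self.pings[0]["timestamp"] <= cutoff:
            self.pings.popleft()
        return len(self.pings)
```

Features such as GPS variance or max speed delta would then be recomputed over `window.pings` each time the window advances.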

Secondary: Gradient Boosted Classifier

Trained on 1,200 labeled fraud cases across 3 months. Classifies anomalies into 5 fraud patterns with a confidence score per pattern. Auto-flag threshold: 0.8 confidence. Achieved 92.3% precision and 88.7% recall with a false positive rate under 8%.
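
The two thresholds combine into a simple routing rule. The sketch below assumes that anomaly scores at or below 0.7 stop the pipeline, that the 0.8 auto-flag threshold is inclusive, and that lower-confidence classifications go to a `manual_review` action; the source does not state the last two points, and `route` and `classify` are hypothetical names standing in for the SageMaker calls.

```python
ANOMALY_THRESHOLD = 0.7    # Isolation Forest score that triggers classification
AUTO_FLAG_THRESHOLD = 0.8  # classifier confidence for automatic flagging


def route(anomaly_score, classify):
    """Decide what to do with one scored window.

    classify: zero-argument callable returning {pattern_name: confidence};
    it is only invoked when the anomaly score exceeds the first threshold.
    """
    if anomaly_score <= ANOMALY_THRESHOLD:
        return {"action": "none"}
    confidences = classify()
    pattern, confidence = max(confidences.items(), key=lambda kv: kv[1])
    # Assumption: below the auto-flag threshold, a human reviews the case.
    action = "auto_flag" if confidence >= AUTO_FLAG_THRESHOLD else "manual_review"
    return {"action": action, "pattern": pattern, "confidence": confidence}
```

Passing the classifier as a callable keeps the second model from being invoked for the large majority of windows that score as normal.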

gps_variance: Std dev of lat/lng within window
max_speed_delta: Largest speed change between pings
zone_proximity: Min distance to nearest pickup zone
delivery_rate: Deliveries completed / hours active
cluster_density: Number of drivers within 50m
route_continuity: % of consecutive pings with valid road path
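
Two of these features follow directly from their definitions. A minimal sketch, assuming speeds arrive as a time-ordered list for one window and that a driver with no active hours gets a delivery rate of 0.0 (both assumptions, as the source does not specify edge cases):

```python
def max_speed_delta(speeds):
    """Largest absolute change in speed between consecutive pings."""
    return max((abs(b - a) for a, b in zip(speeds, speeds[1:])), default=0.0)


def delivery_rate(deliveries_completed, hours_active):
    """Deliveries per active hour; 0.0 when no hours are logged."""
    if hours_active <= 0:
        return 0.0
    return deliveries_completed / hours_active
```

For the spoofing pattern's "0 -> 45 -> 0 mph" spike, `max_speed_delta([0, 45, 0])` evaluates to 45.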

Data Model

Drivers (200 records)
Column | Type | Notes
driverId | VARCHAR | PK, DRV-10001 format
name | VARCHAR | Driver name
zone | ENUM | Seattle-North, Chicago-Loop, LA-Westside, etc.
vehicleType | ENUM | BIKE, CAR, VAN, SCOOTER
status | ENUM | ACTIVE, FLAGGED, SUSPENDED, UNDER_INVESTIGATION, CLEARED
fraudScore | INT | 0-100 composite anomaly score
primaryFraudPattern | VARCHAR | Pattern name or null
hourlyRate | DECIMAL | 18-25 USD
totalShifts | INT | Career shifts worked (20-200)
flaggedShifts | INT | Shifts with fraud flags (0-50)
customerComplaintRate | DECIMAL | 0.0-0.6 ratio
onTimeRate | DECIMAL | 0.0-1.0 ratio

Deliveries (1,000 records)
Column | Type | Notes
deliveryId | VARCHAR | PK
driverId | VARCHAR | FK -> Drivers
zone | ENUM | Delivery zone
deliveryStatus | ENUM | COMPLETED, ATTEMPTED, FAILED, GHOST_FLAGGED, SPOOFED_FLAGGED
timeAtDeliveryAddress | INT | Seconds at address (0 = ghost delivery)
distanceFromAddressAtCompletion | INT | Meters from address at completion event
fraudFlagType | VARCHAR | Pattern name or null
fraudConfidence | INT | 0-100 ML confidence score
customerComplaint | BOOLEAN | Customer reported issue

GPS Traces (20 flagged drivers)
Column | Type | Notes
driverId | VARCHAR | PK (composite)
date | DATE | PK (composite), shift date
fraudPattern | VARCHAR | Detected pattern or null
pings[] | ARRAY | {timestamp, lat, lng, speed, accuracy}

Investigation Cases (30 records)
Column | Type | Notes
caseId | VARCHAR | PK, CASE-2024-001 format
driverId | VARCHAR | FK -> Drivers
status | ENUM | OPEN, IN_REVIEW, ESCALATED, CLOSED_FRAUD, CLOSED_FALSE_POSITIVE
fraudPattern | VARCHAR | Pattern classification
evidenceSummary | TEXT | 2-3 sentence summary
estimatedFraudAmount | DECIMAL | USD, 200-8000
resolution | TEXT | Investigation outcome or null
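
To make the GPS Traces shape concrete, here is one way it could be typed in Python. This is a sketch only: the class names, the string timestamp, and the `key` helper are assumptions, and the source does not specify units for `speed` or `accuracy`.

```python
import datetime
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Ping:
    timestamp: str  # format not specified in the source; ISO 8601 assumed
    lat: float
    lng: float
    speed: float
    accuracy: float


@dataclass
class GpsTrace:
    driverId: str
    date: datetime.date  # shift date
    fraudPattern: Optional[str] = None
    pings: List[Ping] = field(default_factory=list)

    def key(self) -> Tuple[str, str]:
        """Composite primary key (driverId, date) as strings."""
        return (self.driverId, self.date.isoformat())
```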

LLM Enhancements

Fraud Investigation Summary

Analyzes driver profile, delivery records, GPS trace, and evidence timeline to generate a formal investigation report — pattern classification, top 3 evidence points, confidence %, recommended action, and estimated financial impact.

Represents how I would build this system today with LLM capabilities
Signal Explanation

Provides plain English explanations for each evidence timeline item. Contextualizes GPS anomalies, speed violations, and behavioral signals for non-technical ops managers who need to understand what the data means.

Natural Language Driver Search

Queries like "show all order dodgers in Seattle with fraud score above 70" are interpreted and translated into structured filters. Returns matching driverIds with an explanation of what was found.
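
Once the query has been translated, applying the structured filters is ordinary record filtering. The sketch below covers only that second step and assumes a hypothetical filter schema (`pattern`, `zone_prefix`, `fraud_score_above`); the LLM translation itself is not shown, and the driver fields follow the Drivers table.

```python
def apply_filters(drivers, filters):
    """Return driverIds of records matching every filter that is present."""
    matches = []
    for d in drivers:
        if "pattern" in filters and d.get("primaryFraudPattern") != filters["pattern"]:
            continue
        if "zone_prefix" in filters and not d["zone"].startswith(filters["zone_prefix"]):
            continue
        # "above 70" is read as strictly greater than 70.
        if "fraud_score_above" in filters and d["fraudScore"] <= filters["fraud_score_above"]:
            continue
        matches.append(d["driverId"])
    return matches
```

Under this schema, the example query would map to something like `{"pattern": "Roster Avoidance", "zone_prefix": "Seattle", "fraud_score_above": 70}`.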

Daily Intelligence Brief

Auto-loads on the Live Monitoring dashboard. Provides total fraud exposure, dominant pattern, highest-risk driver to prioritize, and one operational recommendation for the current shift.

Case Narrative Generator

Drafts a formal investigation narrative for HR/legal review covering incident timeline, evidence, policy violations, and recommended action. Editable before saving to the case record.

Key Metrics

Annual Savings: $0.6M
Headcount Avoided: 37
Detection Latency: < 90s
False Positive Rate: < 8%
Fleet Coverage: 100%
Fraud Patterns: 5