LoFAT — Last-Mile Fraud Detection Platform
GPS telemetry fraud detection for last-mile delivery — automated flagging of spoofing, ghost deliveries, and coordinated fraud
Problem Statement
Amazon's last-mile delivery fleet operates on an hourly pay model. A subset of drivers exploit this structure by remaining "on shift" while systematically avoiding or fabricating deliveries. Prior to LoFAT, fraud detection relied on manual audits of delivery logs, tip-line reports, and periodic GPS spot-checks — a reactive process that caught less than 15% of fraudulent activity and required a dedicated 37-person investigation team.
LoFAT replaced this manual process with an automated pipeline that ingests GPS telemetry in real time, applies ML anomaly detection models (Isolation Forest + Gradient Boosted Classifier), and surfaces flagged drivers to a streamlined investigation workflow. This reduced dedicated investigation headcount from 37 to 0. It also replaced spot-checks that caught under 15% of fraudulent activity with continuous monitoring of 100% of the active fleet.
Fraud Patterns (5 Types)
Roster Avoidance (Order Dodging)
Driver is clocked in with valid GPS movement but systematically stays outside pickup zones to avoid order assignment.
Signals:
- active_hours > 6 AND orders_completed <= 1
- Repeated movement away from pickup clusters
- 8+ assignment attempts with no pickup
- Pattern repeats across consecutive shifts
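The roster-avoidance signals can be combined into a simple rule check. The following is a hypothetical sketch. The `ShiftSummary` fields, the precomputed `moved_away_from_pickups` flag, and the way signals are combined (low output plus either ignored offers or movement away, repeated on at least two consecutive shifts) are all assumptions rather than LoFAT's production logic.

```python
from dataclasses import dataclass


@dataclass
class ShiftSummary:
    """Hypothetical per-shift aggregate for one driver."""
    active_hours: float
    orders_completed: int
    assignment_attempts: int       # offers routed to the driver
    pickups: int                   # offers that ended in a pickup
    moved_away_from_pickups: bool  # assumed precomputed from GPS heading vs. pickup clusters


def shift_looks_like_dodging(s: ShiftSummary) -> bool:
    """Apply the per-shift roster-avoidance signals."""
    low_output = s.active_hours > 6 and s.orders_completed <= 1
    ignored_offers = s.assignment_attempts >= 8 and s.pickups == 0
    return low_output and (ignored_offers or s.moved_away_from_pickups)


def roster_avoidance_flag(shifts: list[ShiftSummary], min_consecutive: int = 2) -> bool:
    """Flag when the per-shift pattern repeats on consecutive shifts (ordered oldest to newest)."""
    streak = 0
    for s in shifts:
        streak = streak + 1 if shift_looks_like_dodging(s) else 0
        if streak >= min_consecutive:
            return True
    return False
```

Requiring a streak rather than a single bad shift is meant to keep one slow shift from flagging an otherwise normal driver. `min_consecutive` is a tunable, hypothetical parameter.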
GPS Spoofing
Device transmits a fixed fake coordinate with micro-jitter while driver is physically stationary.
Signals:
- GPS variance < 50m over 4 hours
- Speed spikes 0 -> 45 -> 0 mph in 30s
- IP geolocation mismatches GPS coordinates
- Device accelerometer shows zero movement
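The two GPS-derived spoofing signals can be computed directly from a ping series. The following is a minimal sketch under several assumptions. "GPS variance" is taken as the RMS distance of pings from their centroid in meters, and "~0 mph" means at most 1 mph. The accelerometer and IP checks arrive as precomputed booleans. The `Ping` type and the signal-counting function are hypothetical.

```python
import math
from typing import NamedTuple


class Ping(NamedTuple):
    t: float          # seconds since shift start
    lat: float
    lng: float
    speed_mph: float  # device-reported speed


EARTH_RADIUS_M = 6_371_000.0


def spread_m(pings: list[Ping]) -> float:
    """RMS distance (m) of pings from their centroid, via a local flat-earth projection."""
    lat0 = sum(p.lat for p in pings) / len(pings)
    lng0 = sum(p.lng for p in pings) / len(pings)
    cos_lat = math.cos(math.radians(lat0))
    total = 0.0
    for p in pings:
        dy = math.radians(p.lat - lat0) * EARTH_RADIUS_M
        dx = math.radians(p.lng - lng0) * EARTH_RADIUS_M * cos_lat
        total += dx * dx + dy * dy
    return math.sqrt(total / len(pings))


def has_speed_spike(pings: list[Ping], spike_mph=45.0, still_mph=1.0, window_s=30.0) -> bool:
    """True if speed goes from ~0 to >= spike_mph and back to ~0 within window_s."""
    for i, start in enumerate(pings):
        if start.speed_mph > still_mph:
            continue
        peak = 0.0
        for p in pings[i + 1:]:
            if p.t - start.t > window_s:
                break
            if p.speed_mph <= still_mph and peak >= spike_mph:
                return True
            peak = max(peak, p.speed_mph)
    return False


def spoofing_signals(pings: list[Ping], accel_moving: bool, ip_matches_gps: bool) -> int:
    """Count how many of the four spoofing signals fire for this ping series."""
    covers_four_hours = pings[-1].t - pings[0].t >= 4 * 3600
    return sum([
        covers_four_hours and spread_m(pings) < 50.0,
        has_speed_spike(pings),
        not ip_matches_gps,
        not accel_moving,
    ])
```

Returning a count rather than a yes/no leaves the decision threshold to the downstream model. Any fixed cutoff (for example, two or more signals) would be a hypothetical policy choice.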
Ghost Delivery (Fake Completion)
Driver marks delivery "completed" without traveling to customer address. GPS shows driver never came within 500m.
Signals:
- Nearest GPS point to address > 500m at completion
- time_at_delivery_address = 0 seconds
- No door photo, or photo metadata location mismatch
- Customer complaint rate > 40%
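The ghost-delivery checks reduce to a great-circle distance and a few field comparisons. The following is a minimal sketch. It assumes the trace is the list of `(lat, lng)` pings up to the completion event, and that photo validation has already produced a boolean. The function names are hypothetical.

```python
import math


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters (spherical Earth, mean radius)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def ghost_delivery_signals(trace, address, time_at_address_s, photo_ok, complaint_rate):
    """Evaluate the four ghost-delivery signals for one completion event.

    trace: list of (lat, lng) pings up to completion; address: (lat, lng).
    """
    nearest = min(haversine_m(lat, lng, *address) for lat, lng in trace)
    return {
        "never_within_500m": nearest > 500.0,
        "zero_time_at_address": time_at_address_s == 0,
        "photo_missing_or_mismatched": not photo_ok,
        "high_complaint_rate": complaint_rate > 0.40,
    }
```

For example, a driver whose closest ping was 1.1 km from the address, with no dwell time and no photo, fires the first three signals regardless of complaint history.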
Phantom Route (Teleportation)
GPS shows driver "teleporting" — instant appearance 15+ miles away with no route trace between points.
Signals:
- Impossible distance between consecutive pings
- Implied speed > 150 mph in urban area
- Route reconstruction fails
- Occurs during active delivery window (not a dropout)
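The implied-speed signal can be checked hop by hop. The following is a minimal sketch. It assumes pings are `(t_seconds, lat, lng)` tuples sorted by time. It treats any gap longer than a hypothetical `max_gap_s` as a telemetry dropout, which is not evaluated, since the pattern above excludes dropouts.

```python
import math

MPH_PER_MPS = 3600 / 1609.344  # 1 m/s expressed in mph


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def teleport_jumps(pings, max_mph=150.0, max_gap_s=60.0):
    """Return (index, implied_mph) for each hop whose implied speed exceeds max_mph."""
    jumps = []
    for i in range(1, len(pings)):
        t0, lat0, lng0 = pings[i - 1]
        t1, lat1, lng1 = pings[i]
        dt = t1 - t0
        if dt <= 0 or dt > max_gap_s:
            continue  # duplicate timestamp or dropout: not a teleport candidate
        mph = haversine_m(lat0, lng0, lat1, lng1) / dt * MPH_PER_MPS
        if mph > max_mph:
            jumps.append((i, mph))
    return jumps
```

With 5-second pings, a 15-mile jump between two consecutive pings implies roughly 10,000 mph, so the 150 mph bar is very conservative for this pattern.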
Cluster Fraud (Coordinated Spoofing)
3+ drivers show identical GPS coordinates at a non-hub location. Suggests shared spoofing script or organized fraud ring.
Signals:
- 3+ drivers within 50m for 30+ minutes
- All show zero deliveries during overlap
- Similar device fingerprints (OS, app version, IP subnet)
- Location is not a registered hub or pickup zone
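The co-location rule can be approximated in batch over per-minute position snapshots. The following sketch makes several simplifying assumptions. It snaps coordinates to a hypothetical ~55 m grid instead of performing a true 50 m radius test, so drivers straddling a cell edge are missed. It restarts the streak whenever group membership changes, and it represents hubs as a set of grid cells. Device-fingerprint similarity is left out.

```python
from collections import defaultdict

CELL_DEG = 0.0005  # hypothetical grid size, about 55 m of latitude


def cell_of(lat: float, lng: float) -> tuple[int, int]:
    """Snap a coordinate to a coarse grid cell (a stand-in for a 50 m radius test)."""
    return (round(lat / CELL_DEG), round(lng / CELL_DEG))


def find_clusters(snapshots, hub_cells, min_drivers=3, min_minutes=30):
    """snapshots: one dict per consecutive minute, {driver_id: (lat, lng, deliveries_so_far)}.

    Returns {cell: driver_ids} for non-hub cells where the same group of at least
    min_drivers drivers stayed together for min_minutes with no new deliveries.
    """
    active = {}  # cell -> (group, start_minute, deliveries at streak start)
    flagged = {}
    for minute, snap in enumerate(snapshots):
        by_cell = defaultdict(set)
        for driver, (lat, lng, _) in snap.items():
            by_cell[cell_of(lat, lng)].add(driver)
        still_active = {}
        for cell, drivers in by_cell.items():
            if len(drivers) < min_drivers or cell in hub_cells:
                continue
            group = frozenset(drivers)
            streak = active.get(cell)
            if streak is None or streak[0] != group:
                streak = (group, minute, {d: snap[d][2] for d in group})
            _, start, baseline = streak
            if any(snap[d][2] != baseline[d] for d in group):
                continue  # someone delivered: not an idle cluster, drop the streak
            still_active[cell] = streak
            if minute - start >= min_minutes:
                flagged[cell] = set(group)
        active = still_active
    return flagged
```

A streaming system would compute this incrementally over windows. The batch form only illustrates the "same group, same place, no deliveries, long enough" logic.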
System Architecture
GPS pings arrive at 5-second intervals from driver devices into Amazon Kinesis Data Streams. A Lambda consumer processes each event batch, enriches it with shift context from DynamoDB, and extracts a feature vector over a 15-minute sliding window. The feature vector is scored by a SageMaker endpoint running an Isolation Forest model — anomalies above threshold 0.7 trigger a secondary Gradient Boosted Classifier that classifies the fraud pattern.
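The two-stage gating in the consumer can be sketched as follows. Lambda receives Kinesis records with a base64-encoded `data` field under `Records[i]["kinesis"]`. The payload fields (`driverId`, `ts`), the in-memory window store, and the three injected callables are assumptions standing in for the real feature code and SageMaker endpoint calls. A stateless Lambda would need to keep window state externally.

```python
import base64
import json
from collections import defaultdict, deque

WINDOW_S = 15 * 60         # 15-minute sliding window
ANOMALY_THRESHOLD = 0.7    # Isolation Forest gate from the description above

# Hypothetical in-memory window store, keyed by driver.
windows = defaultdict(deque)


def decode_kinesis_records(event):
    """Yield the JSON payloads from a Lambda event delivered by a Kinesis stream."""
    for record in event["Records"]:
        yield json.loads(base64.b64decode(record["kinesis"]["data"]))


def update_window(ping):
    """Append a ping and evict pings older than 15 minutes; return the driver's window."""
    win = windows[ping["driverId"]]
    win.append(ping)
    while win and ping["ts"] - win[0]["ts"] > WINDOW_S:
        win.popleft()
    return win


def score_batch(event, extract_features, anomaly_score, classify):
    """Run stage 2 (pattern classification) only for windows that clear stage 1."""
    results = {}
    for ping in decode_kinesis_records(event):
        features = extract_features(update_window(ping))
        score = anomaly_score(features)
        if score > ANOMALY_THRESHOLD:
            results[ping["driverId"]] = (score, classify(features))
    return results
```

Gating keeps the classifier off the hot path for the large majority of windows that look normal. This is presumably the reason for running it only as a secondary stage.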
Kinesis Data Analytics runs real-time SQL aggregation for cluster fraud detection — identifying 3+ drivers within 50m for 30+ minutes. When fraud is confirmed, the Lambda function writes an alert to DynamoDB and publishes to SNS for push notifications to the React dashboard.
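The alert path can be sketched with boto3's DynamoDB `Table.put_item` and SNS `publish` calls. The record fields, table name, and topic ARN are hypothetical. Confidence is stored as an integer 0-100 (as in the Deliveries table) partly because the DynamoDB resource layer rejects Python floats. The boto3 import is deferred so the record builder can be used and tested without AWS dependencies.

```python
import json
import time


def build_alert(driver_id, pattern, confidence, evidence, now=None):
    """Assemble a hypothetical alert record; field names are illustrative only."""
    ts = int(now if now is not None else time.time())
    return {
        "alertId": f"{driver_id}#{ts}",
        "driverId": driver_id,
        "createdAt": ts,
        "fraudPattern": pattern,
        "fraudConfidence": round(confidence * 100),  # 0-100 integer, no floats
        "evidence": list(evidence),
    }


def publish_alert(alert, table_name, topic_arn):
    """Persist the alert and fan it out over SNS (needs boto3 and AWS credentials)."""
    import boto3  # deferred so build_alert works without boto3 installed

    boto3.resource("dynamodb").Table(table_name).put_item(Item=alert)
    boto3.client("sns").publish(
        TopicArn=topic_arn,
        Subject=f"LoFAT alert: {alert['fraudPattern']}",
        Message=json.dumps(alert),
    )
```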
GPS trace archives are stored in S3 for historical analysis and model retraining. The React dashboard connects via API Gateway to query driver state, fetch GPS traces for map visualization, and invoke LLM-powered analysis through Anthropic Claude.
ML Approach
Primary: Isolation Forest (Unsupervised)
Trained on 30 days of historical GPS data from clean drivers only. Operates on 15-minute sliding windows with features including GPS variance, max speed delta, zone proximity, delivery rate, and cluster density. Anomalies scoring above 0.7 trigger secondary classification. Retrained weekly on a rolling 30-day window.
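A threshold of 0.7 makes most sense on the anomaly score from the original Isolation Forest paper (Liu, Ting and Zhou). That score is s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length across trees and c(n) normalizes by subsample size. That this is the score LoFAT thresholds is an assumption. If the model is scikit-learn's `IsolationForest`, `-model.score_samples(X)` returns this same score. The following sketch computes it from a mean path length.

```python
import math

EULER_GAMMA = 0.5772156649015329


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s = 2 ** (-E[h(x)] / c(n)): near 1 suggests an anomaly, well below 0.5 suggests normal."""
    return 2.0 ** (-mean_path_length / c(n))


def is_anomalous(mean_path_length: float, n: int = 256, threshold: float = 0.7) -> bool:
    """Apply the 0.7 gate; n=256 mirrors the common default subsample size."""
    return anomaly_score(mean_path_length, n) > threshold
```

A point isolated at the average depth c(n) scores exactly 0.5. With n = 256, clearing 0.7 requires isolation in fewer than about 5.3 splits on average.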
Secondary: Gradient Boosted Classifier
Trained on 1,200 labeled fraud cases across 3 months. Classifies anomalies into 5 fraud patterns with a confidence score per pattern. Auto-flag threshold: 0.8 confidence. Achieved 92.3% precision and 88.7% recall with a false positive rate under 8%.
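The auto-flag rule and the reported metrics can be made concrete. In the following sketch, the `"auto_flag"`/`"review"` labels are hypothetical, and the text does not say what happens to cases below 0.8. The metric helper spells out the standard definitions. False positive rate is FP / (FP + TN), which is not the same quantity as 1 − precision.

```python
def route(probabilities: dict[str, float]) -> tuple[str, float, str]:
    """Pick the most likely pattern and decide whether it clears the 0.8 auto-flag bar."""
    pattern, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    action = "auto_flag" if confidence >= 0.8 else "review"
    return pattern, confidence, action


def rates(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (precision, recall, false positive rate) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return precision, recall, fpr
```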
| Feature | Description |
|---|---|
| gps_variance | Std dev of lat/lng within window |
| max_speed_delta | Largest speed change between pings |
| zone_proximity | Min distance to nearest pickup zone |
| delivery_rate | Deliveries completed / hours active |
| cluster_density | Number of drivers within 50m |
| route_continuity | % of consecutive pings with valid road path |
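Five of the six features can be computed from a single window with the standard library. The following sketch makes these assumptions. Positions are projected to local meters around the window's last ping. The lat/lng standard deviations are combined into one radial figure in meters. `hours_active`, the pickup zones, and other drivers' positions are passed in. `route_continuity` is omitted because it needs a road-network map matcher.

```python
import math
import statistics
from dataclasses import dataclass

M_PER_DEG_LAT = 111_195.0  # mean meters per degree of latitude


@dataclass
class Ping:
    t: float          # seconds
    lat: float
    lng: float
    speed_mph: float


def window_features(pings, pickup_zones, deliveries, hours_active, other_drivers):
    """Compute gps_variance, max_speed_delta, zone_proximity, delivery_rate, cluster_density.

    pickup_zones / other_drivers: lists of (lat, lng); deliveries: count completed in the window.
    """
    lat0, lng0 = pings[-1].lat, pings[-1].lng
    cos_lat = math.cos(math.radians(lat0))

    def to_xy(lat, lng):
        """Local equirectangular projection in meters around the latest ping."""
        return (lng - lng0) * M_PER_DEG_LAT * cos_lat, (lat - lat0) * M_PER_DEG_LAT

    def dist(lat, lng):
        return math.hypot(*to_xy(lat, lng))

    xs, ys = zip(*(to_xy(p.lat, p.lng) for p in pings))
    speeds = [p.speed_mph for p in pings]
    return {
        "gps_variance": math.hypot(statistics.pstdev(xs), statistics.pstdev(ys)),
        "max_speed_delta": max((abs(b - a) for a, b in zip(speeds, speeds[1:])), default=0.0),
        "zone_proximity": min(dist(*z) for z in pickup_zones),
        "delivery_rate": deliveries / hours_active,
        "cluster_density": sum(dist(*d) <= 50.0 for d in other_drivers),
    }
```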
Data Model
Drivers (200 records)
| Column | Type | Notes |
|---|---|---|
| driverId | VARCHAR | PK, DRV-10001 format |
| name | VARCHAR | Driver name |
| zone | ENUM | Seattle-North, Chicago-Loop, LA-Westside, etc. |
| vehicleType | ENUM | BIKE, CAR, VAN, SCOOTER |
| status | ENUM | ACTIVE, FLAGGED, SUSPENDED, UNDER_INVESTIGATION, CLEARED |
| fraudScore | INT | 0-100 composite anomaly score |
| primaryFraudPattern | VARCHAR | Pattern name or null |
| hourlyRate | DECIMAL | 18-25 USD |
| totalShifts | INT | Career shifts worked (20-200) |
| flaggedShifts | INT | Shifts with fraud flags (0-50) |
| customerComplaintRate | DECIMAL | 0.0-0.6 ratio |
| onTimeRate | DECIMAL | 0.0-1.0 ratio |
Deliveries (1,000 records)
| Column | Type | Notes |
|---|---|---|
| deliveryId | VARCHAR | PK |
| driverId | VARCHAR | FK -> Drivers |
| zone | ENUM | Delivery zone |
| deliveryStatus | ENUM | COMPLETED, ATTEMPTED, FAILED, GHOST_FLAGGED, SPOOFED_FLAGGED |
| timeAtDeliveryAddress | INT | Seconds at address (0 = ghost delivery) |
| distanceFromAddressAtCompletion | INT | Meters from address at completion event |
| fraudFlagType | VARCHAR | Pattern name or null |
| fraudConfidence | INT | 0-100 ML confidence score |
| customerComplaint | BOOLEAN | Customer reported issue |
GPS Traces (20 flagged drivers)
| Column | Type | Notes |
|---|---|---|
| driverId | VARCHAR | PK (composite) |
| date | DATE | PK (composite), shift date |
| fraudPattern | VARCHAR | Detected pattern or null |
| pings[] | ARRAY | {timestamp, lat, lng, speed, accuracy} |
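The GPS Traces record maps naturally onto a small typed structure with a composite key. The following is a hypothetical sketch. The Python field names are illustrative, and the field for the `date` column is named `shift_date` to avoid shadowing `datetime.date`. Speed in mph and accuracy in meters are assumed units, since the schema does not state them.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class GpsPing:
    timestamp: datetime
    lat: float
    lng: float
    speed: float     # mph assumed, matching the pattern thresholds
    accuracy: float  # reported horizontal accuracy, meters assumed


@dataclass
class GpsTrace:
    driver_id: str
    shift_date: date
    fraud_pattern: Optional[str] = None
    pings: list[GpsPing] = field(default_factory=list)

    @property
    def key(self) -> tuple[str, date]:
        """Composite primary key (driverId, date)."""
        return (self.driver_id, self.shift_date)

    def to_item(self) -> dict:
        """Serialize using the column names from the table above."""
        return {
            "driverId": self.driver_id,
            "date": self.shift_date.isoformat(),
            "fraudPattern": self.fraud_pattern,
            "pings": [
                {"timestamp": p.timestamp.isoformat(), "lat": p.lat, "lng": p.lng,
                 "speed": p.speed, "accuracy": p.accuracy}
                for p in self.pings
            ],
        }
```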
Investigation Cases (30 records)
| Column | Type | Notes |
|---|---|---|
| caseId | VARCHAR | PK, CASE-2024-001 format |
| driverId | VARCHAR | FK -> Drivers |
| status | ENUM | OPEN, IN_REVIEW, ESCALATED, CLOSED_FRAUD, CLOSED_FALSE_POSITIVE |
| fraudPattern | VARCHAR | Pattern classification |
| evidenceSummary | TEXT | 2-3 sentence summary |
| estimatedFraudAmount | DECIMAL | USD, 200-8000 |
| resolution | TEXT | Investigation outcome or null |
LLM Enhancements
Fraud Investigation Summary ✦
Analyzes driver profile, delivery records, GPS trace, and evidence timeline to generate a formal investigation report — pattern classification, top 3 evidence points, confidence %, recommended action, and estimated financial impact.
✦ Represents how I would build this system today with LLM capabilities.
Signal Explanation ✦
Provides plain English explanations for each evidence timeline item. Contextualizes GPS anomalies, speed violations, and behavioral signals for non-technical ops managers who need to understand what the data means.
Natural Language Driver Search ✦
Queries like "show all order dodgers in Seattle with fraud score above 70" are interpreted and translated into structured filters. Returns matching driverIds with an explanation of what was found.
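One safe way to structure this is to have the LLM emit only a JSON filter, then validate and apply that filter in ordinary code. The following is a hypothetical sketch of that second half. The filter fields, the prefix match on `zone`, and the strict "above" comparison on `fraudScore` are assumptions. The LLM call itself is not shown.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional

ALLOWED_STATUSES = {"ACTIVE", "FLAGGED", "SUSPENDED", "UNDER_INVESTIGATION", "CLEARED"}


@dataclass
class DriverFilter:
    zone_prefix: Optional[str] = None        # "Seattle" matches "Seattle-North"
    pattern: Optional[str] = None            # compared to primaryFraudPattern
    fraud_score_above: Optional[int] = None  # strict ">" on fraudScore
    status: Optional[str] = None


def parse_filter(llm_json: str) -> DriverFilter:
    """Validate the model's JSON output before it reaches the data layer."""
    raw = json.loads(llm_json)
    unknown = set(raw) - {f.name for f in fields(DriverFilter)}
    if unknown:
        raise ValueError(f"unexpected filter keys: {sorted(unknown)}")
    flt = DriverFilter(**raw)
    if flt.status is not None and flt.status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {flt.status}")
    return flt


def apply_filter(drivers: list[dict], flt: DriverFilter) -> list[str]:
    """Return the driverIds that satisfy every populated filter field."""
    out = []
    for d in drivers:
        if flt.zone_prefix and not d["zone"].startswith(flt.zone_prefix):
            continue
        if flt.pattern and d["primaryFraudPattern"] != flt.pattern:
            continue
        if flt.fraud_score_above is not None and d["fraudScore"] <= flt.fraud_score_above:
            continue
        if flt.status and d["status"] != flt.status:
            continue
        out.append(d["driverId"])
    return out
```

For example, "show all order dodgers in Seattle with fraud score above 70" would ideally become `{"zone_prefix": "Seattle", "pattern": "Roster Avoidance", "fraud_score_above": 70}`. Rejecting unknown keys means a malformed model response fails loudly instead of silently returning everyone.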
Daily Intelligence Brief ✦
Auto-loads on the Live Monitoring dashboard. Provides total fraud exposure, dominant pattern, highest-risk driver to prioritize, and one operational recommendation for the current shift.
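The numeric parts of the brief are better computed deterministically and handed to the LLM only for phrasing and the recommendation. The following is a hypothetical sketch over the Drivers and Investigation Cases tables. Defining "total fraud exposure" as the sum of `estimatedFraudAmount` over cases that are not yet closed is an assumption.

```python
from collections import Counter

OPEN_STATUSES = {"OPEN", "IN_REVIEW", "ESCALATED"}


def brief_inputs(drivers: list[dict], cases: list[dict]) -> dict:
    """Aggregate the facts the brief reports; an LLM would only phrase them."""
    exposure = sum(c["estimatedFraudAmount"] for c in cases if c["status"] in OPEN_STATUSES)
    patterns = Counter(d["primaryFraudPattern"] for d in drivers if d["primaryFraudPattern"])
    dominant = patterns.most_common(1)[0][0] if patterns else None
    top = max(drivers, key=lambda d: d["fraudScore"], default=None)
    return {
        "total_fraud_exposure_usd": exposure,
        "dominant_pattern": dominant,
        "highest_risk_driver": top["driverId"] if top else None,
    }
```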
Case Narrative Generator ✦
Drafts a formal investigation narrative for HR/legal review covering incident timeline, evidence, policy violations, and recommended action. Editable before saving to the case record.
Key Metrics
| Metric | Value |
|---|---|
| Annual Savings | $0.6M |
| Headcount Avoided | 37 |
| Detection Latency | < 90s |
| False Positive Rate | < 8% |
| Fleet Coverage | 100% |
| Fraud Patterns | 5 |