Misbahuddin Mohammed — Engineering Portfolio

LoFAT — Last-Mile Fraud Detection Platform

GPS telemetry fraud detection for last-mile delivery — automated flagging of spoofing, ghost deliveries, and coordinated fraud

Python

AWS Lambda

Kinesis

SageMaker

DynamoDB

React Leaflet

SNS

CloudWatch

Problem Statement

Amazon's last-mile delivery fleet operates on an hourly pay model. A subset of drivers exploits this structure by remaining "on shift" while systematically avoiding or fabricating deliveries. Prior to LoFAT, fraud detection relied on manual audits of delivery logs, tip-line reports, and periodic GPS spot-checks. This reactive process caught less than 15% of fraudulent activity and required a dedicated 37-person investigation team.

LoFAT replaced this manual process with an automated pipeline that ingests GPS telemetry in real time, applies ML anomaly detection models (Isolation Forest + Gradient Boosted Classifier), and surfaces flagged drivers to a streamlined investigation workflow. This reduced the dedicated investigation headcount from 37 to 0 and extended monitoring to 100% of the active fleet, where the manual process had caught less than 15% of fraudulent activity.

Fraud Patterns (5 Types)

Roster Avoidance (Order Dodging)

HIGH

Driver is clocked in with valid GPS movement but systematically positions outside pickup zones to avoid order assignment.

Signal: active_hours > 6 AND orders_completed <= 1

Signal: Repeated movement AWAY from pickup clusters

Signal: 8+ assignment attempts with no pickup

Signal: Pattern repeats across consecutive shifts
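
The two threshold signals above can be sketched as a simple predicate. This is a minimal illustration, not the production rule set: the `ShiftSummary` type and its field names are hypothetical, and how the pipeline weighs the four signals against each other is not stated here.

```python
from dataclasses import dataclass


@dataclass
class ShiftSummary:
    # Hypothetical per-shift aggregate; field names are illustrative only.
    active_hours: float
    orders_completed: int
    failed_assignments: int  # assignment attempts that ended with no pickup


def roster_avoidance_signals(shift: ShiftSummary) -> list[str]:
    """Return the names of the threshold signals that fire for one shift."""
    fired = []
    # Signal: active_hours > 6 AND orders_completed <= 1
    if shift.active_hours > 6 and shift.orders_completed <= 1:
        fired.append("low_output_long_shift")
    # Signal: 8+ assignment attempts with no pickup
    if shift.failed_assignments >= 8:
        fired.append("repeated_assignment_without_pickup")
    return fired
```

The movement and cross-shift signals need GPS history and are left out of this sketch.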

GPS Spoofing

CRITICAL

Device transmits a fixed fake coordinate with micro-jitter while driver is physically stationary.

Signal: GPS variance < 50m over 4 hours

Signal: Speed spikes 0 -> 45 -> 0 mph in 30s

Signal: IP geolocation mismatches GPS coordinates

Signal: Device accelerometer shows zero movement
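
One way to check the "variance under 50m" signal is to measure how far the reported positions spread around their centroid. The sketch below assumes the spread is measured as RMS distance in meters; the source does not say how the 50m figure is computed, and the function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def position_spread_m(points: list[tuple[float, float]]) -> float:
    """RMS distance in meters of (lat, lng) points from their centroid.

    Uses a local flat-earth projection, which is adequate over a few km.
    """
    lat0 = sum(p[0] for p in points) / len(points)
    lng0 = sum(p[1] for p in points) / len(points)
    cos_lat = math.cos(math.radians(lat0))
    total_sq = 0.0
    for lat, lng in points:
        dy = math.radians(lat - lat0) * EARTH_RADIUS_M
        dx = math.radians(lng - lng0) * EARTH_RADIUS_M * cos_lat
        total_sq += dx * dx + dy * dy
    return math.sqrt(total_sq / len(points))


def looks_stationary(points: list[tuple[float, float]],
                     max_spread_m: float = 50.0) -> bool:
    """True when the whole trace stays inside the jitter radius."""
    return position_spread_m(points) < max_spread_m
```

A low spread alone also matches a driver legitimately parked at a hub, which is presumably why the pattern pairs it with the speed, IP, and accelerometer signals.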

Ghost Delivery (Fake Completion)

CRITICAL

Driver marks delivery "completed" without traveling to customer address. GPS shows driver never came within 500m.

Signal: Nearest GPS point to address > 500m at completion

Signal: time_at_delivery_address = 0 seconds

Signal: No door photo or photo metadata location mismatch

Signal: Customer complaint rate > 40%
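
The nearest-approach signal reduces to a great-circle distance check. The following is a minimal sketch using the haversine formula; the function names and the `(lat, lng)` tuple layout are assumptions for illustration.

```python
import math


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two (lat, lng) points."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_approach_m(pings: list[tuple[float, float]],
                       address: tuple[float, float]) -> float:
    """Smallest distance between any ping and the delivery address."""
    return min(haversine_m(lat, lng, address[0], address[1]) for lat, lng in pings)


def is_ghost_candidate(pings: list[tuple[float, float]],
                       address: tuple[float, float],
                       threshold_m: float = 500.0) -> bool:
    """True when the driver never came within threshold_m of the address."""
    return nearest_approach_m(pings, address) > threshold_m
```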

Phantom Route (Teleportation)

HIGH

GPS shows driver "teleporting" — instant appearance 15+ miles away with no route trace between points.

Signal: Impossible distance between consecutive pings

Signal: Implied speed > 150 mph in urban area

Signal: Route reconstruction fails

Signal: Occurs during active delivery window (not a dropout)
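
The implied-speed signal can be sketched by dividing the distance between consecutive pings by the time between them. This example assumes pings carry ISO 8601 timestamps and arrive sorted by time; both are assumptions, as are the function names.

```python
import math
from datetime import datetime

MPS_TO_MPH = 2.236936


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two (lat, lng) points."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def implied_speeds_mph(pings: list[tuple[str, float, float]]) -> list[float]:
    """Speed implied by each consecutive pair of (iso_timestamp, lat, lng) pings."""
    speeds = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(pings, pings[1:]):
        dt = (datetime.fromisoformat(t2) - datetime.fromisoformat(t1)).total_seconds()
        if dt <= 0:
            continue  # skip duplicate or out-of-order pings
        speeds.append(haversine_m(la1, lo1, la2, lo2) / dt * MPS_TO_MPH)
    return speeds


def has_teleport(pings: list[tuple[str, float, float]], max_mph: float = 150.0) -> bool:
    """True when any consecutive pair implies a speed above max_mph."""
    return any(speed > max_mph for speed in implied_speeds_mph(pings))
```

Because speed is computed over the actual gap between pings, a genuine signal dropout spreads the distance over a longer interval and is less likely to trip the threshold than a jump between two 5-second pings.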

Cluster Fraud (Coordinated Spoofing)

CRITICAL

3+ drivers show identical GPS coordinates at a non-hub location. Suggests shared spoofing script or organized fraud ring.

Signal: 3+ drivers within 50m for 30+ minutes

Signal: All show zero deliveries during overlap

Signal: Similar device fingerprints (OS, app version, IP subnet)

Signal: Location is not a registered hub or pickup zone
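
The co-location test for a single point in time can be sketched as a pairwise distance check. This is an illustrative Python version only, not the streaming SQL used in the pipeline; the snapshot layout (driver ID mapped to `(lat, lng)`) and function names are assumptions.

```python
import math
from itertools import combinations


def approx_distance_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Equirectangular approximation; accurate enough at tens of meters."""
    r = 6_371_000.0
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(mean_lat) * r
    dy = math.radians(b[0] - a[0]) * r
    return math.hypot(dx, dy)


def colocated_drivers(snapshot: dict[str, tuple[float, float]],
                      radius_m: float = 50.0,
                      min_group: int = 3) -> set[str]:
    """Drivers that have at least min_group - 1 other drivers within radius_m."""
    neighbors = {driver: {driver} for driver in snapshot}
    for d1, d2 in combinations(snapshot, 2):
        if approx_distance_m(snapshot[d1], snapshot[d2]) <= radius_m:
            neighbors[d1].add(d2)
            neighbors[d2].add(d1)
    return {d for d, group in neighbors.items() if len(group) >= min_group}
```

The 30-minute duration condition would require the same group to persist across consecutive snapshots, which this sketch does not track. The pairwise loop is quadratic in the number of drivers, so a real fleet would likely need spatial bucketing first.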

System Architecture

GPS pings arrive at 5-second intervals from driver devices into Amazon Kinesis Data Streams. A Lambda consumer processes each event batch, enriches it with shift context from DynamoDB, and extracts a feature vector over a 15-minute sliding window. The feature vector is scored by a SageMaker endpoint running an Isolation Forest model — anomalies above threshold 0.7 trigger a secondary Gradient Boosted Classifier that classifies the fraud pattern.
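
At 5-second intervals, a 15-minute window holds about 180 pings per driver. A minimal sketch of such a window follows, assuming pings carry epoch-second timestamps; the `SlidingWindows` class is hypothetical and keeps state in memory purely for illustration, since Lambda invocations do not share memory and the actual state store is not described here.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60


class SlidingWindows:
    """Per-driver buffer holding only pings from the last 15 minutes."""

    def __init__(self, window_seconds: int = WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self._buffers = defaultdict(deque)

    def add(self, driver_id: str, ping: dict) -> list[dict]:
        """Append a ping and return the driver's current window.

        ping must contain a numeric 'timestamp' in epoch seconds (assumed format).
        """
        buf = self._buffers[driver_id]
        buf.append(ping)
        cutoff = ping["timestamp"] - self.window_seconds
        # Evict pings that have aged out of the window.
        while buf and buf[0]["timestamp"] <= cutoff:
            buf.popleft()
        return list(buf)
```

The returned window is what a feature extractor would consume before the feature vector is sent for scoring.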

Kinesis Data Analytics runs real-time SQL aggregation for cluster fraud detection — identifying 3+ drivers within 50m for 30+ minutes. When fraud is confirmed, the Lambda function writes an alert to DynamoDB and publishes to SNS for push notifications to the React dashboard.

GPS trace archives are stored in S3 for historical analysis and model retraining. The React dashboard connects via API Gateway to query driver state, fetch GPS traces for map visualization, and invoke LLM-powered analysis through Anthropic Claude.


ML Approach

Primary: Isolation Forest (Unsupervised)

Trained on 30 days of historical GPS data from clean drivers only. Operates on 15-minute sliding windows with features including GPS variance, max speed delta, zone proximity, delivery rate, and cluster density. Anomalies scoring above 0.7 trigger secondary classification. Retrained weekly on a rolling 30-day window.
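
For context on the 0.7 threshold: in the original Isolation Forest formulation, the anomaly score lies in (0, 1], scores near 0.5 indicate ordinary points, and scores approaching 1 indicate points that are isolated after very few splits. The sketch below computes that score from a mean path length. It assumes the 0.7 threshold refers to this scale, which the source does not confirm; libraries such as scikit-learn report scores under a different sign convention.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, sample_size: int) -> float:
    """s = 2 ** (-E[h(x)] / c(n)); shorter isolation paths give higher scores."""
    if sample_size <= 1:
        raise ValueError("sample_size must be greater than 1")
    return 2.0 ** (-mean_path_length / average_path_length(sample_size))
```

With a subsample size of 256, a window whose feature vector is isolated after about 4 splits on average would score above 0.7, while one needing about 10 splits would score near 0.5.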

Secondary: Gradient Boosted Classifier

Trained on 1,200 labeled fraud cases across 3 months. Classifies anomalies into 5 fraud patterns with a confidence score per pattern. Auto-flag threshold: 0.8 confidence. Achieved 92.3% precision and 88.7% recall with a false positive rate under 8%.
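
The two-stage flow can be sketched as a small routing function. The scorer and classifier are passed in as plain callables so the sketch stays independent of any model-serving API; `triage` and its return shape are hypothetical, and treating the auto-flag threshold as inclusive (>= 0.8) is an assumption.

```python
from typing import Callable, Mapping, Optional

ANOMALY_THRESHOLD = 0.7    # stage 1: Isolation Forest score
AUTO_FLAG_THRESHOLD = 0.8  # stage 2: classifier confidence


def triage(features: Mapping[str, float],
           anomaly_scorer: Callable[[Mapping[str, float]], float],
           pattern_classifier: Callable[[Mapping[str, float]], Mapping[str, float]],
           ) -> Optional[dict]:
    """Run stage 2 only for windows that stage 1 considers anomalous."""
    score = anomaly_scorer(features)
    if score <= ANOMALY_THRESHOLD:
        return None  # not anomalous; the classifier is never called
    confidences = pattern_classifier(features)
    # Pick the fraud pattern with the highest confidence.
    pattern, confidence = max(confidences.items(), key=lambda kv: kv[1])
    return {
        "anomaly_score": score,
        "pattern": pattern,
        "confidence": confidence,
        "auto_flag": confidence >= AUTO_FLAG_THRESHOLD,
    }
```

Gating the supervised classifier behind the unsupervised score means most windows cost one model call rather than two.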

Feature	Description
`gps_variance`	Std dev of lat/lng within window
`max_speed_delta`	Largest speed change between pings
`zone_proximity`	Min distance to nearest pickup zone
`delivery_rate`	Deliveries completed / hours active
`cluster_density`	Number of drivers within 50m
`route_continuity`	% of consecutive pings with valid road path
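
Three of these features depend only on a single driver's window and can be sketched with the standard library. The ping dictionary keys follow the `pings[]` fields in the data model; combining the latitude and longitude standard deviations into one number is an assumption, since the source does not say how the two are merged.

```python
import math
import statistics


def extract_features(pings: list[dict],
                     deliveries_completed: int,
                     hours_active: float) -> dict:
    """Compute single-driver features for one window of pings."""
    lats = [p["lat"] for p in pings]
    lngs = [p["lng"] for p in pings]
    speeds = [p["speed"] for p in pings]
    return {
        # Assumed combination: magnitude of the per-axis standard deviations (degrees).
        "gps_variance": math.hypot(statistics.pstdev(lats), statistics.pstdev(lngs)),
        # Largest absolute change in reported speed between consecutive pings.
        "max_speed_delta": max(
            (abs(b - a) for a, b in zip(speeds, speeds[1:])), default=0.0
        ),
        # Guard against division by zero at the very start of a shift.
        "delivery_rate": deliveries_completed / hours_active if hours_active > 0 else 0.0,
    }
```

The remaining features need external context (pickup zone locations, other drivers' positions, a road network) and are omitted.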

Data Model

Drivers (200 records)

Column	Type	Notes
`driverId`	VARCHAR	PK — DRV-10001 format
`name`	VARCHAR	Driver name
`zone`	ENUM	Seattle-North, Chicago-Loop, LA-Westside, etc.
`vehicleType`	ENUM	BIKE, CAR, VAN, SCOOTER
`status`	ENUM	ACTIVE, FLAGGED, SUSPENDED, UNDER_INVESTIGATION, CLEARED
`fraudScore`	INT	0-100 composite anomaly score
`primaryFraudPattern`	VARCHAR	Pattern name or null
`hourlyRate`	DECIMAL	18-25 USD
`totalShifts`	INT	Career shifts worked (20-200)
`flaggedShifts`	INT	Shifts with fraud flags (0-50)
`customerComplaintRate`	DECIMAL	0.0-0.6 ratio
`onTimeRate`	DECIMAL	0.0-1.0 ratio

Deliveries (1,000 records)

Column	Type	Notes
`deliveryId`	VARCHAR	PK
`driverId`	VARCHAR	FK -> Drivers
`zone`	ENUM	Delivery zone
`deliveryStatus`	ENUM	COMPLETED, ATTEMPTED, FAILED, GHOST_FLAGGED, SPOOFED_FLAGGED
`timeAtDeliveryAddress`	INT	Seconds at address (0 = ghost delivery)
`distanceFromAddressAtCompletion`	INT	Meters from address at completion event
`fraudFlagType`	VARCHAR	Pattern name or null
`fraudConfidence`	INT	0-100 ML confidence score
`customerComplaint`	BOOLEAN	Customer reported issue

GPS Traces (20 flagged drivers)

Column	Type	Notes
`driverId`	VARCHAR	PK (composite)
`date`	DATE	PK (composite) — shift date
`fraudPattern`	VARCHAR	Detected pattern or null
`pings[]`	ARRAY	{timestamp, lat, lng, speed, accuracy}
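
As a typed view of this record, the sketch below mirrors the table's columns with dataclasses. The Python types are assumptions: the source lists the ping fields but not their encodings or units.

```python
import datetime
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Ping:
    timestamp: str   # assumed ISO 8601 string; encoding not stated in the source
    lat: float
    lng: float
    speed: float     # unit not stated in the source
    accuracy: float  # unit not stated in the source


@dataclass
class GpsTrace:
    driverId: str
    date: datetime.date                 # shift date
    fraudPattern: Optional[str] = None  # detected pattern or null
    pings: list[Ping] = field(default_factory=list)

    def key(self) -> tuple[str, str]:
        """Composite primary key: (driverId, shift date)."""
        return (self.driverId, self.date.isoformat())
```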

Investigation Cases (30 records)

Column	Type	Notes
`caseId`	VARCHAR	PK — CASE-2024-001 format
`driverId`	VARCHAR	FK -> Drivers
`status`	ENUM	OPEN, IN_REVIEW, ESCALATED, CLOSED_FRAUD, CLOSED_FALSE_POSITIVE
`fraudPattern`	VARCHAR	Pattern classification
`evidenceSummary`	TEXT	2-3 sentence summary
`estimatedFraudAmount`	DECIMAL	USD, 200-8000
`resolution`	TEXT	Investigation outcome or null

LLM Enhancements

Fraud Investigation Summary ✦

Analyzes driver profile, delivery records, GPS trace, and evidence timeline to generate a formal investigation report — pattern classification, top 3 evidence points, confidence %, recommended action, and estimated financial impact.