ebay.com
Low Latency Web-Scale Fraud Prevention
eBay Enterprise is the world’s largest omni-channel commerce provider. The engineering team at eBay chose Apache Samza to build PreCog, their horizontally scalable anomaly detection system.
PreCog makes extensive use of Samza’s high-performance, fault-tolerant local storage. Its architecture had the following requirements, all of which Samza met:
Web-scale: Scale to a large number of users and a large volume of data per user, and scale horizontally by adding commodity hardware.
Low latency: Process customer interactions in real time, reacting within milliseconds rather than hours.
Fault tolerance: Gracefully tolerate and recover from hardware failures.
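Samza makes local storage fault-tolerant by backing each local store with a changelog (a log-compacted Kafka topic). After a failure, the store is rebuilt by replaying that changelog. The minimal Python sketch below models only that idea, with a plain list standing in for the Kafka topic. `ChangelogBackedStore` and its methods are hypothetical names and are not Samza's API, which is Java-based.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a changelog entry is (key, value); value None is a deletion tombstone.
ChangelogEntry = Tuple[str, Optional[int]]


class ChangelogBackedStore:
    """Local key-value store that mirrors every write to a changelog.

    Assumption: in Samza the changelog is a Kafka topic; here it is a Python list.
    """

    def __init__(self, changelog: List[ChangelogEntry]) -> None:
        self._data: Dict[str, int] = {}
        self._changelog = changelog

    def put(self, key: str, value: int) -> None:
        self._data[key] = value
        self._changelog.append((key, value))  # durable copy of the write

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
        self._changelog.append((key, None))  # tombstone so replay also deletes

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    @classmethod
    def restore(cls, changelog: List[ChangelogEntry]) -> "ChangelogBackedStore":
        """Rebuild local state (e.g. on a new machine) by replaying the changelog in order."""
        store = cls(changelog)
        for key, value in changelog:
            if value is None:
                store._data.pop(key, None)
            else:
                store._data[key] = value
        return store
```

Replaying writes in log order reproduces the latest value for each key. Kafka's log compaction keeps that replay bounded by discarding entries that a later write for the same key has superseded.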
The PreCog anomaly-detection system comprises multiple tiers. Each tier consists of multiple Samza jobs that process the output of the previous tier.
Ingestion tier: In this tier, historical and real-time data from a variety of sources, such as people and places, is ingested into Kafka.
Fanout tier: This tier consists of Samza jobs that consume the Kafka events, fan them out, and re-partition them by facets such as email address, IP address, credit-card number, and shipping address.
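Re-partitioning by facet means each event is re-emitted once per facet and keyed by that facet's value. As a result, all events that share an email address, for example, reach the same downstream task. The plain-Python sketch below uses hypothetical field names for the facets and CRC32 as a stable stand-in for the partitioner. Kafka's default partitioner actually uses murmur2, and in a Samza job the messages would be sent through the task's `MessageCollector` rather than returned.

```python
import zlib
from typing import Dict, List, Tuple

# Hypothetical event field names for the facets named in the text.
FACETS = ("email", "ip", "card_number", "shipping_address")

# (output stream name, partition, key, original event)
FanoutMessage = Tuple[str, int, str, Dict[str, str]]


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash partitioning: equal keys always map to the same partition.

    CRC32 is used because Python's built-in hash() of str is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def fan_out(event: Dict[str, str], num_partitions: int) -> List[FanoutMessage]:
    """Emit one message per facet present in the event, keyed by the facet value."""
    messages: List[FanoutMessage] = []
    for facet in FACETS:
        value = event.get(facet)
        if value:  # skip facets the event does not carry
            messages.append((f"by-{facet}", partition_for(value, num_partitions), value, event))
    return messages
```

Each facet gets its own output stream here, so a compute-tier job can consume just the facet it aggregates on.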
Compute tier: The Samza jobs in this tier consume messages from the fanout tier and compute key metrics and derived features. Features used to evaluate fraud include:
- Number of transactions per customer per day
- Change in the number of daily transactions over the past few days
- Dollar amount of each transaction per day
- Change in the dollar amount of transactions over a sliding time window
- Number of transactions per shipping address
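Most of these features are per-key aggregates over time, which is where Samza's stateful processing applies. Each compute task keeps running totals in a local store keyed by customer or address and updates them as messages arrive. The sketch below models that state with in-memory `Counter`s and computes several of the listed features. The class name, the day-granularity keys, and the on-demand window sum are illustrative assumptions, not PreCog's implementation.

```python
from collections import Counter
from datetime import date, timedelta


class FeatureState:
    """Hypothetical in-memory stand-in for per-key state held by a compute-tier task."""

    def __init__(self) -> None:
        self.count: Counter = Counter()       # (customer, day) -> number of transactions
        self.amount: Counter = Counter()      # (customer, day) -> total dollar amount
        self.by_address: Counter = Counter()  # shipping address -> number of transactions

    def record(self, customer: str, day: date, amount: float, address: str) -> None:
        """Update all aggregates for one incoming transaction."""
        self.count[(customer, day)] += 1
        self.amount[(customer, day)] += amount
        self.by_address[address] += 1

    def daily_count_change(self, customer: str, day: date) -> int:
        """Change in the daily transaction count versus the previous day."""
        prev = day - timedelta(days=1)
        return self.count[(customer, day)] - self.count[(customer, prev)]

    def window_amount(self, customer: str, end: date, days: int) -> float:
        """Total dollar amount over the `days`-day window ending on `end` (inclusive)."""
        return sum(self.amount[(customer, end - timedelta(days=i))] for i in range(days))
```

The change in amount over a sliding window can then be taken as the difference between `window_amount` for two consecutive windows.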
Assembly tier: This tier comprises Samza jobs that join the compute tier's output with additional data sources and make a final determination of whether a transaction is fraudulent.
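The assembly step amounts to a stream-table join. In Samza, this is commonly implemented by consuming the additional data source into a local key-value store and looking up each incoming compute-tier record against it. The sketch below uses a dict as that table and applies a deliberately simple rule. The `Features` fields, the account-age table, and every threshold are hypothetical placeholders, because the source does not describe PreCog's actual decision logic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Features:
    """Hypothetical shape of one compute-tier output record."""
    customer: str
    daily_count: int
    window_amount: float


def assess(features: Features, account_age_days: Dict[str, int],
           max_daily: int = 10, max_amount: float = 1000.0,
           min_age_days: int = 30) -> bool:
    """Join features with a side table (customer -> account age) and flag likely fraud.

    Placeholder rule: high velocity combined with a new or unknown account.
    """
    age: Optional[int] = account_age_days.get(features.customer)  # the "join" lookup
    new_account = age is None or age < min_age_days
    high_velocity = features.daily_count > max_daily or features.window_amount > max_amount
    return high_velocity and new_account
```

Keeping the side table local to the task avoids a remote lookup per message, which matters for the millisecond-latency requirement.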
To monitor the PreCog pipeline, eBay uses Samza’s JMXMetricsReporter and ingests the reported metrics into OpenTSDB, which stores them in HBase. The metrics are then visualized with Grafana.
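As a hedged sketch of the last hop into OpenTSDB, the function below converts a snapshot of metric values into the JSON body accepted by OpenTSDB's HTTP `/api/put` endpoint. Reading the values from JMX is out of scope here. The function names, the example tags, and the conservative ASCII sanitizing rule for metric names are assumptions for illustration. They are not part of eBay's pipeline or of Samza.

```python
import json
import re
from typing import Dict, List

# Assumption: keep metric names to a conservative ASCII subset of what OpenTSDB accepts.
_INVALID = re.compile(r"[^A-Za-z0-9._/-]")


def to_opentsdb_points(metrics: Dict[str, float], timestamp: int,
                       tags: Dict[str, str]) -> List[dict]:
    """Convert a flat snapshot of metric values into OpenTSDB data points.

    OpenTSDB requires at least one tag per data point, so `tags` must be non-empty.
    """
    if not tags:
        raise ValueError("OpenTSDB data points need at least one tag")
    return [
        {"metric": _INVALID.sub("_", name), "timestamp": timestamp,
         "value": value, "tags": dict(tags)}
        for name, value in sorted(metrics.items())
    ]


def to_put_body(points: List[dict]) -> str:
    """Serialize data points as the JSON array body for POST /api/put."""
    return json.dumps(points)
```

Tagging every point with the job name (for example) lets Grafana group or filter the series per Samza job.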
Key Samza features: stateful processing, windowing, Kafka integration, JMX metrics