What works in production is the only technology criterion that matters. When you look at technology and data engineering choices, even in companies with wildly different Internet of Things use cases, you see something surprising: successful production Internet of Things architectures show a remarkable number of similarities.
Machine learning analysis of Internet of Things data has broad applications across industries, from smart buildings to smart farming, and from network optimization for telecoms to preventive maintenance on expensive medical machines or factory robots. Among other things, every successful IoT architecture includes a distributed streaming pub/sub backbone.
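The pub/sub backbone works because it decouples producers from consumers: a topic is split into append-only partitions, and each consumer group keeps its own read offsets, so a low-latency stream processor and a batch loader can read the same events independently. The single-process sketch below illustrates that model under stated assumptions; `PubSub`, `publish`, and `poll` are hypothetical names, not the API of any real messaging product.

```python
import zlib
from collections import defaultdict
from typing import Any


class PubSub:
    """Toy, in-memory model of a partitioned pub/sub log (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # topic name -> list of partitions; each partition is an append-only list of (key, value)
        self.topics: dict[str, list[list[tuple[str, Any]]]] = {}
        # (consumer group, topic, partition) -> next offset that group will read
        self.offsets: dict[tuple[str, str, int], int] = defaultdict(int)

    def create_topic(self, name: str, num_partitions: int) -> None:
        self.topics[name] = [[] for _ in range(num_partitions)]

    def publish(self, topic: str, key: str, value: Any) -> int:
        """Append to the partition chosen by hashing the key; a key always maps to one partition."""
        partitions = self.topics[topic]
        p = zlib.crc32(key.encode()) % len(partitions)
        partitions[p].append((key, value))
        return p

    def poll(self, group: str, topic: str, max_per_partition: int = 100) -> list[tuple[str, Any]]:
        """Return records this group has not read yet and advance only this group's offsets."""
        records: list[tuple[str, Any]] = []
        for p, log in enumerate(self.topics[topic]):
            start = self.offsets[(group, topic, p)]
            batch = log[start:start + max_per_partition]
            self.offsets[(group, topic, p)] = start + len(batch)
            records.extend(batch)
        return records
```

Publishing readings keyed by device ID and then polling with two groups (say, a stream processor and a batch loader) hands every record to each group once, and each device's readings come back in publish order because they all land in the same partition.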
Join us as we drill into the data architectures of companies such as Philips, Anritsu, and Optimal+. Regardless of industry or use case, each company has one thing in common: a highly successful IoT analytics program in large-scale enterprise production deployment.
Come study IoT architectures that work, and discuss what drove the decisions behind them, which decisions make sense across a broad range of streaming machine learning use cases, and why.
Learn to:
- Judge large-scale IoT technology choices critically and objectively
- Avoid some of the traps that have cost other companies time and money and caused so many implementations to fail
- Choose an architecture that will help ensure AI and ML projects make it into production where they have a real impact
1. Architecting Production IoT Analytics
Paige Roberts, Vertica Open Source Relations Manager
2. Paige Roberts, Vertica Open Source Relations Manager
§ 23 years in data management
§ 6 years as a teacher
§ 1 year at Vertica
§ Past: Syncsort, Hortonworks, Bloor Group, Actian, Epicor, Pervasive, CSC, Data Junction
§ Can’t seem to decide what I want to be when I grow up:
§ Tech Support
§ Tech Writer
§ Software Trainer
§ Software Engineer
§ Consultant
§ Industry Analyst
§ Product and Technical Marketer
§ Product Manager
11. [Diagram: analytics architecture with in-house tools marked “Uber created.” Data sources: application data, clickstreams, location data. Stores: RDBMS (MySQL, PostgreSQL …), Cassandra, key/value DB; schema-enforced recent data. Ingestion EL (extract, load) into HDFS; ETL into flattened, modeled tables; Hive, Spark, Presto, notebooks. Distributed analytics databases (three instances), DataManager/Manifest, DatabaseProxy. Batch and low-latency paths. Applications: ETL/modeling, CityOps, machine learning, experiments. Ad hoc analytics: CityOps, data scientists, QueryBuilder and DashBuilder (both Uber created). Legend: visualization, application, AI, distributed pub/sub system, data sources, mass storage, ETL (extract, transform, load), analytics and machine learning.]
12. [Diagram: agricultural IoT architecture. Data sources: application data and clickstreams, planting and harvest equipment, weather stations, probes, satellite imagery, sales data, marketing campaigns. Components: Amazon RDS, a separate ingest/ETL cluster, a distributed analytics database. Batch and low-latency paths.]
13. [Diagram: batch data ingest/ELT into a distributed analytics database, with transformation pushed down. Sources: CRM, ERP, billing, contact center, geo/mapping, TV schedule. Also labeled: customer, CX, operational, financial, service quality. Outputs: business intelligence, reporting, financial reporting.]
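“Transformation pushed down” is the ELT pattern: raw records are loaded unchanged, and the transform then runs as SQL inside the analytics database instead of in a separate ETL engine. A minimal sketch, using Python's built-in `sqlite3` purely as a stand-in for a distributed analytics database; the table names and billing rows are hypothetical.

```python
import sqlite3

# Hypothetical raw billing rows, landed exactly as the source system exports them (all text).
RAW_ROWS = [("1", "19.99"), ("1", "5.00"), ("2", "42.50")]


def elt(conn: sqlite3.Connection) -> list[tuple[int, float]]:
    # Extract + Load: copy raw rows into a staging table with no cleanup.
    conn.execute("CREATE TABLE staging_billing (customer_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_billing VALUES (?, ?)", RAW_ROWS)
    # Transform: typing and aggregation happen in SQL, inside the database.
    conn.execute(
        """
        CREATE TABLE billing_by_customer AS
        SELECT CAST(customer_id AS INTEGER) AS customer_id,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS total_amount
        FROM staging_billing
        GROUP BY CAST(customer_id AS INTEGER)
        """
    )
    return conn.execute(
        "SELECT customer_id, total_amount FROM billing_by_customer ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        print(elt(conn))
```

Keeping the raw staging table means a changed business rule can be handled by rerunning SQL, without re-extracting from the source systems.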
14. [Diagram: ingest/ELT architecture with both batch and low-latency paths into a distributed analytics database, with transformation pushed down. New source: EON TV service on/off and channel-change data. Batch sources: CRM, ERP, billing, contact center, geo/mapping, TV schedule; also labeled customer, CX, operational, financial, service quality. Outputs: machine learning for real-time ad targeting, content analytics, customer profile analyses for data-driven apps, data-driven apps, business intelligence, reporting, financial reporting.]
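On/off and channel-change events become useful for content analytics and ad targeting once they are turned into viewing time. A minimal sketch, assuming a hypothetical event format of `(timestamp_s, box_id, event, channel)` that the source does not specify: each event closes the set-top box's previous session and credits its duration to the channel that was on.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical event: (timestamp_s, box_id, event, channel); event is "on", "change", or "off".
Event = tuple[int, str, str, Optional[str]]


def seconds_per_channel(events: list[Event]) -> dict[str, int]:
    """Total viewing seconds per channel, built from each box's consecutive events."""
    totals: dict[str, int] = defaultdict(int)
    open_sessions: dict[str, tuple[int, str]] = {}  # box_id -> (session start, channel)
    for ts, box, kind, channel in sorted(events, key=lambda e: e[0]):
        # Any new event ends whatever this box was showing before.
        if box in open_sessions:
            start, current = open_sessions.pop(box)
            totals[current] += ts - start
        # "on" and "change" start a new session; "off" leaves the box idle.
        if kind in ("on", "change") and channel is not None:
            open_sessions[box] = (ts, channel)
    # Sessions still open at the end have no end time yet and are not counted.
    return dict(totals)
```

In a low-latency pipeline the same logic would more likely run incrementally per event, with open sessions kept as stream-processing state instead of being rebuilt from a sorted batch.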
15. [Diagram: data ingest/ELT, BI, and machine learning on a distributed analytics database that uses shared storage. Sources: contextual data, real-time bidding data, third-party data. Paths: extremely low latency, low latency, batch. Outputs: BI reporting.]
16. Key Aspects of All Highly Successful Production Architectures
§ Simplicity of design
§ Both low-latency streaming and historical batch data processing
§ Bring analytics to data, not data to analytics
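“Bring analytics to data” means pushing the computation to where the data already lives instead of exporting every row to a separate tool. A minimal sketch, with `sqlite3` standing in for an analytics database and a hypothetical `readings` table: both functions return the same per-device averages, but the second moves only one row per device out of the database.

```python
import sqlite3
from collections import defaultdict


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical sensor table for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device_id TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("a", 1.0), ("a", 3.0), ("b", 10.0)],
    )
    return conn


def avg_by_pulling_rows(conn: sqlite3.Connection) -> dict[str, float]:
    # Data to analytics: every raw row leaves the database, then Python aggregates.
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for device_id, value in conn.execute("SELECT device_id, value FROM readings"):
        sums[device_id] += value
        counts[device_id] += 1
    return {d: sums[d] / counts[d] for d in sums}


def avg_in_database(conn: sqlite3.Connection) -> dict[str, float]:
    # Analytics to data: the database aggregates; one row per device comes back.
    rows = conn.execute("SELECT device_id, AVG(value) FROM readings GROUP BY device_id")
    return dict(rows.fetchall())
```

With three rows the difference is invisible; with billions of sensor readings, the first version moves all of them across the network while the second moves a handful.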
19. Data Lake
[Diagram (cloud AND/OR on-premises). Low-latency streaming data: application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location. Batch data: transactional/application data from OLTP/ODS and contextual data files (weather, geo), via batch ETL or EL with the T done on mass storage. Components: distributed pub/sub, stream processing, data prep/enrichment, distributed prepped data, object storage, data lake mass storage, query engine / machine learning. Consumers: visualization, applications, artificial intelligence.]
20. Cooperative Data Architecture
[Diagram (cloud AND/OR on-premises). Low-latency streaming data: application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location. Batch data: transactional/application data from OLTP/ODS and contextual data files (weather, geo), via batch ETL or EL with the T done on mass storage. Components: distributed pub/sub, stream processing, data prep/enrichment, data lake mass storage, object storage, and a distributed analytics warehouse with distributed columnar data. Import, export, and query links between the warehouse and the data lake. Consumers: visualization, applications, artificial intelligence.]
21. Unified Analytics Warehouse
[Diagram (cloud AND/OR on-premises). Low-latency streaming data (application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location) via distributed pub/sub and stream processing. Batch data: transactional/application data from OLTP/ODS and contextual data files (weather, geo), via batch ETL or EL with the T done in the warehouse. Components: unified analytics warehouse on shared storage (object storage, HDFS), managed ML models, data science tools. Workloads: ingestion/ELT/data prep/enrichment, reporting/BI, data science/ML, departmental use. Consumers: visualization, applications, artificial intelligence.]
24. The Only Constant is Change – DOFOFU!
§ Don’t commit to only open source, only proprietary, or only one brand
(Yes, people HAVE been fired for choosing only IBM)
§ Don’t lock yourself into only one deployment option: a solution that only works on-prem, only works in the Cloud, or only works on THIS cloud
§ Don’t tightly couple components. Everything should be interchangeable. Switching out one component shouldn’t break everything.
§ Plan for the future. Don’t get locked in. DOFOFU
(Acronym creation credit to @_ColinFay)
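Loose coupling is easiest to see in code: if pipeline logic depends only on a small interface, a storage component can be swapped without touching it. A minimal sketch using Python's `typing.Protocol`; `ReadingSink`, `InMemorySink`, `CsvSink`, and `run_pipeline` are hypothetical names standing in for real components such as an on-premises database or a cloud object store.

```python
import csv
import io
from typing import Iterable, Optional, Protocol, TextIO

Reading = tuple[str, float]


class ReadingSink(Protocol):
    """The only thing the pipeline knows about storage: it can write readings."""

    def write(self, rows: Iterable[Reading]) -> int: ...


class InMemorySink:
    """Stand-in for one storage option (hypothetical)."""

    def __init__(self) -> None:
        self.rows: list[Reading] = []

    def write(self, rows: Iterable[Reading]) -> int:
        batch = list(rows)
        self.rows.extend(batch)
        return len(batch)


class CsvSink:
    """Stand-in for another option: CSV to any text stream, e.g. a file bound for object storage."""

    def __init__(self, stream: TextIO) -> None:
        self.writer = csv.writer(stream)

    def write(self, rows: Iterable[Reading]) -> int:
        count = 0
        for row in rows:
            self.writer.writerow(row)
            count += 1
        return count


def run_pipeline(readings: Iterable[tuple[str, Optional[float]]], sink: ReadingSink) -> int:
    # Drop empty readings, then hand off to whichever sink was injected.
    cleaned = ((device, value) for device, value in readings if value is not None)
    return sink.write(cleaned)


if __name__ == "__main__":
    buffer = io.StringIO()
    for sink in (InMemorySink(), CsvSink(buffer)):
        print(type(sink).__name__, run_pipeline([("pump-1", 4.2), ("pump-2", None)], sink))
```

Because `Protocol` uses structural typing, neither sink has to inherit from `ReadingSink`; a type checker accepts any class with a matching `write` method.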
28. Cooperative Data Architecture
[Diagram (cloud AND/OR on-premises). Low-latency streaming data (application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location) via stream processing. Batch data: transactional/application data from OLTP/ODS and contextual data files (weather, geo), via batch ETL or EL with the T done on mass storage. Data lake mass storage with data prep/enrichment. Links labeled import, query, export, and manage. Workloads: reporting/BI, ingestion/ELT, data science/ML, departmental use. Consumers: visualization, applications, artificial intelligence.]
29. Unified Analytics Warehouse
[Diagram (cloud AND/OR on-premises). Low-latency streaming data (application data, web clicks, logs, sensors, operational metrics, user tracking, geo-location) via stream processing. Batch data: transactional/application data from OLTP/ODS and contextual data files (weather, geo), via batch ETL or EL with the T done in the warehouse. Shared storage (HDFS), managed ML models, ingestion/ELT/data prep/enrichment. Workloads: reporting/BI, data science/ML, departmental use. Consumers: visualization, applications, artificial intelligence.]
30. Predictive Maintenance Demo
Analyze sensor data from cooling towers across the US, enabling equipment manufacturers to predict and prevent equipment failure.
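Predicting failures usually starts with flagging sensor readings that drift far from a machine's recent behavior. The sketch below uses a rolling z-score, a deliberately simple stand-in technique rather than whatever models the demo actually uses; the window size and threshold are illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator


def rolling_zscore_alerts(
    readings: Iterable[float], window: int = 10, threshold: float = 3.0
) -> Iterator[tuple[int, float, float]]:
    """Yield (index, value, z) for readings more than `threshold` standard deviations
    from the mean of the previous `window` readings."""
    history: deque[float] = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:  # a perfectly flat window has no spread to compare against
                z = (value - mu) / sigma
                if abs(z) > threshold:
                    yield i, value, z
        # Every reading, including an anomalous one, joins the trailing window.
        history.append(value)
```

Feeding it a cooling-tower temperature series that alternates between 20.0 and 21.0 and then jumps to 35.0 flags only the jump, because the normal readings sit one standard deviation from the window mean.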
31. Flight Tracker Demo
Vertica operates at the “edge” with flight track detail. Sensor data is collected using a Raspberry Pi with a radio receiver and antenna. Data is loaded into Vertica at thousands of records per second, building to billions of flight data points collected within a 250-mile radius.
https://www.vertica.com/blog/blog-post-series-using-vertica-track-commercial-aircraft-near-real-time/
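Sustaining thousands of inserts per second generally means loading in batches rather than one record at a time. A minimal sketch of that micro-batching pattern, assuming hypothetical flight position tuples (the source does not describe the record format) and using `sqlite3`'s `executemany` only as a stand-in for a database bulk-load path.

```python
import sqlite3
import time

# Hypothetical record: (flight_id, ts, lat, lon, altitude_ft)
Position = tuple[str, float, float, float, int]


class MicroBatchLoader:
    """Buffers records and writes them in one transaction when the batch is full or old enough."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 1000, max_age_s: float = 1.0) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.max_age_s = max_age_s
        self.buffer: list[Position] = []
        self.first_buffered_at = 0.0  # monotonic time of the oldest buffered record
        self.flushed = 0

    def add(self, record: Position) -> None:
        if not self.buffer:
            self.first_buffered_at = time.monotonic()
        self.buffer.append(record)
        # Age is checked only when a record arrives; a real loader would also flush on a timer.
        too_old = time.monotonic() - self.first_buffered_at >= self.max_age_s
        if len(self.buffer) >= self.batch_size or too_old:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        with self.conn:  # commits the whole batch as one transaction
            self.conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?, ?)", self.buffer)
        self.flushed += len(self.buffer)
        self.buffer = []
```

Batch size trades latency for throughput: larger batches mean fewer transactions per second, at the cost of records waiting longer in the buffer.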
32.
Need to analyze over 1.5 PB of customer, product sensor, and performance metadata to fine-tune leading-edge product development
§ Embeds Vertica in the InfoSight flash storage software platform and leverages predictive analytics to deliver advanced customer insights and 99.9999% availability
§ Analyzes millions of sensors every second to prevent problems
Predicting and solving 86% of customer issues automatically
33.
Goal: Faster decisions and analytical insight on manufacturing plant assets – but …
§ Data is doubling every two years, and analyzing multiple ‘hot’ and ‘cold’ data streams was too time-consuming and ineffective in an environment where production quality issues are extremely costly
Solution: Develop and deploy analytic models into customer supply chains
§ Collect, organize, store, and process data from 50 billion semiconductors and printed circuit boards per year
§ Create a holistic view across both electronic system and semiconductor component data for quick identification of the root cause of a defect
§ Analytics on 2 billion data points in < 1 minute
Analyzing quality and yield in semiconductor and electronics manufacturing
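Identifying the root cause of a defect across system and component data is, at its core, a join of system-level test results to component lots followed by a comparison of failure rates. A minimal sketch with hypothetical board IDs and lot labels; at the scale described above, this join would more likely run as SQL inside the analytics database than in application code.

```python
from collections import defaultdict


def failure_rate_by_lot(
    board_passed: dict[str, bool],
    board_components: list[tuple[str, str]],
) -> list[tuple[str, float, int]]:
    """Rank component lots by the failure rate of the boards they were built into.

    board_passed: board_id -> True if the board passed system-level test
    board_components: (board_id, component_lot) pairs from assembly records
    Returns (lot, failure_rate, boards_using_lot), worst lot first.
    """
    boards_per_lot: dict[str, set[str]] = defaultdict(set)
    for board_id, lot in board_components:
        if board_id in board_passed:  # inner join: skip boards with no test result
            boards_per_lot[lot].add(board_id)
    ranked = []
    for lot, boards in boards_per_lot.items():
        failures = sum(1 for b in boards if not board_passed[b])
        ranked.append((lot, failures / len(boards), len(boards)))
    return sorted(ranked, key=lambda r: (-r[1], r[0]))
```

A lot that appears in every failing board but few passing ones rises to the top, which narrows the search before any physical failure analysis.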
34.
§ Philips is moving from reactive to data-driven, proactive maintenance, utilizing new sources of sensor data along with machine learning models to enable scheduled, predictable and non-intrusive service actions
§ Collects and processes data from devices and medical imaging systems using remote monitoring and predictive analytics
Aiming for zero unplanned equipment downtime with IoT analytics
35. Anritsu provides products and services for the development, manufacturing, and maintenance of a range of communication systems for mobile phones and internet connectivity. Anritsu is now leveraging its experience with service assurance technologies to develop solutions for 5G, M2M, IoT, and other wireline and wireless communication markets.
Goal
- Onboard new customers faster to increase customer satisfaction
- Deal with skyrocketing data volumes while keeping costs down and allowing the flexibility to scale both immediately and in the future
- Increase employee productivity
Result
- Return on investment of 351%; payback period of 4 months; average annual benefit: $3,014,583
Anritsu Handles Large-Scale Data Input with Faster Response Time
Improved performance on complex analytics