Vertica Open Source Relations Manager
Architecting Production IoT Analytics
Paige Roberts
Paige Roberts, Vertica Open Source Relations Manager
§ 23 years in data management
§ 6 years as a teacher
§ 1 year at Vertica
§ Past: Syncsort, Hortonworks, Bloor Group, Actian, Epicor, Pervasive, CSC, Data Junction
§ Can’t seem to decide what I want to be when I grow up:
§ Tech Support
§ Tech Writer
§ Software Trainer
§ Software Engineer
§ Consultant
§ Industry Analyst
§ Product and Technical Marketer
§ Product Manager
Agenda
§ Introduction
§ Some Successful IoT Architectures
§ Architecture Evolution
§ Takeaways
§ Q & A
Introduction
A Day in the IoT Analytics Life
Successful implementations right now across dozens of industries and use cases
§ Smart Buildings
§ Health / EMR Analytics
§ Ride Share
§ Customer Analytics
§ Network Optimization
§ Predictive Maintenance
§ Route Optimization
§ Wearable Analytics
§ Smart Agriculture
§ Software Optimization
§ Clickstream Analytics
§ Security Analysis
IoT Architectures
[Architecture diagram: Philips. Labels: Philips Remote Service Network; SQL Server; Teradata, Salesforce, SAP data; Batch; Low Latency; Distributed Analytics Database; R & D Access; Remote Monitoring; Remote Service. Legend: Visualization; Application; AI; Distributed Pub/Sub System; Data Sources; Mass Storage; ETL (Extract, Transform, Load); Analytics and Machine Learning.]
[Architecture diagram: Anritsu. Labels: Anritsu/Internal; Machine; Network; Customer; Auto-Sync; Pre-Packaged Dashboards; Batch; Low Latency; Hot Data; Cold Data; Call Detail Records; ML Application; HDFS; ETL / Data Enrichment; Hive; Distributed Analytics Database.]
[Architecture diagram: real-time bidding. Labels: Contextual Data (PostgreSQL); Real-Time Bidding Data; Third-Party Data; Extremely Low Latency; Low Latency; Batch; HDFS; Distributed Analytics Database (three instances).]
[Architecture diagram: Uber. Labels: RDBMS (MySQL, PostgreSQL …); Cassandra, Key/Value DB; Schema Enforced Recent Data; Application data; Clickstreams; Location Data; Batch; Low Latency; Ingestion EL (Extract, Load); HDFS; ETL (Flattened, Modeled Tables); Hive, Spark, Presto, Notebooks; Distributed Analytics Database (three instances); DataManager/Manifest; DatabaseProxy. Applications: ETL/Modeling, CityOps, Machine Learning, Experiments. Ad Hoc Analytics: CityOps, Data Scientists, QueryBuilder (Uber created), DashBuilder (Uber created).]
[Architecture diagram: agriculture. Labels: Application data (clickstreams); Amazon RDS; Planting and harvest equipment; Weather stations, probes, satellite imagery; Sales Data; Marketing Campaigns; Separate Ingest/ETL Cluster; Batch; Low Latency; Distributed Analytics Database.]
[Architecture diagram: batch reporting. Labels: CRM; ERP; Billing; Contact Center; Geo/Mapping; Customer; CX; Operational; Financial; Service Quality; TV Schedule; Batch; Data Ingest / ELT; Ingestion ELT; Transformation pushed down; Distributed Analytics Database; Business Intelligence; Reporting; Financial Reporting.]
[Architecture diagram: batch reporting extended with low-latency data. Labels: CRM; ERP; Billing; Contact Center; Geo/Mapping; Customer; CX; Operational; Financial; Service Quality; TV Schedule; EON TV Service On/Off, Channel Change Data; Batch; Low Latency; Data Ingest / ELT; Ingestion ELT; Transformation pushed down; Distributed Analytics Database; Machine Learning for real-time ad targeting; Business Intelligence; Content Analytics; Reporting; Financial Reporting; Data-Driven Apps; Customer Profile Analyses for Data-Driven Apps.]
[Architecture diagram: real-time bidding on shared storage. Labels: Contextual Data; Real-Time Bidding Data; Third-Party Data; Extremely Low Latency; Low Latency; Batch; Data Ingest / ELT / BI / Machine Learning; BI Reporting (two instances); Distributed Analytics Database that uses shared storage.]
Key Aspects
§ All Highly Successful Production Architectures
§ Simplicity of design
§ Both low-latency streaming and historical batch data processing
§ Bring analytics to data, not data to analytics
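The last point, bringing analytics to the data, can be illustrated with a minimal Python sketch. It is not taken from the talk: the standard-library `sqlite3` module stands in for a distributed analytics database, and the `readings` table, its columns, and the function names are hypothetical. The first function pulls every row out and aggregates in the application; the second asks the database to aggregate and returns only the result.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, hypothetical sensor table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id TEXT, temp_c REAL)")
    rows = [("a", 20.0), ("a", 22.0), ("b", 30.0), ("b", 34.0), ("b", 35.0)]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


def average_by_moving_data(conn: sqlite3.Connection) -> dict:
    """Data to analytics: every row crosses the wire, the app does the math."""
    totals = {}
    for sensor_id, temp_c in conn.execute("SELECT sensor_id, temp_c FROM readings"):
        total, count = totals.get(sensor_id, (0.0, 0))
        totals[sensor_id] = (total + temp_c, count + 1)
    return {sensor_id: total / count for sensor_id, (total, count) in totals.items()}


def average_in_database(conn: sqlite3.Connection) -> dict:
    """Analytics to data: the database aggregates, only one row per sensor returns."""
    query = "SELECT sensor_id, AVG(temp_c) FROM readings GROUP BY sensor_id"
    return dict(conn.execute(query).fetchall())
```

Both functions return the same averages. The difference is how much data leaves the database, which matters more as the table grows.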
Data Architecture Evolution
Data Warehouse
[Architecture diagram. Labels: Transactional Data; Application Data; CRM; ERP; Billing; Message Queues; Files; Customer; Operational; Financial; Batch; ETL; Analytics Database; Business Intelligence; Visualization.]
Data Lake
[Architecture diagram. Labels: Contextual Data (Files, Weather, Geo); Transactional Data (Application Data, OLTP/ODS); Streaming Data (Application data, Web clicks, Logs, Sensors, Operational metrics, User tracking, Geo-location); Low Latency; Batch; Batch ETL or EL with T done on mass storage; Data Prep / Enrichment; Distributed Pub/Sub; Stream Processing; Data Lake Mass Storage; Distributed Prepped Data; Object Storage; Query Engine / Machine Learning; Cloud AND / OR On-Premises; Visualization; Applications; Artificial Intelligence.]
Cooperative Data Architecture
[Architecture diagram. Labels: Contextual Data (Files, Weather, Geo); Transactional Data (Application Data, OLTP/ODS); Streaming Data (Application data, Web clicks, Logs, Sensors, Operational metrics, User tracking, Geo-location); Low Latency; Batch; Batch ETL or EL with T done on mass storage; Data Lake Mass Storage; Data Prep / Enrichment; Distributed Pub/Sub; Stream Processing; Import; Export; Query; Distributed Columnar Data; Object Storage; Distributed Analytics Warehouse; Cloud AND / OR On-Premises; Visualization; Applications; Artificial Intelligence.]
Unified Analytics Warehouse
[Architecture diagram. Labels: Contextual Data (Files, Weather, Geo); Transactional Data (Application Data, OLTP/ODS); Streaming Data (Application data, Web clicks, Logs, Sensors, Operational metrics, User tracking, Geo-location); Low Latency; Batch; Batch ETL or EL with T done in warehouse; Distributed Pub/Sub; Stream Processing; Ingestion / ELT / Data Prep / Enrichment; Unified Analytics Warehouse; Shared Storage; Object Storage; HDFS; Managed ML Models; Data Science Tools; Reporting / BI; Data Science / ML; Departmental Use; Cloud AND / OR On-Premises; Visualization; Applications; Artificial Intelligence.]
Takeaways
The Only Constant is Change
The Only Constant is Change – DOFOFU!
§ Don’t commit to only open source, only proprietary, or only one brand (Yes, people HAVE been fired for choosing only IBM)
§ Don’t lock yourself into only one deployment option: a solution that only works on-prem, only works in the cloud, or only works on THIS cloud
§ Don’t tightly couple components. Everything should be interchangeable. Switching out one component shouldn’t break everything.
§ Plan for the future. Don’t get locked in. (DOFOFU)
(Acronym creation credit to @_ColinFay)
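As an illustration of the loose-coupling advice, here is a minimal Python sketch. It is not from the talk, and every name in it (`EventSource`, `EventSink`, `run_pipeline`, and the example classes) is hypothetical. The pipeline depends only on two small interfaces, so a source or sink can be swapped without touching the pipeline itself.

```python
from typing import Iterable, Protocol


class EventSource(Protocol):
    """Anything that can produce events (a pub/sub topic, a file, a test list)."""

    def read(self) -> Iterable[dict]: ...


class EventSink(Protocol):
    """Anything that can accept events (a database, object storage, a counter)."""

    def write(self, event: dict) -> None: ...


class InMemorySource:
    """Stand-in source that replays a fixed list of events."""

    def __init__(self, events):
        self._events = list(events)

    def read(self):
        return iter(self._events)


class ListSink:
    """Stand-in sink that keeps every event it receives."""

    def __init__(self):
        self.rows = []

    def write(self, event):
        self.rows.append(event)


class CountingSink:
    """A different sink with the same interface; it only counts events."""

    def __init__(self):
        self.count = 0

    def write(self, event):
        self.count += 1


def run_pipeline(source: EventSource, sink: EventSink) -> None:
    """Forward events that carry a value; knows nothing about concrete components."""
    for event in source.read():
        if event.get("value") is not None:
            sink.write(event)
```

Replacing `ListSink` with `CountingSink` requires no change to `run_pipeline`, which is the property the slide asks for: switching out one component shouldn't break everything.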
Q&A
Learn More: academy.vertica.com
Try it Free: vertica.com/try
Paige Roberts
Open Source Relations Manager
E: Paige.Roberts@microfocus.com
Thank you!
http://academy.vertica.com
https://www.vertica.com/data-disruptors-vertica-webcast-series/
Vertica Data Disruptors Webcast Series
Cooperative Data Architecture
[Architecture diagram. Labels: Contextual Data (Files, Weather, Geo); Transactional Data (Application Data, OLTP/ODS); Streaming Data (Application data, Web clicks, Logs, Sensors, Operational metrics, User tracking, Geo-location); Low Latency; Batch; Batch ETL or EL with T done on mass storage; Data Lake Mass Storage; Data Prep / Enrichment; Stream Processing; Import; Query; Export; Manage; Reporting/BI; Ingestion/ELT; Data Science/ML; Departmental Use; Cloud AND / OR On-Premises; Visualization; Applications; Artificial Intelligence.]
Unified Analytics Warehouse
[Architecture diagram. Labels: Contextual Data (Files, Weather, Geo); Transactional Data (Application Data, OLTP/ODS); Streaming Data (Application data, Web clicks, Logs, Sensors, Operational metrics, User tracking, Geo-location); Low Latency; Batch; Batch ETL or EL with T done in warehouse; Stream Processing; Ingestion / ELT / Data Prep / Enrichment; Shared Storage; HDFS; Managed ML Models; Reporting / BI; Data Science / ML; Departmental Use; Cloud AND / OR On-Premises; Visualization; Applications; Artificial Intelligence.]
Predictive Maintenance Demo
Analyze sensor data from cooling towers across the US, enabling equipment manufacturers to predict and prevent equipment failure
Flight Tracker Demo
Vertica operates at the “edge” with flight track detail. Sensor data is collected using a Raspberry Pi with a radio receiver and antenna. Data is loaded into Vertica at thousands of records per second and builds to billions of flight data points collected within a 250-mile radius.
https://www.vertica.com/blog/blog-post-series-using-vertica-track-commercial-aircraft-near-real-time/
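One common way to sustain thousands of inserts per second is to load records in batches rather than one at a time. The sketch below shows that general technique only; it is not the demo's actual loader. The standard-library `sqlite3` module stands in for the database, and the `flight_points` table, its columns, and the function names are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (aircraft id, latitude, longitude, altitude in feet)
Record = Tuple[str, float, float, float]


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group an incoming stream of records into lists of at most `size` items."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def load_flight_points(
    conn: sqlite3.Connection, records: Iterable[Record], batch_size: int = 1000
) -> int:
    """Insert records batch by batch, one transaction per batch; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flight_points "
        "(icao TEXT, lat REAL, lon REAL, altitude_ft REAL)"
    )
    loaded = 0
    for batch in batched(records, batch_size):
        with conn:  # commits on success, rolls back if the insert raises
            conn.executemany("INSERT INTO flight_points VALUES (?, ?, ?, ?)", batch)
        loaded += len(batch)
    return loaded
```

The batch size trades latency against per-statement overhead: larger batches load more efficiently, but each record waits longer before it becomes queryable.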
Predicting and solving 86% of customer issues automatically
Need to analyze over 1.5 PB of customer, product sensor, and performance metadata to fine-tune leading-edge product development
§ Embeds Vertica in InfoSight flash storage software platform and leverages predictive analytics to deliver advanced customer insights and 99.9999% availability
§ Analyzes millions of sensors every second to prevent problems
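To put the 99.9999% availability figure in perspective, the short Python calculation below converts an availability percentage into the downtime it allows. The function name is hypothetical, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def allowed_downtime_seconds(
    availability_percent: float, period_seconds: float = SECONDS_PER_YEAR
) -> float:
    """Return the downtime, in seconds, permitted over the period at this availability."""
    return period_seconds * (1 - availability_percent / 100)
```

Under these assumptions, `allowed_downtime_seconds(99.9999)` comes to roughly 31.5 seconds per year, compared with about 8.8 hours per year at 99.9%.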
Analyzing quality and yield in semiconductor and electronics manufacturing
Goal: Faster decisions and analytical insight on manufacturing plant assets – but …
§ Data is doubling every two years, and analyzing multiple ‘hot’ and ‘cold’ data streams is too time-consuming and ineffective in an environment where production quality issues are extremely costly
Solution: Develop and deploy analytic models into customer supply chains
§ Collect, organize, store and process data from 50 billion semiconductors and printed circuit boards per year
§ Create a holistic view across both electronic system and semiconductor component data for quick identification of the root cause of a defect
§ Analytics on 2 billion data points in < 1 minute
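The figures on this slide can be turned into rates with some simple arithmetic. The Python sketch below is only a back-of-the-envelope aid: the function names are hypothetical, a 365-day year is assumed, and the yearly volume is treated as evenly spread, which real production data would not be.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def min_points_per_second(points: float, max_seconds: float) -> float:
    """Lowest sustained rate that processes `points` within `max_seconds`."""
    return points / max_seconds


def average_items_per_second(items_per_year: float) -> float:
    """Average arrival rate if a yearly volume were spread evenly over the year."""
    return items_per_year / SECONDS_PER_YEAR


def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """How many times larger the data is after `years` at a fixed doubling period."""
    return 2 ** (years / doubling_period_years)
```

With the slide's numbers, 2 billion data points in under a minute implies more than about 33 million points per second, 50 billion items per year averages out to roughly 1,585 per second, and doubling every two years means a 32-fold increase over ten years.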
Aiming for zero unplanned equipment downtime with IoT analytics
§ Philips is moving from reactive to data-driven, proactive maintenance, utilizing new sources of sensor data along with machine learning models to enable scheduled, predictable and non-intrusive service actions
§ Collects and processes data from devices and medical imaging systems using remote monitoring and predictive analytics
Anritsu Handles Large Scale Data Input with Faster Response Time
Improved performance on complex analytics
Anritsu provides products and services for the development, manufacturing and maintenance of a range of communication systems for mobile phones and internet connectivity. Anritsu is now leveraging its experience with service assurance technologies to develop solutions for 5G, M2M, IoT, and other wireline and wireless communication markets.
Goal
- Deploy new customers faster to increase customer satisfaction
- Deal with skyrocketing data volumes, while keeping costs down and allowing the flexibility to scale both immediately and in the future
- Increase employee productivity
Result
- Return on investment of 351%; payback period of 4 months; average annual benefit: $3,014,583
