Building data pipeline 101


Haridas N is the author of this article about data pipelines and how machine learning (ML) helps businesses manage, analyze, and use data more effectively than ever before.

ML identifies patterns in data through supervised and unsupervised learning, using algorithms to derive actionable insights. Recommendation engines on travel, shopping, and entertainment websites are typical examples; they use consumer data to make personalized offers.
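A minimal sketch of the recommendation idea, assuming hypothetical purchase histories: items that co-occur with what a user has already bought are suggested first.

```python
from collections import Counter

# Hypothetical purchase histories (assumed data, for illustration only).
histories = [
    {"flight", "hotel", "car"},
    {"flight", "hotel"},
    {"hotel", "tour"},
]

def recommend(seen, histories, k=2):
    # Score unseen items by how often they co-occur with the user's items.
    scores = Counter()
    for h in histories:
        if seen & h:
            for item in h - seen:
                scores[item] += 1
    return [item for item, _ in scores.most_common(k)]

print(recommend({"flight"}, histories))  # ['hotel', 'car']
```

Real systems replace this co-occurrence count with learned models, but the pattern-to-personalization flow is the same.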

The article's main sections are:

  • Data pipeline – an overview
  • Data collection and cleansing
  • Storage layer
  • Feature extraction for different models
  • Model design and training
  • Serving the trained model

Data pipelines can be broadly classified into two classes:

Batch processing runs scheduled jobs periodically to generate dashboards or other specific insights.

Stream processing handles events in real time as they arrive, immediately detecting conditions within a short window, such as anomalies or fraud.
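The two classes can be contrasted in a small sketch, assuming a hypothetical stream of payment events: batch processing aggregates the whole dataset on a schedule, while stream processing checks each event as it arrives.

```python
# Hypothetical payment events (assumed data, for illustration only).
events = [
    {"user": "a", "amount": 120.0},
    {"user": "b", "amount": 80.0},
    {"user": "a", "amount": 9500.0},  # unusually large
]

def batch_total(events):
    # Batch: a scheduled job scans the full dataset, e.g. for a dashboard.
    return sum(e["amount"] for e in events)

def stream_flag(event, threshold=1000.0):
    # Stream: each event is checked on arrival, e.g. fraud/anomaly detection.
    return event["amount"] > threshold

print(batch_total(events))               # 9700.0
print([stream_flag(e) for e in events])  # [False, False, True]
```

The batch job only sees the anomaly at the next scheduled run; the stream check flags it immediately, which is the whole point of the distinction.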

A solid data pipeline holds the promise of unlocking the dark data hidden in silos. A flexible, efficient, and economical pipeline with a minimal maintenance and cost footprint lets you build innovative solutions.

[Read More]

Tags machine-learning big-data data-science miscellaneous