Some of the most interesting projects I worked on at LinkedIn involved building large-scale, real-time pricing and machine learning products. They required crafting fault-tolerant distributed data architectures to support model training, forecasting, and dynamic control systems. By Luthfur Chowdhury.
None of this work involved a traditional relational database. Instead, event streams, derived data, and stream processing became the core building blocks.
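To make the log-centric pattern concrete, here is a minimal, self-contained sketch of the core idea: an immutable, append-only event log, with derived data produced by replaying it into a view. This is an illustrative toy, not the systems the article describes; the names (`EventLog`, `build_view`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """A minimal append-only log: events are immutable and totally ordered."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> int:
        """Append an event and return its offset (its position in the log)."""
        self.events.append(event)
        return len(self.events) - 1

    def read_from(self, offset: int) -> list:
        """Replay all events from a given offset onward."""
        return self.events[offset:]

def build_view(log: EventLog, offset: int = 0) -> dict:
    """Derived data: fold the log into a queryable key/value view.
    Recovery after a crash is just replaying from the last committed offset."""
    view = {}
    for event in log.read_from(offset):
        view[event["key"]] = event["value"]
    return view

log = EventLog()
log.append({"key": "price:item-42", "value": 9.99})
log.append({"key": "price:item-42", "value": 10.49})
print(build_view(log))  # {'price:item-42': 10.49}
```

Because the log is the source of truth, any number of independent consumers can rebuild their own views from it, which is what makes the derived-data systems loosely coupled.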
The author worked with teams that built entire multi-billion-dollar products on top of these architectures. The experience gave him a new perspective on how organizations can think about their data infrastructure in terms of data flows.
The article covers:
- Logs and Event Streams
- Loose Coupling of Systems
- Unbundling the Database
- Data Integrity and Fault Tolerance
- Data Quality, Integrity and Security
Log-based data integration promises to improve data availability across the breadth of an organization, making it possible to democratize data access and analysis. With stream processing systems we can work with continuous flows of data in real time using composable components. Links to further reading are provided. Nice one!
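The composability mentioned above can be sketched with plain Python generators: each operator consumes one event stream and yields another, so stages chain together without knowing about each other. This is a stand-in for a real stream processor, assuming a simple comma-separated event format invented for the example.

```python
from typing import Iterable, Iterator

def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Turn raw 'timestamp,key,value' lines into event records."""
    for line in lines:
        ts, key, value = line.split(",")
        yield {"ts": float(ts), "key": key, "value": float(value)}

def filter_key(events: Iterable[dict], prefix: str) -> Iterator[dict]:
    """Pass through only events whose key matches a prefix."""
    for event in events:
        if event["key"].startswith(prefix):
            yield event

def running_avg(events: Iterable[dict]) -> Iterator[dict]:
    """Maintain incremental state (a running average) over the stream."""
    total, count = 0.0, 0
    for event in events:
        total += event["value"]
        count += 1
        yield {**event, "avg": total / count}

raw = ["1.0,price:a,9.99", "2.0,clicks:a,3", "3.0,price:a,10.49"]
pipeline = running_avg(filter_key(parse(raw), "price:"))
for out in pipeline:
    print(out)
```

Each stage here is pull-based and stateless toward its neighbors; swapping the in-memory list for a durable log gives the same pipeline a continuous, replayable input.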
[Read More]