What are the steps in building a data warehouse? Which cloud technology should you use? How do you use Airflow to orchestrate your pipeline? By Tuan Nguyen.
In this project, we will build a data warehouse on Google Cloud Platform that helps answer common business questions and powers dashboards. You will experience firsthand how to build a DAG for a common data engineering task: extract data from sources, load it into a data sink, then transform and model the data for business consumption.
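To make the extract, load, transform ordering concrete, here is a minimal sketch of how a DAG determines run order. It deliberately uses only Python's standard-library `graphlib` rather than Airflow itself, and the task names and placeholder callables are assumptions for illustration, not the author's pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
}


def run_order(deps):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def run_pipeline(deps, tasks):
    """Call each task in dependency order and collect the names that ran."""
    executed = []
    for name in run_order(deps):
        tasks[name]()
        executed.append(name)
    return executed


# Placeholder callables standing in for real extract/load/transform work.
tasks = {name: (lambda: None) for name in dependencies}
executed = run_pipeline(dependencies, tasks)
print(executed)
```

An orchestrator such as Airflow applies the same idea, and adds scheduling, retries, and monitoring on top of the dependency graph.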
The article is split into:
- Why Google Cloud Platform?
- Cost
- Ease of use
- Business objective
- The dataset
- Data modeling
- Architecture
- Set up the infrastructure
- Data pipeline
… and much more. The author walks through the many steps of designing and deploying a data warehouse on GCP using Airflow as an orchestrator. The source code is also available for reference in a GitHub repo. Excellent for anybody in data science!