Data Engineering Introduction
Many of you keep asking where to start and what topics to learn for data engineering. Here I list the important high-level concepts and topics you need to cover, at a minimum, to get a complete grounding in data engineering. The topics are grouped by tech stack and concept as follows:
1. Data Engineering
What’s Data Engineering
Why Data Engineering
Data Engineers, ML Engineers, and Data Scientists
Purpose and Scope
Note: These topics are covered here: https://lnkd.in/dx-QjTER
2. Python for Data Engineering
Basic Python, with a project
Advanced Python, with a project
Techniques to write efficient and optimized code
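One of the most useful techniques for efficient Python in data work is processing records lazily with generators, so a large file never has to sit in memory all at once. Here is a minimal sketch, assuming a hypothetical "name,amount" line format; in practice the `lines` input could be an open file object.

```python
# Minimal sketch: stream records through generators so only one record is
# held in memory at a time. The "name,amount" format is hypothetical.
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Lazily split 'name,amount' lines into (name, amount) pairs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, amount = line.split(",")
        yield name, int(amount)


def total_by_name(records: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Aggregate amounts per name in a single pass over the stream."""
    totals: dict[str, int] = {}
    for name, amount in records:
        totals[name] = totals.get(name, 0) + amount
    return totals


if __name__ == "__main__":
    raw = ["alice,10", "bob,5", "", "alice,7"]
    # Nothing is parsed until total_by_name starts iterating.
    print(total_by_name(parse(raw)))  # {'alice': 17, 'bob': 5}
```

Because `parse` is a generator, swapping the list for `open("orders.csv")` (a hypothetical file) keeps memory use flat regardless of file size.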
3. Scripting and Automation
Shell Scripting
CRON
ETL
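To make ETL concrete, here is a minimal extract-transform-load sketch using only the Python standard library. The CSV layout, table name, and cleaning rules are hypothetical; an in-memory SQLite database stands in for the real target system.

```python
# Minimal ETL sketch with the standard library only.
# CSV layout, table name, and cleaning rules are hypothetical.
import csv
import io
import sqlite3

RAW_CSV = """order_id,customer,amount
1, Alice ,19.99
2,bob,5.00
3,,12.50
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read CSV rows into dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float]]:
    """Transform: drop rows without a customer, normalise names, cast types."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip()
        if not customer:
            continue  # reject incomplete records
        cleaned.append((int(row["order_id"]), customer.title(), float(row["amount"])))
    return cleaned


def load(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> None:
    """Load: write cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
    # [(1, 'Alice', 19.99), (2, 'Bob', 5.0)]
```

A script like this is what CRON typically schedules. For example, a crontab entry such as `0 2 * * * /usr/bin/python3 /opt/jobs/etl.py` would run it daily at 02:00 (both paths are hypothetical).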
4. Relational Databases and SQL
RDBMS
Data Modeling
Basic SQL
Advanced SQL
BigQuery
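The step from basic to advanced SQL is easiest to see on a small relational model. This sketch uses Python's built-in `sqlite3` module with two hypothetical tables; the SQL is standard enough to carry over to most RDBMSs and, with minor syntax changes, to BigQuery.

```python
# A small relational model (data modeling) queried with basic and more
# advanced SQL via the standard-library sqlite3 module. Tables and rows are
# hypothetical.
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
"""


def build(conn: sqlite3.Connection) -> None:
    # SQLite only enforces REFERENCES (foreign keys) when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob"), (3, "Cara")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0)])
    conn.commit()


# Basic SQL: filter rows from a single table.
BASIC = "SELECT name FROM customers WHERE customer_id = 2"

# More advanced SQL: LEFT JOIN keeps customers with no orders, then the
# rows are aggregated per customer and sorted by total spend.
ADVANCED = """
SELECT c.name,
       COUNT(o.order_id)          AS n_orders,
       COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total DESC
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    print(conn.execute(BASIC).fetchall())     # [('Bob',)]
    print(conn.execute(ADVANCED).fetchall())  # [('Alice', 2, 50.0), ('Bob', 1, 5.0), ('Cara', 0, 0)]
```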
5. NoSQL Databases and MapReduce
Unstructured Data
Advanced ETL
MapReduce
Data Warehouses
Data API
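MapReduce is easiest to understand through the classic word-count example. The sketch below simulates the map, shuffle, and reduce phases in a single Python process; a real framework such as Hadoop MapReduce distributes these phases across many machines, which this code does not attempt.

```python
# Word count, the classic MapReduce example, simulated in one process.
# Only the three logical phases are mirrored here; there is no distribution.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick fox", "The lazy dog", "the fox"]
    print(word_count(docs))
    # {'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The key design point is that `map_phase` and `reduce_phase` are independent per split and per key, which is what lets a cluster run them in parallel.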
6. Data Analysis
Pandas
Numpy
Web Scraping
Data Visualization
7. Data Processing Techniques
Batch Processing: Apache Spark
Stream Processing: Spark Streaming
Build Data Pipelines
Target Databases
Machine Learning Algorithms
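The difference between batch and stream processing can be shown without Spark at all. This plain-Python sketch contrasts a batch computation over a complete dataset with a streaming aggregation over fixed (tumbling) time windows. It is not Spark's API; the events are hypothetical (timestamp_seconds, value) pairs and are assumed to arrive in timestamp order.

```python
# Batch vs. stream processing, illustrated in plain Python (not Spark).
# Events are hypothetical (timestamp_seconds, value) pairs.
from typing import Iterable, Iterator


def batch_sum(events: list[tuple[int, int]]) -> int:
    """Batch: the whole dataset is available, so compute over all of it at once."""
    return sum(value for _, value in events)


def tumbling_window_sums(
    events: Iterable[tuple[int, int]], window: int
) -> Iterator[tuple[int, int]]:
    """Stream: consume events one at a time and emit one result per window.

    Assumes events arrive in timestamp order (no late-data handling).
    Yields (window_start, sum_of_values) each time a window closes.
    """
    current_start = None
    running = 0
    for ts, value in events:
        start = ts - ts % window  # align the timestamp to its window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, running  # previous window is complete
            current_start, running = start, 0
        running += value
    if current_start is not None:
        yield current_start, running  # flush the last open window


if __name__ == "__main__":
    events = [(0, 1), (3, 2), (5, 4), (9, 1), (12, 3)]
    print(batch_sum(events))                             # 11
    print(list(tumbling_window_sums(events, window=5)))  # [(0, 3), (5, 5), (10, 3)]
```

The batch version needs all events before it can answer; the streaming version produces partial results as each window closes, which is the trade-off Spark's batch and streaming engines are built around.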
8. Big Data
Big data basics
HDFS in detail
Hadoop YARN
Sqoop (Hadoop and RDBMS data transfer)
Hive
Pig
HBase
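A useful first exercise on HDFS is working out how a file is actually stored: HDFS splits files into fixed-size blocks and replicates each block across DataNodes. This sketch does that arithmetic, assuming the common defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); individual clusters may override both.

```python
# Back-of-the-envelope HDFS storage arithmetic.
# Assumes the common defaults dfs.blocksize = 128 MiB and dfs.replication = 3.
MIB = 1024 * 1024


def hdfs_footprint(file_bytes: int, block_bytes: int = 128 * MIB,
                   replication: int = 3) -> dict[str, int]:
    """Return block count, stored block replicas, and raw bytes consumed."""
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = (file_bytes + block_bytes - 1) // block_bytes
    return {
        "blocks": blocks,
        "replicas": blocks * replication,
        # The last block only occupies its actual size on disk, so raw usage
        # is the file size times the replication factor.
        "raw_bytes": file_bytes * replication,
    }


if __name__ == "__main__":
    print(hdfs_footprint(1000 * MIB))
    # {'blocks': 8, 'replicas': 24, 'raw_bytes': 3145728000}
```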
9. Workflows
Introduction to Airflow
Airflow hands-on project
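Before running anything, a workflow orchestrator such as Airflow resolves the dependency graph (DAG) of tasks to decide what can run and in which order. Airflow expresses those dependencies in its own DAG definitions (for example with the `>>` operator between tasks); the sketch below is not Airflow's API but uses Python's standard-library `graphlib` (Python 3.9+) to show the same ordering idea. The task names are hypothetical.

```python
# What an orchestrator resolves first: a DAG of task dependencies.
# Uses the standard-library graphlib, not Airflow; task names are hypothetical.
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
PIPELINE: dict[str, set[str]] = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid sequential order; raises CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())


def parallel_stages(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages whose members could run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies satisfied
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished, unlocking successors
    return stages


if __name__ == "__main__":
    print(parallel_stages(PIPELINE))
    # [['extract_customers', 'extract_orders'], ['transform'],
    #  ['load_warehouse'], ['refresh_dashboard']]
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected")
```

The cycle check matters: a workflow with circular dependencies can never start, which is why DAG-based schedulers reject it up front.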
10. Infrastructure
Docker
Kubernetes
Business Intelligence
11. Cloud Computing
AWS
Azure
Google Cloud Platform