Tuesday, September 27, 2022

DE Series - 1


Data Engineering Introduction

Many of you keep asking where to start and which topics to learn for data engineering. Here I outline the important high-level concepts and topics that make up the minimum you need to cover. The topics are grouped by tech stack and concept as below:


1. Data Engineering

What’s Data Engineering
Why Data Engineering
Data Engineers — ML Engineers — Data Scientists
Purpose and Scope

Note: These topics are covered here: https://lnkd.in/dx-QjTER

2. Python for Data Engineering

Basic Python with Project
Advanced Python with Project
Techniques to write efficient and optimized code

3. Scripting and Automation

Shell Scripting
CRON
ETL

4. Relational Databases and SQL


RDBMS
Data Modeling
Basic SQL
Advanced SQL
BigQuery

5. NoSQL Databases and MapReduce

Unstructured Data
Advanced ETL
MapReduce
Data Warehouses
Data API

6. Data Analysis

Pandas
NumPy
Web Scraping
Data Visualization

7. Data Processing Techniques


Batch Processing: Apache Spark
Stream Processing: Spark Streaming
Build Data Pipelines
Target Databases
Machine Learning Algorithms

8. Big Data

Big data basics
HDFS in detail
Hadoop YARN
Sqoop
Hive
Pig
HBase

9. Workflows

Introduction to Airflow
Airflow hands-on project

10. Infrastructure


Docker
Kubernetes
Business Intelligence

11. Cloud Computing

AWS
Azure
Google Cloud Platform
