petsereypanha/data-engineer

Data Engineering Learning Repository

A comprehensive collection of Python scripts and examples focused on data engineering fundamentals, including data importing, manipulation, API interactions, and Python programming concepts.

🎯 Overview

This repository contains hands-on examples and exercises on core data engineering topics in Python: importing data from files and the web, cleaning and manipulating it with Pandas, joining DataFrames, calling APIs, and intermediate Python programming.

📁 Project Structure

data-engineer/
├── cleaning-data-in-python/                 # Data cleaning techniques
├── data-manipulation-with-pandas/           # Pandas operations
├── importing-data-in-python/                # Data importing fundamentals
├── intermediate-importing-data/             # Advanced data importing
├── intermediate-python/                     # Python programming concepts
├── introduction-api/                        # API fundamentals
├── introduction-to-python/                  # Python basics
├── joining-data-with-pandas/                # Data merging and joining
├── python-data-science-toolbox/             # Data science utilities
├── streamlining-data-ingestion-with-pandas/ # Efficient data ingestion
└── requirements.txt                         # Project dependencies

🛠 Technologies

  • Python 3.x
  • NumPy - Numerical computing
  • Pandas - Data manipulation and analysis
  • Matplotlib - Data visualization
  • PyYAML - YAML file parsing
  • Pillow - Image processing
  • Tweepy - Twitter API integration

📦 Installation

  1. Clone the repository:
git clone https://github.com/petsereypanha/data-engineer.git
cd data-engineer
  2. Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On macOS/Linux
# or
venv\Scripts\activate     # On Windows
  3. Install required packages:
pip install -r requirements.txt

🚀 Usage

From the repository root, run any script by passing its path to python:

# Example: Run data importing scripts
python importing-data-in-python/importing-data.py

# Example: Run API interaction scripts
python intermediate-importing-data/interacting-with-APIs.py

# Example: Run Python function examples
python intermediate-python/function.py

📚 Topics Covered

Cleaning Data in Python

  • Handling missing data
  • Data type conversions
  • Removing duplicates
  • Data validation techniques
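
As a rough illustration of the cleaning steps listed above, the sketch below removes duplicates, converts a string column to numbers, fills a missing value, and validates the result. The DataFrame contents and column names are invented for this example and are not taken from the repository's scripts.

```python
import pandas as pd

# Hypothetical raw data: a duplicate row, numbers stored as strings, one missing age.
raw = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Ben", "Cy"],
        "age": ["31", "45", "45", None],
    }
)

# Removing duplicates: drop rows that repeat an earlier row exactly.
deduped = raw.drop_duplicates()

# Data type conversion: strings become floats, and None becomes NaN.
deduped = deduped.assign(age=pd.to_numeric(deduped["age"]))

# Handling missing data: fill the gap with the column median.
cleaned = deduped.assign(age=deduped["age"].fillna(deduped["age"].median()))

# Data validation: every age should fall in a plausible range.
assert cleaned["age"].between(0, 120).all()
print(cleaned)
```

Filling with the median is only one option; dropping incomplete rows with `dropna()` may suit other datasets better.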

Data Manipulation with Pandas

  • DataFrame operations
  • Filtering and sorting
  • Aggregations and grouping
  • Data transformation techniques
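
A minimal sketch of the operations above, using a small made-up sales table (the column names and values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, 3, 7, 12],
        "price": [2.5, 4.0, 2.5, 4.0],
    }
)

# Transformation: derive a new column from existing ones.
sales["revenue"] = sales["units"] * sales["price"]

# Filtering and sorting: keep orders of at least 5 units, largest revenue first.
large_orders = sales[sales["units"] >= 5].sort_values("revenue", ascending=False)

# Grouping and aggregation: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(large_orders)
print(revenue_by_region)
```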

Importing Data in Python

  • Reading flat files (CSV, TXT)
  • Working with Excel files
  • Loading pickle files
  • Importing SAS and Stata files
  • Working with HDF5 files
  • Loading MATLAB files
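
The sketch below covers only the flat-file and pickle items from the list above, using the standard library and in-memory data so that it runs without any files on disk. The sample contents are invented. Excel, SAS, Stata, HDF5, and MATLAB files usually need third-party libraries and are not shown here.

```python
import csv
import io
import pickle

# In-memory stand-in for a CSV file on disk (hypothetical contents).
flat_file = io.StringIO("id,score\n1,9.5\n2,7.0\n")

# csv.DictReader yields one dict per row, with every value read as a string.
rows = list(csv.DictReader(flat_file))

# Convert the string fields to the types we actually want.
records = [{"id": int(row["id"]), "score": float(row["score"])} for row in rows]

# Pickle round trip: serialize to bytes, then load the object back.
payload = pickle.dumps(records)
restored = pickle.loads(payload)

print(restored)
```

Pickle files can execute arbitrary code when loaded, so it is safest to unpickle only data from a trusted source.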

Intermediate Importing Data

  • Importing data from the Internet
  • API interactions and authentication
  • Working with Twitter API
  • HTTP requests and responses
  • JSON data parsing
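
To illustrate the JSON parsing item above without making a network call, the sketch below parses a hard-coded string shaped like an API response. The field names are hypothetical and do not come from any particular API.

```python
import json

# Hypothetical response body, similar in shape to what a JSON API might return.
response_body = (
    '{"user": {"id": 7, "name": "example"}, '
    '"tags": ["python", "data"], "active": true}'
)

# json.loads turns the text into nested dicts, lists, and Python scalars.
data = json.loads(response_body)

user_name = data["user"]["name"]
first_tag = data["tags"][0]

# Use .get() for keys that may be absent instead of risking a KeyError.
email = data["user"].get("email", "unknown")

# json.dumps goes the other way, from Python objects back to text.
round_tripped = json.dumps(data, sort_keys=True)

print(user_name, first_tag, email)
```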

Intermediate Python

  • Function definition and usage
  • Default arguments and keyword arguments
  • Docstrings and documentation
  • *args and **kwargs
  • Lambda functions
  • Error handling
  • Python ecosystem tools
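
Several of the items above fit into one short sketch. The function name `summarize` and its parameters are made up for this example.

```python
def summarize(*values, precision=2, **labels):
    """Return the rounded mean of values, plus any extra labels.

    Raises ValueError if no values are given.
    """
    if not values:
        raise ValueError("at least one value is required")
    mean = round(sum(values) / len(values), precision)
    return {"mean": mean, **labels}


# *args collects 1, 2, 4; precision overrides the default; unit lands in **kwargs.
result = summarize(1, 2, 4, precision=1, unit="kg")

# Lambda function: a small anonymous function, here used with map().
square = lambda x: x ** 2
squares = list(map(square, [1, 2, 3]))

# Error handling: catch the exception raised for an empty call.
try:
    summarize()
except ValueError as error:
    message = str(error)

print(result, squares, message)
```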

Introduction to Python

  • Basic data types
  • Control flow (if/else statements)
  • Loops (for, while)
  • Python fundamentals
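
A small sketch of the basics above, with invented values:

```python
# Basic data types: a list of ints and a list that will hold strings.
temperatures = [18, 25, 31]
labels = []

# A for loop with if/elif/else control flow.
for temperature in temperatures:
    if temperature < 20:
        labels.append("cool")
    elif temperature < 30:
        labels.append("warm")
    else:
        labels.append("hot")

# A while loop that counts down to zero.
countdown = 3
ticks = []
while countdown > 0:
    ticks.append(countdown)
    countdown -= 1

print(labels)
print(ticks)
```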

Introduction to API

  • Making HTTP requests
  • RESTful API concepts
  • Authentication methods
  • Handling API responses
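
The sketch below builds an authenticated GET request with the standard library but does not send it, so it runs offline. The URL, token, and query parameters are placeholders, and bearer-token authentication is just one of several possible schemes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint and token; replace with real values for an actual API.
BASE_URL = "https://api.example.com/v1/items"
API_TOKEN = "replace-with-a-real-token"

# Query parameters are URL-encoded and appended to the endpoint.
query = urlencode({"page": 1, "limit": 10})

# Authentication here uses a bearer token in the Authorization header.
request = Request(
    f"{BASE_URL}?{query}",
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Accept": "application/json",
    },
    method="GET",
)

print(request.get_method(), request.full_url)

# To send the request and handle the response, one could write:
#   from urllib.request import urlopen
#   import json
#   with urlopen(request) as response:
#       status = response.status
#       body = json.loads(response.read())
```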

Joining Data with Pandas

  • Inner joins
  • Outer joins
  • Left and right joins
  • Merging DataFrames
  • Concatenating data
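
The join types above differ in which keys they keep. A minimal sketch, using two invented tables that share a `customer_id` column:

```python
import pandas as pd

# Hypothetical tables: customer 1 has no order, and order 4 has no customer.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [2, 3, 4], "total": [20.0, 35.5, 12.0]})

# Inner join: only keys present in both tables (2 and 3).
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: every customer; missing order totals become NaN.
left = customers.merge(orders, on="customer_id", how="left")

# Outer join: every key from either table (1, 2, 3, and 4).
outer = customers.merge(orders, on="customer_id", how="outer")

# Concatenation: stack tables with the same columns on top of each other.
more_orders = pd.DataFrame({"customer_id": [1], "total": [5.0]})
all_orders = pd.concat([orders, more_orders], ignore_index=True)

print(inner, left, outer, all_orders, sep="\n\n")
```

A right join (`how="right"`) mirrors the left join by keeping every row of `orders` instead.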

Python Data Science Toolbox

  • Advanced function techniques
  • Iterators and generators
  • List comprehensions
  • Data science utilities
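
As one possible illustration of generators and list comprehensions, the sketch below defines a hypothetical helper, `read_in_batches`, which yields items lazily in fixed-size groups:

```python
def read_in_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Yield whatever is left over as a final, shorter batch.
    if batch:
        yield batch


# Calling a generator function returns an iterator; nothing runs until next().
batches = read_in_batches(range(7), 3)
first = next(batches)
remaining = list(batches)

# List comprehension with a filter: squares of the even numbers below 10.
even_squares = [n * n for n in range(10) if n % 2 == 0]

print(first, remaining, even_squares)
```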

Streamlining Data Ingestion with Pandas

  • Efficient data loading
  • Optimizing data types
  • Chunking large files
  • Performance best practices
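
Chunking and explicit dtypes can be combined in a single `read_csv` call. The sketch below reads an in-memory CSV (standing in for a large file) four rows at a time; the column names and the chunk size are assumptions chosen for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a large CSV file: ten rows of hypothetical readings.
csv_data = io.StringIO(
    "sensor,reading\n" + "".join(f"s{i % 2},{i}\n" for i in range(10))
)

total = 0
row_count = 0

# chunksize makes read_csv return an iterator of DataFrames instead of one
# large DataFrame; dtype selects compact types for each column up front.
for chunk in pd.read_csv(
    csv_data,
    chunksize=4,
    dtype={"sensor": "category", "reading": "int32"},
):
    total += int(chunk["reading"].sum())
    row_count += len(chunk)

print(row_count, total)
```

Only one chunk is held in memory at a time, so the same loop can be pointed at a file path for data that does not fit in memory at once.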

🤝 Contributing

Contributions are welcome! Feel free to:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/improvement)
  3. Make your changes
  4. Commit your changes (git commit -am 'Add new feature')
  5. Push to the branch (git push origin feature/improvement)
  6. Create a Pull Request

📝 License

This project was created for educational purposes.

👤 Author

Panha Setserey


⭐ If you find this repository helpful, please consider giving it a star!
