Data Engineering Python Tools: bi_etl, config_wrangler
A collection of Python libraries geared towards Business Intelligence (BI) and Data
Engineering (ETL, ELT, etc.).
bi_etl is a Python-based Extract Transform Load (ETL) or Extract Load Transform (ELT)
framework geared towards Business Intelligence (BI), and dimensional databases in
particular.
The goal of the project is to create reusable objects that implement the typical
technical transformations used in loading dimension tables.
Sources supported:
- Database tables
- Database SQL Queries
- Delimited text files
- Excel files
- W3C web logs
Targets supported:
- Delimited text files
- Excel files
- Database tables
  - Works with any database supported by SQLAlchemy
  - Simple inserts
  - Update else Insert (upsert)
  - Versioned Update else Insert (upsert SCD Type 2)
  - Source-based versioned Update else Insert
  - All of the above support:
    - Memory/disk caching for performance
    - Bulk loading the results
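To make the "Versioned Update else Insert (upsert SCD Type 2)" idea concrete, here is a minimal sketch in plain Python. It is not the bi_etl API: `DimRow`, `scd2_upsert`, `HIGH_DATE`, and the `customer_id`/`city` columns are hypothetical names chosen for illustration, a list stands in for the database table, and a dict stands in for the in-memory lookup cache.

```python
from dataclasses import dataclass
from datetime import date

# Assumed convention: the current version of a row has an open-ended end date.
HIGH_DATE = date(9999, 12, 31)


@dataclass
class DimRow:
    customer_id: int          # natural (business) key
    city: str                 # attribute whose history is tracked
    effective_from: date
    effective_to: date = HIGH_DATE


def scd2_upsert(table, current, customer_id, city, as_of):
    """Apply one source row to the dimension as a Type 2 change.

    table   -- list of every DimRow version (stands in for the target table)
    current -- dict cache mapping natural key -> current DimRow
    Returns "insert", "new_version", or "unchanged".
    """
    existing = current.get(customer_id)
    if existing is None:
        action = "insert"
    elif existing.city == city:
        # Nothing changed, so no new version is written.
        return "unchanged"
    else:
        # Close the old version instead of overwriting it.
        existing.effective_to = as_of
        action = "new_version"

    row = DimRow(customer_id, city, effective_from=as_of)
    table.append(row)
    current[customer_id] = row
    return action


table, current = [], {}
print(scd2_upsert(table, current, 1, "Boston", date(2024, 1, 1)))   # insert
print(scd2_upsert(table, current, 1, "Boston", date(2024, 2, 1)))   # unchanged
print(scd2_upsert(table, current, 1, "Chicago", date(2024, 3, 1)))  # new_version
```

After the three calls the table holds two versions of customer 1: the Boston row, closed on 2024-03-01, and the open-ended Chicago row. A real load would also have to decide how to compare many columns and how to write the changes to the database in bulk, which this sketch leaves out.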
Config Wrangler (config_wrangler)
Pydantic-based configuration wrangler. Handles reading multiple ini or toml files with
inheritance rules and variable expansions.
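As a rough illustration of what layered files and variable expansion mean, the sketch below uses only the standard library's `configparser`, not config_wrangler itself. The section names, keys, and values are hypothetical, and config_wrangler's own inheritance rules and expansion syntax may differ.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Hypothetical contents of a shared file that would be checked into git.
SHARED = """
[paths]
root = /data
staging = ${root}/staging

[database]
host = db.example.internal
port = 5432
"""

# Hypothetical contents of an environment-specific file.
LOCAL = """
[paths]
root = /tmp/dev_data

[database]
host = localhost
"""

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(SHARED)  # base layer
parser.read_string(LOCAL)   # later layer overrides matching keys

# Expansion happens when a value is read, so the overridden root is used.
print(parser["paths"]["staging"])          # /tmp/dev_data/staging
print(parser["database"]["host"])          # localhost
print(parser["database"].getint("port"))   # 5432
```

Keys that the second layer does not mention, such as `port`, keep the value from the shared layer.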
This tool grew out of the limitations discovered when using ConfigParser with multiple
large ETL loads using the bi_etl framework.
- Validate the configuration files at startup, not hours into the program run.
  - e.g. ConfigParser.getint() would fail deep into a run when reading a non-integer
    value.
- Needed a clean way to support configuration items that might be either environment
  specific or shared across environments (checked into git).
- Needed configuration items that are shared across multiple programs while also
  having some that are specific to each program. Also wanted to avoid a single huge
  monolithic config file, because validation would require the whole file to be valid
  before any single program that used part of it could start up successfully.
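The first point, validating at startup rather than at first use, can be sketched with the standard library alone. This is not config_wrangler's API and it uses a plain dataclass instead of Pydantic to stay dependency-free. `LoaderConfig`, `load_loader_config`, and the `[loader]` settings are hypothetical names for illustration.

```python
from configparser import ConfigParser
from dataclasses import dataclass

# Hypothetical config with one bad value that would otherwise surface late.
INI_TEXT = """
[loader]
batch_size = 5000
max_retries = three
"""


@dataclass(frozen=True)
class LoaderConfig:
    batch_size: int
    max_retries: int


def load_loader_config(parser):
    """Convert every setting up front and report all problems at once."""
    section = parser["loader"]
    values, errors = {}, []
    for name in ("batch_size", "max_retries"):
        raw = section.get(name)  # None when the option is missing
        try:
            values[name] = int(raw)
        except (TypeError, ValueError):
            errors.append(f"[loader] {name}: {raw!r} is not an integer")
    if errors:
        raise ValueError("invalid configuration:\n" + "\n".join(errors))
    return LoaderConfig(**values)


parser = ConfigParser()
parser.read_string(INI_TEXT)

try:
    config = load_loader_config(parser)  # called once, at startup
except ValueError as exc:
    print(exc)
```

With the sample text above, the program stops immediately and names `max_retries` as the offending setting, instead of failing hours later at the first `getint()` call. Keeping one small typed object per program or component also avoids the monolithic-file problem, since each program validates only the sections it actually loads.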