Data Engineering Python Tools: bi_etl, config_wrangler

A collection of Python libraries geared towards Business Intelligence (BI) and Data Engineering (ETL, ELT, etc.).

BI ETL/ELT Framework (bi_etl)

A Python-based Extract Transform Load (ETL) or Extract Load Transform (ELT) framework geared towards Business Intelligence (BI), and dimensional databases in particular. The goal of the project is to provide reusable objects that implement the typical technical transformations used in loading dimension tables.

Sources supported:

  • Database tables
  • Database SQL Queries
  • Delimited text files
  • Excel files
  • W3C web logs

Targets supported:

  • Delimited text files
  • Excel files
  • Database tables
    • Works with any database supported by SQLAlchemy
    • Simple inserts
    • Update else Insert (upsert)
    • Versioned Update else Insert (upsert SCD Type 2)
    • Source-based versioned Update else Insert
    • All of the above support:
      • Memory/disk caching for performance
      • Bulk loading the results
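To make the "Versioned Update else Insert (upsert SCD Type 2)" target more concrete, here is a minimal in-memory sketch of the Type 2 technique in plain Python. It is not bi_etl's API: `DimRow`, `Scd2Dimension`, `upsert`, and the `HIGH_DATE` sentinel are hypothetical names, and a real load would work against a database table (with caching and bulk loading) rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical "open-ended" end date marking the current version of a row.
HIGH_DATE = date(9999, 12, 31)


@dataclass
class DimRow:
    natural_key: str
    attributes: dict
    begin_date: date
    end_date: date = HIGH_DATE

    @property
    def is_current(self) -> bool:
        return self.end_date == HIGH_DATE


class Scd2Dimension:
    """In-memory Type 2 dimension: each attribute change adds a new version row."""

    def __init__(self) -> None:
        self.rows: list[DimRow] = []
        # Cache of the current version per natural key, so lookups avoid scanning rows.
        self._current: dict[str, DimRow] = {}

    def upsert(self, natural_key: str, attributes: dict, as_of: date) -> str:
        """Return 'insert', 'unchanged', or 'new_version'."""
        current = self._current.get(natural_key)
        if current is not None and current.attributes == attributes:
            return "unchanged"
        if current is not None:
            # Close out the old version; [begin_date, end_date) is treated as half-open.
            current.end_date = as_of
        new_row = DimRow(natural_key, dict(attributes), as_of)
        self.rows.append(new_row)
        self._current[natural_key] = new_row
        return "insert" if current is None else "new_version"
```

Calling `upsert("C001", {"city": "Boston"}, date(2024, 1, 1))` inserts a first version; repeating it with the same attributes changes nothing; calling it later with `{"city": "Denver"}` ends the Boston row and appends a new current row, so history is preserved rather than overwritten.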

bi_etl documentation

PyPI page for bi-etl

git repo for bi_etl

Config Wrangler (config_wrangler)

A Pydantic-based configuration wrangler. It handles reading multiple INI or TOML files with inheritance rules and variable expansions.
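As a rough illustration of layered configuration with inheritance and variable expansion, the sketch below uses only the standard library's `configparser`, not config_wrangler itself. Each later layer overrides keys set by earlier layers, and `ExtendedInterpolation` expands `${...}` references when values are read. The file contents, section names, and the `load_layered_config` helper are hypothetical.

```python
import configparser

# Hypothetical shared settings (the kind of file checked into git).
SHARED_INI = """
[database]
host = db.example.internal
port = 5432
schema = dw

[paths]
base = /data/etl
logs = ${base}/logs
"""

# Hypothetical environment-specific overrides.
ENV_INI = """
[database]
host = localhost

[paths]
base = /tmp/etl_dev
"""


def load_layered_config(*ini_texts: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    for text in ini_texts:
        # Sections merge across reads; later layers override earlier keys.
        parser.read_string(text)
    return parser


config = load_layered_config(SHARED_INI, ENV_INI)
print(config["database"]["host"])    # localhost (from the environment layer)
print(config["database"]["schema"])  # dw (inherited from the shared layer)
print(config["paths"]["logs"])       # /tmp/etl_dev/logs (expanded at read time)
```

Because interpolation happens when a value is read, `${base}` picks up the overridden `base`, which is usually what an environment-specific layer wants.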

This tool grew out of limitations discovered while using ConfigParser with multiple large ETL loads built on the bi_etl framework.

  • Validate the configuration files at startup and not hours into the program run.
    • e.g. ConfigParser.getint() would fail deep into a run when reading a non-integer value
  • Needed a clean way to support configuration items that might be either environment specific or shared across environments (checked into git).
  • Needed configuration items that are shared across multiple programs alongside items that are specific to each program. We also wanted to avoid a single huge monolithic config file, because with validation the entire file would have to be valid before any program that used only part of it could start up successfully.
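The fail-at-startup goal can be sketched with a plain dataclass that converts every value eagerly when the configuration is loaded, so a bad integer surfaces immediately instead of at the first `getint()` call deep into a run. config_wrangler uses Pydantic models for this role; `DatabaseSettings` and `from_section` below are hypothetical stand-ins built on the standard library only.

```python
import configparser
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatabaseSettings:
    # Hypothetical settings model; each field's type is used as its converter.
    host: str
    port: int
    batch_size: int

    @classmethod
    def from_section(cls, section) -> "DatabaseSettings":
        values = {}
        errors = []
        for f in fields(cls):
            if f.name not in section:
                errors.append(f"missing required key {f.name!r}")
                continue
            raw = section[f.name]
            try:
                # Works for simple types like str and int; bool would need special handling.
                values[f.name] = f.type(raw)
            except ValueError:
                errors.append(f"{f.name}={raw!r} is not a valid {f.type.__name__}")
        if errors:
            # Report every problem at once, at load time.
            raise ValueError("; ".join(errors))
        return cls(**values)


parser = configparser.ConfigParser()
parser.read_string("[database]\nhost = localhost\nport = 5432\nbatch_size = 1000\n")
settings = DatabaseSettings.from_section(parser["database"])
print(settings.port + 1)  # 5433, already an int
```

Collecting all errors before raising means one startup attempt reports every invalid key, and validating per section lets each program check only the parts of the configuration it actually uses.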

config_wrangler documentation

PyPI page for config-wrangler

git repo for config-wrangler