bi_etl.conversions module

Created on Nov 17, 2014

@author: Derek Wood

bi_etl.conversions.bytes2human(n: int, format_str: str = '%(value).1f %(symbol)s', symbols: str = 'customary') -> str

Convert n bytes into a human-readable string based on format_str. symbols can be one of "customary", "customary_ext", "iec", or "iec_ext"; see http://goo.gl/kTQMs

>>> bytes2human(0)
'0.0 B'
>>> bytes2human(1)
'1.0 B'
>>> bytes2human(1024)
'1.0 K'
>>> bytes2human(1048576)
'1.0 M'
>>> bytes2human(1099511627776127398123789121)
'909.5 Y'
>>> bytes2human(9856, symbols="customary")
'9.6 K'
>>> bytes2human(9856, symbols="customary_ext")
'9.6 kilo'
>>> bytes2human(9856, symbols="iec")
'9.6 Ki'
>>> bytes2human(9856, symbols="iec_ext")
'9.6 kibi'
>>> bytes2human(10000, "%(value).1f %(symbol)s/sec")
'9.8 K/sec'
>>> # precision can be adjusted by playing with %f operator
>>> bytes2human(10000, format_str="%(value).5f %(symbol)s")
'9.76562 K'
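
The arithmetic behind the examples above can be sketched as follows. This is a minimal illustration that covers only the "customary" symbols, not the library's source; the name bytes2human_sketch and the SYMBOLS tuple are assumptions made for the example.

```python
# Hypothetical sketch; not the bi_etl implementation.
SYMBOLS = ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y')  # assumed "customary" set


def bytes2human_sketch(n: int, format_str: str = '%(value).1f %(symbol)s') -> str:
    """Format n bytes using the largest 1024-based unit that does not exceed n."""
    # Walk from the largest unit down to K; each unit is 1024 ** index bytes.
    for index in range(len(SYMBOLS) - 1, 0, -1):
        unit = 1024 ** index
        if n >= unit:
            return format_str % {'value': n / unit, 'symbol': SYMBOLS[index]}
    # Anything smaller than 1024 is reported in plain bytes.
    return format_str % {'value': float(n), 'symbol': SYMBOLS[0]}


print(bytes2human_sketch(1048576))
print(bytes2human_sketch(10000, '%(value).1f %(symbol)s/sec'))
```

Because format_str is an ordinary %-style template, precision and any suffix such as "/sec" are controlled entirely by the caller.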
bi_etl.conversions.change_tz(source_datetime: datetime | None, from_tzone, to_tzone)

Change the time zone of datetime values that have no time-zone info or incorrect time-zone info.

Example from_tzone or to_tzone values:

import pytz

pytz.utc
pytz.timezone('US/Eastern')
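
One plausible reading of this operation is "treat the value as if it were in from_tzone, then convert it to to_tzone". The sketch below illustrates that reading with standard-library fixed offsets so that it has no third-party dependency. The name change_tz_sketch is hypothetical, and whether the library returns an aware or a naive datetime is not stated here, so the aware result is an assumption.

```python
from datetime import datetime, timedelta, timezone


def change_tz_sketch(source_datetime, from_tzone, to_tzone):
    """Hypothetical stand-in: reinterpret the value in from_tzone, convert to to_tzone."""
    if source_datetime is None:
        return None
    # replace() overwrites missing or wrong tzinfo without shifting the clock time;
    # astimezone() then shifts the clock time into the target zone.
    return source_datetime.replace(tzinfo=from_tzone).astimezone(to_tzone)


utc = timezone.utc
eastern_standard = timezone(timedelta(hours=-5))  # fixed offset, no daylight saving

naive = datetime(2014, 11, 17, 12, 0, 0)
converted = change_tz_sketch(naive, utc, eastern_standard)
print(converted.isoformat())  # 2014-11-17T07:00:00-05:00
```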

bi_etl.conversions.coalesce(*values)
bi_etl.conversions.default_nines(v: int) -> int

Same as nvl(v, -9999)

bi_etl.conversions.default_to_invalid(v: str) -> str

Same as nvl(v, 'Invalid')

bi_etl.conversions.default_to_missing(v: str) -> str

Same as nvl(v, 'Missing')

bi_etl.conversions.default_to_question_mark(v: str) -> str

Same as nvl(v, '?')

bi_etl.conversions.ensure_datetime(dt: datetime | date) -> datetime

Takes a date or a datetime as input, outputs a datetime

bi_etl.conversions.ensure_datetime_dict(d: dict | MutableMapping, key: str)

Takes a dict containing a date or a datetime as input. Changes the dict entry to be a datetime
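
The two helpers above can be illustrated with a short sketch. The function names are hypothetical, and promoting a plain date to midnight of that day is an assumption, since the time of day used is not stated here.

```python
from datetime import date, datetime, time


def ensure_datetime_sketch(dt):
    """Hypothetical stand-in: return a datetime for either a date or a datetime."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(dt, datetime):
        return dt
    # Assumption: a plain date becomes midnight of that day.
    return datetime.combine(dt, time.min)


def ensure_datetime_dict_sketch(d, key):
    """Hypothetical stand-in: convert the entry d[key] to a datetime in place."""
    d[key] = ensure_datetime_sketch(d[key])


row = {'created': date(2014, 11, 17)}
ensure_datetime_dict_sketch(row, 'created')
print(row['created'])  # 2014-11-17 00:00:00
```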

bi_etl.conversions.get_date_local(dt: datetime) -> datetime
bi_etl.conversions.get_date_midnight(dt: datetime) -> datetime
bi_etl.conversions.human2bytes(s: str) -> int

Attempts to guess the string format based on default symbols set and return the corresponding bytes as an integer. When unable to recognize the format ValueError is raised.

>>> human2bytes('0 B')
0
>>> human2bytes('1 K')
1024
>>> human2bytes('1 M')
1048576
>>> human2bytes('1 Gi')
1073741824
>>> human2bytes('1 tera')
1099511627776
>>> human2bytes('0.5kilo')
512
>>> human2bytes('0.1  byte')
0
>>> human2bytes('1 k')  # k is an alias for K
1024
>>> human2bytes('12 foo')
Traceback (most recent call last):
    ...
ValueError: can't interpret '12 foo'
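
The parsing shown in these examples can be sketched as splitting the leading number from the trailing symbol and scaling by a power of 1024. This is a simplified illustration, not the library's source: human2bytes_sketch and SYMBOL_SETS are assumed names, and only the short "customary" and "iec" style symbols are recognized (long names such as "kilo" are left out).

```python
# Hypothetical sketch; not the bi_etl implementation.
SYMBOL_SETS = (
    ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'),
    ('Bi', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi', 'Yi'),
)


def human2bytes_sketch(s: str) -> int:
    """Parse strings such as '1 K' or '1 Gi' into a number of bytes."""
    text = s.strip()
    # Find where the leading numeric part ends.
    split_at = 0
    while split_at < len(text) and (text[split_at].isdigit() or text[split_at] == '.'):
        split_at += 1
    number, symbol = text[:split_at], text[split_at:].strip()
    if symbol == 'k':  # lowercase k treated as an alias for K
        symbol = 'K'
    for symbols in SYMBOL_SETS:
        if number and symbol in symbols:
            # The position in the tuple is the power of 1024 for that symbol.
            return int(float(number) * 1024 ** symbols.index(symbol))
    raise ValueError("can't interpret %r" % s)


print(human2bytes_sketch('1 Gi'))  # 1073741824
```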
bi_etl.conversions.int2base(n, base)
bi_etl.conversions.nullif(v, value_to_null)

Pass value through unchanged unless it is equal to the provided value_to_null. If v == value_to_null, return NULL (None).

bi_etl.conversions.nvl(value, default)

Pass value through unchanged unless it is NULL (None). If it is NULL (None), then return provided default value.
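
The pass-through rules described for nullif and nvl can be written in a few lines. The functions below are stand-ins with hypothetical names that follow the descriptions above; they are not the library's source.

```python
def nvl_sketch(value, default):
    """Return default only when value is None; everything else passes through."""
    return default if value is None else value


def nullif_sketch(v, value_to_null):
    """Return None when v equals value_to_null; otherwise return v unchanged."""
    return None if v == value_to_null else v


print(nvl_sketch(None, 'Missing'))    # Missing
print(nvl_sketch(0, -9999))           # 0
print(nullif_sketch('N/A', 'N/A'))    # None
```

Note that only None triggers the default in this sketch: falsy values such as 0 or an empty string are returned as they are.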

bi_etl.conversions.replace_tilda(e)

Unicode error handler that replaces characters that cannot be encoded as ASCII with ~.

Register and apply it with this code:

import codecs
from bi_etl.conversions import replace_tilda

codecs.register_error('replace_tilda', replace_tilda)
...
bytes_value = str_value.encode('ascii', errors='replace_tilda')

See https://docs.python.org/3/library/codecs.html#codecs.register_error
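
For readers unfamiliar with codec error handlers, the sketch below shows what such a handler can look like. It is an illustration of the codecs.register_error protocol, not the library's source; the handler name tilde_handler_sketch and the registered name 'tilde_sketch' are made up for the example.

```python
import codecs


def tilde_handler_sketch(error):
    """Hypothetical handler: substitute '~' for each character that cannot be encoded."""
    if isinstance(error, UnicodeEncodeError):
        # Return the replacement text and the position at which encoding resumes.
        return '~' * (error.end - error.start), error.end
    # Decoding and translating errors are not handled by this sketch.
    raise error


codecs.register_error('tilde_sketch', tilde_handler_sketch)

encoded = 'caf\u00e9'.encode('ascii', errors='tilde_sketch')
print(encoded)  # b'caf~'
```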

bi_etl.conversions.round_datetime_ms(source_datetime: datetime | None, digits_to_keep: int)

Round the microseconds of a datetime value to a given number of significant digits.
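
A possible way to implement this kind of rounding is sketched below. It assumes that digits_to_keep counts digits of the fractional second (so 3 keeps milliseconds), which is an interpretation rather than something stated here; the name round_microseconds_sketch is hypothetical.

```python
from datetime import datetime, timedelta


def round_microseconds_sketch(source_datetime, digits_to_keep):
    """Hypothetical stand-in: keep digits_to_keep digits of the fractional second."""
    if source_datetime is None:
        return None
    # Microseconds have 6 digits, so a negative ndigits rounds to tens, hundreds, ...
    rounded = round(source_datetime.microsecond, digits_to_keep - 6)
    # Adding a timedelta handles the carry when rounding reaches 1,000,000.
    return source_datetime.replace(microsecond=0) + timedelta(microseconds=rounded)


value = datetime(2014, 11, 17, 12, 0, 0, 123456)
print(round_microseconds_sketch(value, 3))  # 2014-11-17 12:00:00.123000
```

Rounding up at the very end of a second carries into the next second, which is why the sketch adds a timedelta instead of calling replace(microsecond=...) with the rounded value.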

bi_etl.conversions.str2bytes_size(str_size: str) -> str

Parses a string containing a size in bytes including KB, MB, GB, TB codes into an integer with the actual number of bytes (using 1 KB = 1024).

bi_etl.conversions.str2date(s: str, dt_format: str = '%m/%d/%Y')

Parse a date (no time) value stored in a string.

Parameters:
bi_etl.conversions.str2datetime(s: str, dt_format: str | Iterable[str] = ('%m/%d/%Y %H:%M:%S', '%m/%d/%Y'))

Parse a date + time value stored in a string.
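
Since dt_format accepts either one format or several, a natural reading is that the formats are tried in order until one matches. The sketch below illustrates that idea with datetime.strptime; the fallback order and the name parse_datetime_sketch are assumptions, not documented behavior.

```python
from datetime import datetime


def parse_datetime_sketch(s, dt_format=('%m/%d/%Y %H:%M:%S', '%m/%d/%Y')):
    """Hypothetical stand-in: try each format in order, return the first match."""
    # Accept a single format string as well as an iterable of formats.
    formats = (dt_format,) if isinstance(dt_format, str) else tuple(dt_format)
    for fmt in formats:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"{s!r} does not match any of {formats}")


print(parse_datetime_sketch('11/17/2014 08:30:00'))  # 2014-11-17 08:30:00
print(parse_datetime_sketch('11/17/2014'))           # 2014-11-17 00:00:00
```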

Parameters:
bi_etl.conversions.str2decimal(s: str)

String to decimal (AKA numeric)

bi_etl.conversions.str2decimal_end_sign(s: str)

String to decimal (AKA numeric). This version is almost 4 times faster than str2decimal in handling signs at the end of the string.
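
A sign at the end of the string means values such as '123.45-'. The sketch below shows one simple way to handle that case by moving the sign before conversion; it is an illustration with a hypothetical name and makes no claim about how the library achieves its speed.

```python
from decimal import Decimal


def decimal_end_sign_sketch(s):
    """Hypothetical stand-in: parse numbers whose sign may trail the digits."""
    text = s.strip()
    if text.endswith('-'):
        # Trailing minus: negate the value of the remaining digits.
        return -Decimal(text[:-1])
    if text.endswith('+'):
        return Decimal(text[:-1])
    # No trailing sign: Decimal handles a leading sign on its own.
    return Decimal(text)


print(decimal_end_sign_sketch('123.45-'))  # -123.45
```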

bi_etl.conversions.str2float(s: str)

String to floating point

bi_etl.conversions.str2float_end_sign(s: str)

String to floating point. This version is almost 4 times faster than str2float in handling signs at the end of the string.

bi_etl.conversions.str2int(s: str)

String to integer

bi_etl.conversions.str2time(s: str, dt_format: str = '%H:%M:%S')

Parse a time of day value stored in a string.

Parameters:
bi_etl.conversions.strip(s: str)

Python str.strip() except that it handles None values.