bi_etl.lookups.non_unique_lookup module

Created on Feb 27, 2015

@author: Derek Wood

class bi_etl.lookups.non_unique_lookup.NonUniqueLookup(lookup_name: str, lookup_keys: list, parent_component: ETLComponent, config: BI_ETL_Config_Base = None, use_value_cache: bool = True, value_for_none='<None>')

Bases: Lookup

COLLECTION_INDEX = datetime.datetime(1900, 1, 1, 0, 0)
DB_LOOKUP_WARNING = 1000
ROW_TYPES

alias of Union[Row, Sequence]

VERSION_COLLECTION_TYPE

alias of OOBTree

__init__(lookup_name: str, lookup_keys: list, parent_component: ETLComponent, config: BI_ETL_Config_Base = None, use_value_cache: bool = True, value_for_none='<None>')
add_size_to_stats() → None
cache_row(row: Row, allow_update: bool = True, allow_insert: bool = True)

Adds the given row to the cache for this lookup.

Parameters:
  • row (Row) – The row to cache

  • allow_update (boolean) – Allow this method to update an existing row in the cache.

  • allow_insert (boolean) – Allow this method to insert a new row into the cache

Raises:

ValueError – If allow_update is False and an already existing row (lookup key) is passed in.

cache_set(lk_tuple: tuple, version_collection: OOBTree[datetime, Row], allow_update: bool = True)

Adds the given set of rows to the cache for this lookup.

Parameters:
  • lk_tuple – The key tuple to store the rows under

  • version_collection – The set of rows to cache

  • allow_update (boolean) – Allow this method to update an existing row in the cache.

Raises:

ValueError – If allow_update is False and an already existing row (lookup key) is passed in.

check_estimate_row_size(force_now=False)
clear_cache() → None

Removes cache and resets to un-cached state

commit()

Placeholder for other implementations that might need it

estimated_row_size()
find(row: ROW_TYPES, fallback_to_db: bool = True, maintain_cache: bool = True, stats: Statistics = None, **kwargs) → Row
find_in_cache(row: Row | Sequence, **kwargs) → Row

Find an existing row in the cache effective on the date provided. Can raise ValueError if the cache is not set up. Can raise NoResultFound if the key is not in the cache. Can raise BeforeAllExisting if the effective date provided is before all existing records.

find_in_remote_table(row: Row | Sequence, **kwargs) → Row

Find a matching row in the lookup based on the lookup index (keys)

Only works if parent_component is based on bi_etl.components.readonlytable

find_matches_in_cache(row: Row | Sequence, **kwargs) → Sequence[Row]

Find the existing rows in the cache effective on the date provided. Can raise ValueError if the cache is not set up. Can raise NoResultFound if the key is not in the cache. Can raise BeforeAllExisting if the effective date provided is before all existing records.
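To illustrate how a non-unique lookup differs from a unique one, here is a minimal, self-contained sketch. It is not bi_etl's implementation: it assumes a plain dict keyed by the lookup-key tuple whose values are lists of rows, and it ignores effective dates entirely. ToyNonUniqueCache, NoResultFoundSketch and the dict-based rows are hypothetical stand-ins. Returning the first match from find_in_cache is also an assumption.

```python
from typing import Dict, List, Sequence, Tuple


class NoResultFoundSketch(Exception):
    """Hypothetical stand-in for the NoResultFound error mentioned above."""


class ToyNonUniqueCache:
    """Toy cache where one lookup key may map to several rows (not bi_etl code)."""

    def __init__(self, lookup_keys: Sequence[str]):
        self.lookup_keys = list(lookup_keys)
        self._cache: Dict[Tuple, List[dict]] = {}

    def _key(self, row: dict) -> Tuple:
        # Build a hashable key from the lookup columns only.
        return tuple(row[name] for name in self.lookup_keys)

    def cache_row(self, row: dict) -> None:
        # Append rather than overwrite: the key is allowed to repeat.
        self._cache.setdefault(self._key(row), []).append(row)

    def find_matches_in_cache(self, row: dict) -> List[dict]:
        key = self._key(row)
        if key not in self._cache:
            raise NoResultFoundSketch(key)
        return list(self._cache[key])

    def find_in_cache(self, row: dict) -> dict:
        # Assumption: a single-row find returns the first cached match.
        return self.find_matches_in_cache(row)[0]


cache = ToyNonUniqueCache(["customer_id"])
cache.cache_row({"customer_id": 1, "order_id": 10})
cache.cache_row({"customer_id": 1, "order_id": 11})
matches = cache.find_matches_in_cache({"customer_id": 1})
first = cache.find_in_cache({"customer_id": 1})
```

In this sketch, matches holds both cached rows for customer_id 1, while first is only the row that was cached first.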

find_versions_list(row: ROW_TYPES, fallback_to_db: bool = True, maintain_cache: bool = True, stats: Statistics = None) → list
Parameters:
  • row – row or tuple to find

  • fallback_to_db – Use db to search if not found in cached copy

  • maintain_cache – Add DB lookup rows to the cached copy?

  • stats – Statistics to maintain

Return type:

A list of rows

find_versions_list_in_remote_table(row: Row | Sequence) → Sequence[Row]

Find a matching row in the lookup based on the lookup index (keys)

Only works if parent_component is based on bi_etl.components.readonlytable

find_where(key_names: Sequence, key_values_dict: Mapping, limit: int = None)

Scan all cached rows (expensive) to find list of rows that match criteria.
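The cost of such a scan can be seen in a minimal sketch. It assumes cached rows are plain dicts held in an iterable. find_where_sketch is a hypothetical helper, not the library's method, and the real cache structure may differ. Every row is visited until the optional limit is reached, so the work grows with the cache size rather than with the number of matches.

```python
from typing import Iterable, List, Mapping, Optional, Sequence


def find_where_sketch(
    cached_rows: Iterable[Mapping],
    key_names: Sequence[str],
    key_values_dict: Mapping,
    limit: Optional[int] = None,
) -> List[Mapping]:
    """Linear scan: keep rows whose key_names columns equal the given values."""
    results: List[Mapping] = []
    for row in cached_rows:
        if all(row.get(name) == key_values_dict[name] for name in key_names):
            results.append(row)
            # Stop early once enough matches have been collected.
            if limit is not None and len(results) >= limit:
                break
    return results


rows = [
    {"region": "EU", "status": "open", "id": 1},
    {"region": "US", "status": "open", "id": 2},
    {"region": "EU", "status": "open", "id": 3},
]
eu_open = find_where_sketch(rows, ["region", "status"], {"region": "EU", "status": "open"})
eu_first = find_where_sketch(rows, ["region"], {"region": "EU"}, limit=1)
```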

get_disk_size() → int
get_hashable_combined_key(row: ROW_TYPES) → Sequence
get_list_of_lookup_column_values(row: Row | Sequence) → list
get_memory_size() → int
get_versions_collection(row: ROW_TYPES) → MutableMapping[datetime, Row]

This method exists for compatibility with range caches

Parameters:

row – The row with the keys to search for

Return type:

A MutableMapping of rows

has_done_get_estimate_row_size()
has_row(row: ROW_TYPES) → bool

Does the row exist in the cache (for any date if it’s a date range cache)?

Parameters:

row

init_cache() → None

Initializes the cache as empty.

property lookup_keys_set
report_on_value_cache_effectiveness(lookup_name: str = None)
row_iteration_header_has_lookup_keys(row_iteration_header: RowIterationHeader) → bool
static rstrip_key_value(val: object) → object

Since most, if not all, DBs consider two strings that only differ in trailing blanks to be equal, we need to rstrip any string values so that the lookup does the same.
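A minimal sketch of the behaviour described above, assuming that only str values are stripped and that every other type passes through unchanged. rstrip_key_value_sketch is a hypothetical stand-in, not the library's code.

```python
def rstrip_key_value_sketch(val: object) -> object:
    """Strip trailing whitespace from strings; return other values as-is."""
    if isinstance(val, str):
        return val.rstrip()
    return val


# Two key tuples that differ only in trailing blanks compare equal after stripping.
key_a = tuple(rstrip_key_value_sketch(v) for v in ("ACME  ", 42))
key_b = tuple(rstrip_key_value_sketch(v) for v in ("ACME", 42))
same_key = key_a == key_b
```

Leading whitespace is left untouched in this sketch, since the description above refers only to trailing blanks.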

Parameters:

val

Returns:

uncache_row(row: Row | Sequence)
uncache_set(row: Row | Sequence)
uncache_where(key_names: Sequence, key_values_dict: Mapping)

Scan all cached rows (expensive) to find rows to remove.