bi_etl.lookups.autodisk_range_lookup module
Created on Jan 5, 2016
@author: Derek Wood
- class bi_etl.lookups.autodisk_range_lookup.AutoDiskRangeLookup(lookup_name: str, lookup_keys: list, parent_component: ETLComponent, begin_date, end_date, config: BI_ETL_Config_Base = None, use_value_cache: bool = True, path=None)
Bases: AutoDiskLookup, RangeLookup
Automatic memory / disk lookup cache.
This version divides the cache into N chunks (default is 10). If RAM usage gets beyond limits, it starts moving chunks to disk. Once a chunk is on disk, it stays there.
TODO: For use cases where the lookup will be used in a mostly sequential fashion, it would be useful to have a version that uses ranges instead of a hash function. Then when find_in_cache is called on a disk segment, we could swap a different segment out and bring that segment in. That’s a lot more complicated. We’d also want to maintain a last used date for each segment so that if we add rows to the cache, we can choose the best segment to swap to disk.
Also worth considering: if we bring a segment in from disk, it would be best to keep the disk version. From that point on, any additions to that segment would need to go to both places.
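The chunking behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the bi_etl implementation: the class `ChunkedCache`, its methods, and the `max_rows_in_memory` limit are hypothetical names, and a simple row count stands in for the real RAM-usage check. It only shows the idea of hashing keys into N chunks and moving a chunk to disk permanently once a limit is exceeded.

```python
import pickle
import tempfile
from pathlib import Path


class ChunkedCache:
    """Hypothetical stand-in for illustration; not the bi_etl class."""

    def __init__(self, n_chunks=10, max_rows_in_memory=1000, path=None):
        self.n_chunks = n_chunks
        # Assumption: a row-count limit replaces a real memory-usage check.
        self.max_rows_in_memory = max_rows_in_memory
        self.dir = Path(path or tempfile.mkdtemp())
        self.memory = {i: {} for i in range(n_chunks)}
        self.on_disk = set()

    def _chunk_no(self, key):
        # A hash function assigns each lookup key to one chunk.
        return hash(key) % self.n_chunks

    def _disk_file(self, chunk_no):
        return self.dir / f"chunk_{chunk_no}.pkl"

    def _load(self, chunk_no):
        with open(self._disk_file(chunk_no), "rb") as f:
            return pickle.load(f)

    def _save(self, chunk_no, chunk):
        with open(self._disk_file(chunk_no), "wb") as f:
            pickle.dump(chunk, f)

    def _spill_largest(self):
        # Move the largest in-memory chunk to disk; it never comes back.
        chunk_no = max(self.memory, key=lambda i: len(self.memory[i]))
        self._save(chunk_no, self.memory.pop(chunk_no))
        self.on_disk.add(chunk_no)

    def put(self, key, row):
        chunk_no = self._chunk_no(key)
        if chunk_no in self.on_disk:
            chunk = self._load(chunk_no)
            chunk[key] = row
            self._save(chunk_no, chunk)
            return
        self.memory[chunk_no][key] = row
        if sum(len(c) for c in self.memory.values()) > self.max_rows_in_memory:
            self._spill_largest()

    def get(self, key):
        chunk_no = self._chunk_no(key)
        if chunk_no in self.on_disk:
            return self._load(chunk_no).get(key)
        return self.memory[chunk_no].get(key)
```

Because a spilled chunk is re-read on every access in this sketch, lookups against disk chunks are much slower than in-memory ones, which is the cost the TODO above is trying to reduce for sequential access patterns.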
- COLLECTION_INDEX = datetime.datetime(1900, 1, 1, 0, 0)
- DB_LOOKUP_WARNING = 1000
- VERSION_COLLECTION_TYPE
alias of OOBTree
- __init__(lookup_name: str, lookup_keys: list, parent_component: ETLComponent, begin_date, end_date, config: BI_ETL_Config_Base = None, use_value_cache: bool = True, path=None)
The optional parameter path controls where the data is persisted.
- cache_row(row: Row, allow_update: bool = True, allow_insert: bool = True)
Adds the given row to the cache for this lookup.
- Parameters:
- Raises:
ValueError – If allow_update is False and an already existing row (lookup key) is passed in.
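The allow_update rule can be sketched with a plain dictionary keyed on the lookup key tuple. This is an illustrative stand-in only: the function `cache_row_sketch` and its `cache` and `lookup_keys` arguments are hypothetical, rows are modelled as dicts, and the allow_insert flag is left out because its behaviour is not described here.

```python
def cache_row_sketch(cache, lookup_keys, row, allow_update=True):
    """Hypothetical helper: store row in cache under its lookup key tuple."""
    key = tuple(row[k] for k in lookup_keys)
    if key in cache and not allow_update:
        # Mirrors the documented rule: existing key plus allow_update=False.
        raise ValueError(f"Key {key!r} is already cached and allow_update is False")
    cache[key] = row


cache = {}
cache_row_sketch(cache, ["id"], {"id": 1, "name": "first"})
cache_row_sketch(cache, ["id"], {"id": 1, "name": "second"})  # update allowed

try:
    cache_row_sketch(cache, ["id"], {"id": 1, "name": "third"}, allow_update=False)
    rejected = False
except ValueError:
    rejected = True
```

After this runs, the cache still holds the "second" row, since the rejected call does not modify it.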
- cache_set(lk_tuple: tuple, version_collection: OOBTree[datetime, Row], allow_update: bool = True)
Adds the given set of rows to the cache for this lookup.
- Parameters:
- Raises:
ValueError – If allow_update is False and an already existing row (lookup key) is passed in.
- check_estimate_row_size(force_now=False)
- clear_cache()
Removes cache and resets to un-cached state
- commit()
Placeholder for other implementations that might need it
- estimated_row_size()
- find(row: ROW_TYPES, fallback_to_db: bool = True, maintain_cache: bool = True, stats: Statistics = None, **kwargs) Row
- find_in_cache(row, **kwargs)
Find a matching row in the lookup based on the lookup index (keys)
- find_in_remote_table(row: Row | Sequence, **kwargs) Row
Find a matching row in the lookup based on the lookup index (keys)
Only works if parent_component is based on bi_etl.components.readonlytable
- find_versions_list(row: ROW_TYPES, fallback_to_db: bool = True, maintain_cache: bool = True, stats: Statistics = None) list
- find_versions_list_in_remote_table(row: Row | Sequence) list
Find a matching row in the lookup based on the lookup index (keys)
Only works if parent_component is based on bi_etl.components.readonlytable
- find_where(key_names: Sequence, key_values_dict: Mapping, limit: int = None)
Scan all cached rows (expensive) to find list of rows that match criteria.
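To show why this kind of search is expensive, here is a minimal sketch of a full scan over cached rows. It is an assumption-laden illustration, not the library code: `find_where_sketch` and its `rows` argument are hypothetical, and rows are modelled as plain dicts. Every row is inspected until the optional limit is reached, so the cost grows with the cache size rather than being a single keyed lookup.

```python
from itertools import islice


def find_where_sketch(rows, key_names, key_values_dict, limit=None):
    """Hypothetical helper: linear scan for rows matching all given keys."""
    matches = (
        row
        for row in rows
        if all(row[name] == key_values_dict[name] for name in key_names)
    )
    # islice with a stop of None yields every match.
    return list(islice(matches, limit))


rows = [
    {"id": 1, "region": "EU", "status": "A"},
    {"id": 2, "region": "US", "status": "A"},
    {"id": 3, "region": "EU", "status": "A"},
]
eu_rows = find_where_sketch(rows, ["region", "status"], {"region": "EU", "status": "A"})
first_eu = find_where_sketch(rows, ["region"], {"region": "EU"}, limit=1)
```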
- flush_to_disk()
- get_disk_size()
- get_hashable_combined_key(row: ROW_TYPES) Sequence
- get_memory_size()
- get_versions_collection(row: Row | Sequence) MutableMapping[datetime, Row]
This method exists for compatibility with range caches
- Parameters:
row – The row containing the keys to search for
- Return type:
A MutableMapping of rows
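A versions collection maps a datetime to the row version associated with it. The sketch below shows one way such a mapping could be searched for the version in effect on a given date. It rests on several assumptions that are not confirmed by this page: the mapping is keyed by each version's begin date, each row carries an end date under the hypothetical column name `end_date`, the end date is inclusive, and a plain dict with `bisect` stands in for the OOBTree.

```python
from bisect import bisect_right
from datetime import datetime


def find_version_sketch(versions, effective_date, end_date_col="end_date"):
    """Hypothetical helper: pick the version whose range covers effective_date."""
    begin_dates = sorted(versions)  # an OOBTree would already keep keys sorted
    pos = bisect_right(begin_dates, effective_date)
    if pos == 0:
        return None  # effective_date is before the first version begins
    row = versions[begin_dates[pos - 1]]
    if effective_date > row[end_date_col]:
        return None  # falls in a gap after this version ended
    return row


versions = {
    datetime(2020, 1, 1): {"name": "a", "end_date": datetime(2020, 12, 31)},
    datetime(2021, 1, 1): {"name": "b", "end_date": datetime(9999, 1, 1)},
}
mid_2020 = find_version_sketch(versions, datetime(2020, 6, 1))
start_2021 = find_version_sketch(versions, datetime(2021, 1, 1))
```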
- has_done_get_estimate_row_size()
- has_row(row: ROW_TYPES) bool
Does the row exist in the cache (for any date, if it’s a date range cache)?
- Parameters:
row –
- init_cache()
Initializes the cache as empty.
- init_disk_cache()
- property lookup_keys_set
- row_iteration_header_has_lookup_keys(row_iteration_header: RowIterationHeader) bool