bi_etl.components.row.row_iteration_header module
Created on May 26, 2015
@author: Derek Wood
- class bi_etl.components.row.row_iteration_header.RowIterationHeader(logical_name: str | None = None, primary_key: Iterable | None = None, parent: ETLComponent | None = None, columns_in_order: Iterable | None = None, owner_pid: int = None)
Bases: object
Stores the headers of a set of rows for a given iteration.
- __init__(logical_name: str | None = None, primary_key: Iterable | None = None, parent: ETLComponent | None = None, columns_in_order: Iterable | None = None, owner_pid: int = None)
- property column_set: frozenset
An immutable set (frozenset) of the columns of this row. Useful as a dictionary key or set member when storing different row configurations.
WARNING: The set is not ordered. Do not use it if column order affects the operation; use positioned_column_set instead.
- property columns_in_order: Sequence
A list of the columns of this row in the order they were defined.
- get_action_header(action: tuple, start_empty: bool = False) → RowIterationHeader
Get the header after performing a manipulation on the set of columns.
- get_column_position(column_name: str, allow_create: bool = False) → int
Get the ordinal position of a column, given its name (str).
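The lookup can be pictured with a minimal stand-in. This sketch assumes positions are zero-based indexes into the ordered column list, that a missing column raises KeyError unless allow_create is true, and that a created column is appended at the end; ColumnPositions is a hypothetical helper, not part of bi_etl.

```python
from typing import Dict, List


class ColumnPositions:
    """Hypothetical stand-in for the position lookup; not the bi_etl class."""

    def __init__(self, columns: List[str]) -> None:
        self.columns: List[str] = list(columns)
        # Map each column name to its zero-based ordinal position.
        self._positions: Dict[str, int] = {name: i for i, name in enumerate(self.columns)}

    def get_column_position(self, column_name: str, allow_create: bool = False) -> int:
        try:
            return self._positions[column_name]
        except KeyError:
            if not allow_create:
                raise
            # Assumed behaviour: a new column is appended at the end.
            self.columns.append(column_name)
            self._positions[column_name] = len(self.columns) - 1
            return self._positions[column_name]


positions = ColumnPositions(["id", "name"])
print(positions.get_column_position("name"))                      # 1
print(positions.get_column_position("email", allow_create=True))  # 2
```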
- get_next_header(action: tuple, start_empty: bool = False) → RowIterationHeader
Get the next header after performing a manipulation on the set of columns.
- instance_dict = {}
- lock = <unlocked _thread.lock object>
- next_iteration_id = 0
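One way a header class with a class-level lock and iteration counter could support get_next_header is sketched below: each header caches the header produced by a given action tuple, so many rows undergoing the same manipulation share one header object. This is an assumption about the design, not bi_etl's code; HeaderSketch and the ("+", column) / ("-", column) action encoding are hypothetical.

```python
import threading
from typing import Dict, Tuple


class HeaderSketch:
    """Hypothetical header registry; illustrates the idea, not bi_etl's code."""

    lock = threading.Lock()
    next_iteration_id = 0

    def __init__(self, columns: Tuple[str, ...]) -> None:
        self.columns = columns
        # Allocate a unique id under the class-level lock (thread safe).
        with HeaderSketch.lock:
            self.iteration_id = HeaderSketch.next_iteration_id
            HeaderSketch.next_iteration_id += 1
        # Cache of action tuple -> resulting header, so identical
        # manipulations on many rows share a single header object.
        self._next: Dict[tuple, "HeaderSketch"] = {}

    def get_next_header(self, action: tuple) -> "HeaderSketch":
        if action not in self._next:
            verb, column = action  # hypothetical ("+" | "-", column) encoding
            if verb == "+":
                new_columns = self.columns + (column,)
            elif verb == "-":
                new_columns = tuple(c for c in self.columns if c != column)
            else:
                raise ValueError(f"unknown action {verb!r}")
            self._next[action] = HeaderSketch(new_columns)
        return self._next[action]


base = HeaderSketch(("id", "name"))
h1 = base.get_next_header(("+", "email"))
h2 = base.get_next_header(("+", "email"))
print(h1 is h2, h1.columns)  # True ('id', 'name', 'email')
```

Because the action is a tuple, it is hashable and can serve directly as the cache key.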
- property positioned_column_set: Set[tuple]
An immutable set of (column, position) tuples for this row. Useful as a dictionary key or set member when storing different row configurations.
Note: column_set is not always suitable here, because it discards column order, whereas the (column, position) tuples preserve it.
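The difference between the two properties can be shown with plain frozensets. This sketch only mimics the shapes described above (a set of names versus a set of (column, position) tuples) and assumes zero-based positions; it does not call bi_etl.

```python
columns_a = ["id", "name", "email"]
columns_b = ["email", "id", "name"]

# column_set-style comparison: order is lost, so these look identical.
column_set_a = frozenset(columns_a)
column_set_b = frozenset(columns_b)

# positioned_column_set-style comparison: (column, position) pairs keep order.
positioned_a = frozenset((name, pos) for pos, name in enumerate(columns_a))
positioned_b = frozenset((name, pos) for pos, name in enumerate(columns_b))

print(column_set_a == column_set_b)  # True
print(positioned_a == positioned_b)  # False
```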
- rename_column(old_name: str, new_name: str, ignore_missing: bool = False, no_new_header: bool = False) → RowIterationHeader
Rename a single column.
- Parameters:
old_name (str) – The name of the column to find and rename.
new_name (str) – The new name to give the column.
ignore_missing (bool) – If True, do not raise an error when there is no column named old_name. Defaults to False.
no_new_header (bool) –
Skip creating a new row header and modify this header in place.
BE CAREFUL USING THIS!
All new rows created with this header will immediately get the new name, so you will not want to call this method on them again.
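The risk of no_new_header comes from a header object being shared by many rows. The sketch below is a hypothetical SharedHeader class, not bi_etl's implementation; it assumes a missing column raises KeyError and contrasts the default copy-on-write rename with an in-place rename that every sharing row sees at once.

```python
from typing import List


class SharedHeader:
    """Hypothetical header object shared by many rows (not the bi_etl class)."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def rename_column(self, old_name: str, new_name: str,
                      ignore_missing: bool = False,
                      no_new_header: bool = False) -> "SharedHeader":
        if old_name not in self.columns:
            if ignore_missing:
                return self
            raise KeyError(old_name)  # assumed error type
        position = self.columns.index(old_name)
        if no_new_header:
            # In place: every row that shares this header sees the new name.
            self.columns[position] = new_name
            return self
        # Default: build a new header and leave this one untouched.
        new_columns = list(self.columns)
        new_columns[position] = new_name
        return SharedHeader(new_columns)


header = SharedHeader(["id", "nm"])
# Two hypothetical rows, each pairing the shared header with its own values.
row1 = (header, [1, "Ada"])
row2 = (header, [2, "Grace"])

renamed = header.rename_column("nm", "name")
print(row1[0].columns, renamed.columns)  # ['id', 'nm'] ['id', 'name']

header.rename_column("nm", "name", no_new_header=True)
print(row2[0].columns)  # ['id', 'name']
```

After the in-place rename, "nm" no longer exists on the shared header, which is why a second rename of the same column would fail.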
- rename_columns(rename_map: dict | List[tuple], ignore_missing: bool = False, no_new_header: bool = False) → RowIterationHeader
Rename many columns at once.
- Parameters:
rename_map – A dict or a list of (old_name, new_name) tuples used to rename columns. Use a list of tuples when the renames must happen in a particular order.
ignore_missing (bool) – If True, do not raise an error when a column named as an old name in rename_map does not exist. Defaults to False.
no_new_header (bool) –
Skip creating a new row header and modify this header in place.
BE CAREFUL USING THIS!
All new rows created with this header will immediately get the new names, so you will not want to call this method on them again.
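Why order can matter is easiest to see when names are shifted, for example renaming "b" to "c" and "a" to "b". The sketch below is a hypothetical function operating on a plain list of names; it assumes renames are applied one at a time and that creating a duplicate name is an error, which may differ from bi_etl's actual rules.

```python
from typing import Dict, Iterable, List, Tuple, Union

RenameMap = Union[Dict[str, str], Iterable[Tuple[str, str]]]


def rename_columns(columns: List[str], rename_map: RenameMap,
                   ignore_missing: bool = False) -> List[str]:
    """Hypothetical sketch: apply renames one at a time, in order."""
    pairs = rename_map.items() if isinstance(rename_map, dict) else rename_map
    result = list(columns)
    for old_name, new_name in pairs:
        if old_name not in result:
            if ignore_missing:
                continue
            raise KeyError(old_name)
        if new_name in result:
            # Assumed rule: refuse to create duplicate column names.
            raise ValueError(f"column {new_name!r} already exists")
        result[result.index(old_name)] = new_name
    return result


# Shifting names only works if "b" is moved out of the way first.
print(rename_columns(["a", "b"], [("b", "c"), ("a", "b")]))  # ['b', 'c']
```

Since Python 3.7, dicts preserve insertion order, so in this sketch a dict literal is also applied in the order written; a list of tuples simply makes that order explicit.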
- row_remove_column(column_name: str, row: bi_etl.components.row.row.Row, ignore_missing: bool = False) → RowIterationHeader
- row_set_item(column_name: str, value, row: bi_etl.components.row.row.Row) → RowIterationHeader
Set a column value in a row and return the row header, which may be a new header if the column did not previously exist.
- Parameters:
column_name – The column to set.
value – The new value.
row (bi_etl.components.row.row.Row) – The row to set the column on.
- Returns:
The (possibly new) row header.
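The reason the caller must keep the returned header is that adding a column changes the set of columns. The sketch below models a header as a hypothetical tuple of names and a row as a list of values; it is not bi_etl's Row or RowIterationHeader, only an illustration of the contract described above.

```python
from typing import List, Tuple

Header = Tuple[str, ...]


def row_set_item(header: Header, values: List[object], column_name: str,
                 value: object) -> Header:
    """Hypothetical sketch: set a value and return the (possibly new) header."""
    if column_name in header:
        values[header.index(column_name)] = value
        return header  # existing column: header unchanged
    values.append(value)
    return header + (column_name,)  # new column: a new header is returned


header: Header = ("id", "name")
values: List[object] = [1, "Ada"]

same = row_set_item(header, values, "name", "Ada Lovelace")
grown = row_set_item(same, values, "born", 1815)
print(same is header, grown)  # True ('id', 'name', 'born')
print(values)                 # [1, 'Ada Lovelace', 1815]
```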
- row_subset(row: bi_etl.components.row.row.Row, exclude: Iterable | None = None, rename_map: dict | List[tuple] | None = None, keep_only: Iterable | None = None) → bi_etl.components.row.row.Row
Return a new row instance with a subset of the columns. The original row is not modified. Excludes are applied first, then renames, and finally keep_only.
- Parameters:
row – The row to subset.
exclude – A list of column names (before renames) to exclude from the subset. Optional. Defaults to no excludes.
rename_map – A dict or list of (old_name, new_name) tuples used to rename columns. Optional. Defaults to no renames.
keep_only – A list of column names (after renames) to keep. Optional. Defaults to keeping all columns.
- Returns:
A new Row containing the selected, renamed columns.
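The exclude, rename, keep_only ordering can be sketched with plain dicts standing in for rows. This is a hypothetical illustration, not bi_etl's implementation: it returns a dict rather than a Row and silently skips renames whose old name is absent, which is an assumption.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

RenameMap = Union[Dict[str, str], List[Tuple[str, str]]]


def row_subset(row: Dict[str, object],
               exclude: Optional[Iterable[str]] = None,
               rename_map: Optional[RenameMap] = None,
               keep_only: Optional[Iterable[str]] = None) -> Dict[str, object]:
    """Hypothetical dict-based sketch of the exclude -> rename -> keep_only order."""
    excluded = set(exclude or ())
    # 1. Excludes match the original (pre-rename) column names.
    result = {k: v for k, v in row.items() if k not in excluded}
    # 2. Renames are applied in order; building new dicts keeps column order
    #    and never mutates the input row.
    pairs = rename_map.items() if isinstance(rename_map, dict) else (rename_map or [])
    for old_name, new_name in pairs:
        result = {(new_name if k == old_name else k): v for k, v in result.items()}
    # 3. keep_only matches the new (post-rename) column names.
    if keep_only is not None:
        keep = set(keep_only)
        result = {k: v for k, v in result.items() if k in keep}
    return result


row = {"id": 1, "nm": "Ada", "tmp": "x"}
print(row_subset(row, exclude=["tmp"], rename_map={"nm": "name"}, keep_only=["name"]))
# {'name': 'Ada'}
print(row)  # {'id': 1, 'nm': 'Ada', 'tmp': 'x'}
```

Note that keep_only refers to "name", the post-rename column, while exclude refers to "tmp" as it appeared in the original row.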