bi_etl.components.row.row_iteration_header module

Created on May 26, 2015

@author: Derek Wood

class bi_etl.components.row.row_iteration_header.RowIterationHeader(logical_name: str | None = None, primary_key: Iterable | None = None, parent: ETLComponent | None = None, columns_in_order: Iterable | None = None, owner_pid: int = None)[source]

Bases: object

Stores the headers of a set of rows for a given iteration.

__init__(logical_name: str | None = None, primary_key: Iterable | None = None, parent: ETLComponent | None = None, columns_in_order: Iterable | None = None, owner_pid: int = None)[source]
static add_remote_iteration_header(remote_header)[source]
add_row(row)[source]
property column_count: int
property column_set: frozenset

An immutable set (frozenset) of the columns of this row. Used to store different row configurations in a dictionary or set.

WARNING: The resulting set is not ordered. Do not use if the column order affects the operation. See positioned_column_set instead.

property columns_in_order: Sequence

A list of the columns of this row in the order they were defined.

get_action_header(action: tuple, start_empty: bool = False) RowIterationHeader[source]

Get the header after performing a manipulation on the set of columns.

Parameters:
  • action – A hashable action ID

  • start_empty – Whether the new header should start empty rather than carrying over the existing columns.

static get_by_id(iteration_id)[source]
static get_by_process_and_id(value_sent)[source]
get_column_name(input_name: str | Column) str[source]
get_column_position(column_name: str, allow_create: bool = False) int[source]

Get the ordinal position of a column from its name (str).

Parameters:
  • column_name – String column name

  • allow_create – Whether this method is allowed to create a new column. Note: if columns_frozen is True, this method raises a KeyError even if allow_create is True.
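The interaction between allow_create and columns_frozen can be sketched as follows. This is a standalone illustration, not the bi_etl implementation: the HeaderSketch class, its internal dictionary, and the zero-based positions are assumptions made for the example.

```python
class HeaderSketch:
    """Hypothetical stand-in for a row header; not the bi_etl class."""

    def __init__(self, columns, columns_frozen=False):
        self.columns_in_order = list(columns)
        self.columns_frozen = columns_frozen
        # Map each column name to its ordinal position (zero-based here by assumption).
        self._positions = {name: pos for pos, name in enumerate(self.columns_in_order)}

    def get_column_position(self, column_name, allow_create=False):
        try:
            return self._positions[column_name]
        except KeyError:
            # A frozen header never grows, even when allow_create is True.
            if not allow_create or self.columns_frozen:
                raise
            position = len(self.columns_in_order)
            self.columns_in_order.append(column_name)
            self._positions[column_name] = position
            return position


header = HeaderSketch(["id", "name"])
existing_position = header.get_column_position("name")
created_position = header.get_column_position("amount", allow_create=True)

frozen = HeaderSketch(["id"], columns_frozen=True)
try:
    frozen.get_column_position("amount", allow_create=True)
    frozen_raised = False
except KeyError:
    frozen_raised = True
```

In this sketch an unknown column is appended at the end when creation is allowed, and the frozen header raises KeyError regardless of allow_create.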

get_cross_process_iteration_header()[source]
get_next_header(action: tuple, start_empty: bool = False) RowIterationHeader[source]

Get the next header after performing a manipulation on the set of columns.

Parameters:
  • action – A hashable action ID

  • start_empty – Whether the new header should start empty rather than carrying over the existing columns.

has_column(column_name) bool[source]
instance_dict = {}
lock = <unlocked _thread.lock object>
next_iteration_id = 0
property positioned_column_set: Set[tuple]

An immutable set of (column, position) tuples for this row. Used to store different row configurations in a dictionary or set.

Note: column_set is not always sufficient here because a set is unordered even though the columns themselves are ordered.
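The difference between the unordered and the position-aware view can be illustrated with plain Python sets. The sketch below does not call bi_etl; the column names and the zero-based positions are assumptions chosen for the example.

```python
layout_a = ["id", "name", "amount"]
layout_b = ["name", "id", "amount"]  # same columns, different order

# Unordered view: both layouts collapse to the same set.
column_set_a = frozenset(layout_a)
column_set_b = frozenset(layout_b)

# Position-aware view: (column, position) pairs keep the layouts apart.
positioned_a = frozenset((name, pos) for pos, name in enumerate(layout_a))
positioned_b = frozenset((name, pos) for pos, name in enumerate(layout_b))

same_columns = column_set_a == column_set_b
same_layout = positioned_a == positioned_b

# Both kinds of set are hashable, so either can key a dictionary.
seen_layouts = {positioned_a: "layout A", positioned_b: "layout B"}
```

Here same_columns is True while same_layout is False, which is why the positioned form is the safer key when column order affects the operation.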

property primary_key: List | None
remove_row(row)[source]
rename_column(old_name: str, new_name: str, ignore_missing: bool = False, no_new_header: bool = False) RowIterationHeader[source]

Rename a column.

Parameters:
  • old_name – str The name of the column to find and rename.

  • new_name – str The new name to give the column.

  • ignore_missing – boolean If True, do not raise an error when there is no column with the name in old_name. Defaults to False.

  • no_new_header

    Skip creating a new row header, modify in place.

    ** BE CAREFUL USING THIS! **

    All new rows created with this header will immediately get the new name, in which case you won’t want to call this method again.
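The warning about no_new_header can be made concrete with a small standalone sketch. It assumes, for illustration only, that several rows hold a reference to the same header object; HeaderSketch and its columns list are hypothetical and are not the bi_etl classes.

```python
class HeaderSketch:
    """Hypothetical header holding column names; not the bi_etl class."""

    def __init__(self, columns):
        self.columns = list(columns)

    def rename_column(self, old_name, new_name, no_new_header=False):
        # Either mutate this header or work on a copy, depending on no_new_header.
        target = self if no_new_header else HeaderSketch(self.columns)
        target.columns[target.columns.index(old_name)] = new_name
        return target


shared = HeaderSketch(["id", "nm"])
row_1_header = shared
row_2_header = shared

# Default: a new header comes back; rows on the old header are untouched.
renamed = shared.rename_column("nm", "name")
untouched_columns = list(row_2_header.columns)

# In place: every row sharing the header sees the new name at once.
row_1_header.rename_column("nm", "name", no_new_header=True)
shared_columns = list(row_2_header.columns)

# A second in-place call fails in this sketch because "nm" no longer exists.
try:
    row_2_header.rename_column("nm", "name", no_new_header=True)
    second_call_failed = False
except ValueError:
    second_call_failed = True
```

In the sketch the second in-place call fails with ValueError from list.index; the real method's error type is not stated in this reference.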

rename_columns(rename_map: dict | List[tuple], ignore_missing: bool = False, no_new_header: bool = False) RowIterationHeader[source]

Rename many columns at once.

Parameters:
  • rename_map – A dict or list of tuples to use to rename columns. Note a list of tuples is better to use if the renames need to happen in a certain order.

  • ignore_missing – If True, do not raise an error when a column named as an old name in rename_map does not exist. Defaults to False.

  • no_new_header

    Skip creating a new row header, modify in place.

    ** BE CAREFUL USING THIS! **

    All new rows created with this header will immediately get the new names, in which case you won’t want to call this method again.
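Why a list of tuples helps when rename order matters can be shown with a plain-Python sketch. It assumes the renames are applied one at a time in the order given, which this reference does not confirm; apply_renames is a hypothetical helper, not a bi_etl function.

```python
def apply_renames(columns, rename_map):
    """Apply renames one at a time, in the order given (hypothetical helper)."""
    pairs = rename_map.items() if isinstance(rename_map, dict) else rename_map
    result = list(columns)
    for old_name, new_name in pairs:
        result[result.index(old_name)] = new_name
    return result


columns = ["first", "second"]

# Swapping two names needs a temporary name, so the order of steps matters.
swap = [("first", "tmp"), ("second", "first"), ("tmp", "second")]
swapped = apply_renames(columns, swap)

# The same steps in a different order fail: "tmp" does not exist yet.
try:
    apply_renames(columns, [("tmp", "second"), ("first", "tmp"), ("second", "first")])
    bad_order_failed = False
except ValueError:
    bad_order_failed = True
```

A list of tuples states the sequence explicitly, which makes a chained rename like this swap easier to read and review.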

row_remove_column(column_name: str, row: bi_etl.components.row.row.Row, ignore_missing: bool = False) RowIterationHeader[source]
row_set_item(column_name: str, value, row: bi_etl.components.row.row.Row) RowIterationHeader[source]

Set a column in a row and return a new row header (it might have changed if the column was new).

Returns:

Modified row header

row_subset(row: bi_etl.components.row.row.Row, exclude: Iterable | None = None, rename_map: dict | List[tuple] | None = None, keep_only: Iterable | None = None) bi_etl.components.row.row.Row[source]

Return a new row instance with a subset of the columns. The original row is not modified. Excludes are done first, then renames, and finally keep_only.

Parameters:
  • row – The row to subset

  • exclude – A list of column names (before renames) to exclude from the subset. Optional. Defaults to no excludes.

  • rename_map – A dict to use to rename columns. Optional. Defaults to no renames.

  • keep_only – A list of column names (after renames) of columns to keep. Optional. Defaults to keep all.

Returns:

A list with the position mapping of new to old items. That is, the first item in the list will be the index of that item in the old list, the second item in the list will be the index of that item in the old list, and so on.
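The order of operations described for row_subset (excludes, then renames, then keep_only) can be sketched on a plain dictionary. This is an illustration only: subset_sketch is a hypothetical helper that works on dicts rather than Row objects and does not reproduce the bi_etl return value.

```python
def subset_sketch(row, exclude=None, rename_map=None, keep_only=None):
    """Return a new dict; the input is left unmodified (hypothetical helper)."""
    excluded = set(exclude or [])
    renames = dict(rename_map or {})
    # 1. Excludes use the original (pre-rename) column names.
    result = {name: value for name, value in row.items() if name not in excluded}
    # 2. Renames run next.
    result = {renames.get(name, name): value for name, value in result.items()}
    # 3. keep_only uses the new (post-rename) column names.
    if keep_only is not None:
        kept = set(keep_only)
        result = {name: value for name, value in result.items() if name in kept}
    return result


source_row = {"id": 1, "nm": "Ada", "tmp": None, "amount": 10}
subset = subset_sketch(
    source_row,
    exclude=["tmp"],
    rename_map={"nm": "name"},
    keep_only=["id", "name"],
)
```

Note that exclude refers to "tmp" by its original name, while keep_only refers to "name", the name that exists only after the rename step.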