bi_etl.boto3_helper.s3 module

Created on March 28, 2022

@author:

pydantic model bi_etl.boto3_helper.s3.Boto3_S3[source]

Bases: S3_Bucket

Config:

validate_default: bool = True
validate_assignment: bool = True
validate_credentials: bool = True

Fields:

bucket_name
keepass
keepass_config
keepass_group
keepass_title
key
keyring_section
password_source
raw_password
region_name
user_id
validate_password_on_load

Validators:

check_model » all fields

field bucket_name: str [Required]

Validated by:

check_model

field keepass: KeepassConfig | None = None

If the password_source is KEEPASS, then load a sub-section with the config_wrangler.config_templates.keepass_config.KeepassConfig settings.

Validated by:

check_model

field keepass_config: str | None = 'keepass'

If the password_source is KEEPASS, then which root level config item contains the settings for Keepass (must be an instance of config_wrangler.config_templates.keepass_config.KeepassConfig)

Validated by:

check_model

field keepass_group: str | None = None

If the password_source is KEEPASS, which group in the Keepass database should be searched for a matching entry.

If this is None, then the KeepassConfig.default_group value will be checked. If that is also None, then a ValueError will be raised.

Validated by:

check_model
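The fallback order described for keepass_group can be sketched as a small standalone function. This is a minimal illustration under assumptions, not the library's code: the function name resolve_keepass_group and the error message are hypothetical, and only the order (field value, then KeepassConfig.default_group, then ValueError) comes from the description above.

```python
from typing import Optional


def resolve_keepass_group(
    keepass_group: Optional[str],
    default_group: Optional[str],
) -> str:
    """Return the group to search, falling back to the default group.

    Hypothetical helper: mirrors the documented fallback order only.
    """
    if keepass_group is not None:
        return keepass_group
    if default_group is not None:
        return default_group
    # Neither level names a group, so there is nothing to search.
    raise ValueError("keepass_group is None and no default_group is set")


print(resolve_keepass_group(None, "aws"))  # falls back to the default group
print(resolve_keepass_group("s3", "aws"))  # the field's own value wins
```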

field keepass_title: str | None = None

If the password_source is KEEPASS, this is an optional filter on the title of the keepass entries in the group.

Validated by:

check_model

field key: str | None = None

Validated by:

check_model

field keyring_section: str | None = None

If the password_source is KEYRING, then which section (AKA system) should this module look for the password in.

See https://pypi.org/project/keyring/ or https://github.com/jaraco/keyring

Validated by:

check_model

field password_source: PasswordSourceValidated | None = None

The source to use when getting a password for the user. See PasswordSource for valid values.

Validated by:

check_model

field raw_password: str | None = None

This is only used for the extremely non-secure CONFIG_FILE password source. The password is stored directly in the config file next to the user_id with the setting name raw_password.

Validated by:

check_model

field region_name: str | None = None

Validated by:

check_model

field user_id: str | None = None

The user ID to use

Validated by:

check_model

field validate_password_on_load: bool = True

Should config_wrangler query the password source for this password at load (startup) time? If so, it will raise an error if the password is None or an empty string. It does not actually connect or authenticate the user_id & password combination.

Validated by:

check_model
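As a rough illustration of what validate_password_on_load implies, the check below rejects a missing or empty password at load time without attempting any connection. It is a sketch under assumptions: the function name is hypothetical, and the exception type (ValueError here) is not stated in the description above.

```python
from typing import Optional


def check_password_on_load(
    password: Optional[str],
    validate_password_on_load: bool = True,
) -> None:
    """Raise if load-time validation is on and the password is missing or empty.

    Hypothetical helper; the real check lives inside config_wrangler.
    """
    if not validate_password_on_load:
        return
    if password is None or password == "":
        # Assumed exception type; the source only says "an error".
        raise ValueError("Password source returned None or an empty string")


check_password_on_load("s3cret")                            # passes silently
check_password_on_load("", validate_password_on_load=False)  # check skipped
```

Note that a password passing this check may still be wrong; as the field description says, no authentication is attempted.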

class CompareResult(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

DIFFERENT_SIZE = 1

LOCAL_NEWER = 2

LOCAL_OLDER = 4

SAME_TIMES = 3

class OverwriteModes(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

ALWAYS_OVERWRITE = 1

NEVER_OVERWRITE = 3

OVERWRITE_OLDER = 2
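The two enums above suggest a compare-then-decide flow for transfers. The sketch below re-declares the enums with the listed values and shows one plausible way to combine them. The functions compare and should_download are hypothetical, and the rules they encode (size checked first, OVERWRITE_OLDER replacing a local file that is older or differs in size) are assumptions rather than documented behaviour.

```python
from datetime import datetime, timezone
from enum import Enum


class CompareResult(Enum):
    DIFFERENT_SIZE = 1
    LOCAL_NEWER = 2
    SAME_TIMES = 3
    LOCAL_OLDER = 4


class OverwriteModes(Enum):
    ALWAYS_OVERWRITE = 1
    OVERWRITE_OLDER = 2
    NEVER_OVERWRITE = 3


def compare(local_size: int, local_mtime: datetime,
            remote_size: int, remote_mtime: datetime) -> CompareResult:
    """Classify a local file against its remote counterpart (assumed rules)."""
    if local_size != remote_size:
        return CompareResult.DIFFERENT_SIZE
    if local_mtime > remote_mtime:
        return CompareResult.LOCAL_NEWER
    if local_mtime < remote_mtime:
        return CompareResult.LOCAL_OLDER
    return CompareResult.SAME_TIMES


def should_download(mode: OverwriteModes, local_exists: bool,
                    result: CompareResult) -> bool:
    """Decide whether to fetch the remote object (assumed policy)."""
    if not local_exists or mode is OverwriteModes.ALWAYS_OVERWRITE:
        return True
    if mode is OverwriteModes.NEVER_OVERWRITE:
        return False
    # OVERWRITE_OLDER (assumed): replace the local file only when it looks stale.
    return result in (CompareResult.DIFFERENT_SIZE, CompareResult.LOCAL_OLDER)


older = datetime(2024, 1, 1, tzinfo=timezone.utc)
newer = datetime(2024, 1, 2, tzinfo=timezone.utc)
result = compare(10, older, 10, newer)
print(result, should_download(OverwriteModes.OVERWRITE_OLDER, True, result))
```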

__init__(**data: Any) → None

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Uses something other than self as the first arg to allow “self” as a settable attribute.

add_child(name: str, child_object: ConfigHierarchy): Set this configuration as a child in the hierarchy of another config. For any programmatically created config objects this is required so that the new object ‘knows’ where it lives in the hierarchy – most importantly so that it can find the hierarchy's root object.

check_config_hierarchy(**kwargs)

validator check_model » all fields

classmethod construct(_fields_set: set[str] | None = None, **values: Any) → Model

content_type() → str

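copy(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, update: Dict[str, Any] | None = None, deep: bool = False) → Model

Returns a copy of the model.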

Warning

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```py
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

copy_to(target: str | S3_Bucket_Key)

delete(key: str | PurePosixPath | None = None, version_id: str = None)

delete_by_key(key: str | PurePosixPath, version_id: str = None)

dict(*, include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

download_file(*, local_filename: str | Path, key: str | PurePosixPath | None = None, extra_args: dict | None = None, transfer_config: TransferConfig | None = None, create_parents: bool = True, overwrite_mode: OverwriteModes = OverwriteModes.OVERWRITE_OLDER, _bucket_object_summary: ObjectSummary = None)

download_file_from_s3(file_key: str, local_file_path: str, date_placeholder: str = None, date_pattern: str = '%Y-%m-%d') → str[source]

static download_file_list_from_s3(file_list: List[File_Entry])[source]

download_files(*, local_path: str | Path, key: str | PurePosixPath | None = None, extra_args: dict | None = None, transfer_config: TransferConfig | None = None, create_parents: bool = True, overwrite_mode: OverwriteModes = OverwriteModes.OVERWRITE_OLDER) → Iterable[Path]

download_files_from_s3(key_prefixes: List[str], local_folder: str, changed_only=True, return_full_list=True) → List[File_Entry][source]

exists(key: str | PurePosixPath | None = None) → bool

static filter_list_changed_vs_local(file_list: List[File_Entry]) → List[File_Entry][source]

find_objects(key: str | PurePosixPath = None) → BucketObjectsCollection

classmethod from_orm(obj: Any) → Model

full_item_name(item_name: str = None, delimiter: str = ' -> '): The fully qualified name of this config item in the config hierarchy.

get(section, item, fallback=Ellipsis): Used as a drop-in replacement for ConfigParser.get() with dynamic config field names (using a string variable for the section and item names instead of Python attribute access).

Warning

With this method Python code checkers (linters) will not warn about invalid config items. You can end up with runtime AttributeError errors.
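The warning is easier to see with a small stand-in. The sketch below looks up a section and item by string using getattr, so a misspelled name is only caught when the line runs. The helper dynamic_get and the config object are hypothetical and do not reproduce the library's implementation; only the fallback=Ellipsis sentinel is taken from the signature above.

```python
from types import SimpleNamespace

# Stand-in config tree: attribute access mimics a loaded config object.
config = SimpleNamespace(s3=SimpleNamespace(bucket_name="example-bucket"))


def dynamic_get(root, section: str, item: str, fallback=Ellipsis):
    """Look up root.<section>.<item> by name (hypothetical helper)."""
    try:
        return getattr(getattr(root, section), item)
    except AttributeError:
        if fallback is Ellipsis:
            # No fallback supplied: the typo surfaces as a runtime error.
            raise
        return fallback


print(config.s3.bucket_name)                      # attribute access: linters can check this
print(dynamic_get(config, "s3", "bucket_name"))   # string access: linters cannot
print(dynamic_get(config, "s3", "bucket_nmae", fallback=None))  # typo hidden by fallback
```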

get_boto3_bucket() → Bucket

get_bucket_region() → str: Get the region_name from the actual S3 bucket definition.

Note

This can differ from the region_name attribute specified in the init call to this class or the config file that loads it. The region_name attribute is used for establishing the AWS session. get_bucket_region() is used to find out in which region the data is stored.

get_copy(copied_by: str = 'get_copy') → AWS_Session: Copy this configuration. Useful when you need to programmatically modify a configuration without modifying the original base configuration.

get_list(section, item, fallback=Ellipsis) → list: Used as a drop-in replacement for ConfigParser.get() + list parsing with dynamic config field names (using a string variable for the section and item names instead of Python attribute access). The value is then parsed as a list.

Warning

With this method Python code checkers (linters) will not warn about invalid config items. You can end up with runtime AttributeError errors.

get_object(key: str | PurePosixPath | None = None) → Object

get_object_uncached(key: str | PurePosixPath | None = None) → Object

get_password() → str: Get the password for this resource. password_source controls where it looks for the password. If that is None, then the root level passwords container is checked for password_source value.

get_secrets_manager()

get_service_client(service: str)

get_service_resource(service: str)

get_ssm()

getboolean(section, item, fallback=Ellipsis) → bool: Used as a drop-in replacement for ConfigParser.getboolean() with dynamic config field names (using a string variable for the section and item names instead of Python attribute access).

Warning

With this method Python code checkers (linters) will not warn about invalid config items. You can end up with runtime AttributeError errors.

is_file()

is_relative_to(other: S3_Bucket)

iterdir() → Iterable[S3_Bucket_Key]: Return the S3_Bucket_Key objects contained in/under this object.

joinpath(*others) → S3_Bucket_Key | S3_Bucket_Folder

json(*, include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = PydanticUndefined, models_as_dict: bool = PydanticUndefined, **dumps_kwargs: Any) → str

key_exists(key: str | PurePosixPath) → bool

list_object_keys(key: str | PurePosixPath | None = None) → List[str]

list_object_paths(key: str | PurePosixPath | None = None) → List[PurePosixPath]: Return the relative paths of objects contained in/under this object or, if provided, under the object + provided key parameter.

list_objects(key: str | PurePosixPath | None = None) → BucketObjectsCollection

load_into_table_from_s3(database: DatabaseMetadata, table_name: str = None, filename: str = None, delimiter: str = None, region: str = None)[source]

classmethod model_construct(_fields_set: set[str] | None = None, **values: Any) → Model

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it adds all passed values.

Parameters:

_fields_set – The set of field names accepted for the Model instance.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: dict[str, Any] | None = None, deep: bool = False) → Model

Usage docs: https://docs.pydantic.dev/2.6/concepts/serialization/#model_copy

Returns a copy of the model.

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

model_dump(*, mode: Literal['json', 'python'] | str = 'python', include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, round_trip: bool = False, warnings: bool = True) → dict[str, Any]

Usage docs: https://docs.pydantic.dev/2.6/concepts/serialization/#modelmodel_dump

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A list of fields to include in the output.
exclude – A list of fields to exclude from the output.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – Whether to log warnings when invalid fields are encountered.

Returns:

A dictionary representation of the model.

model_dump_json(*, indent: int | None = None, include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, round_trip: bool = False, warnings: bool = True) → str

Usage docs: https://docs.pydantic.dev/2.6/concepts/serialization/#modelmodel_dump_json

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – Whether to log warnings when invalid fields are encountered.

Returns:

A JSON string representation of the model.

model_dump_non_private(*, mode: Literal['json', 'python'] | str = 'python', exclude: Set[str] = None) → dict[str, Any]

classmethod model_json_schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}', schema_generator: type[~pydantic.json_schema.GenerateJsonSchema] = <class 'pydantic.json_schema.GenerateJsonSchema'>, mode: ~typing.Literal['validation', 'serialization'] = 'validation') → dict[str, Any]

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:

params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.

Returns:

String representing the new class where params are passed to cls as type variables.

Raises:

TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(_ModelMetaclass__context: Any) → None: We need to both initialize private attributes and call the user-defined model_post_init method.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: dict[str, Any] | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding _was_ required, returns True if rebuilding was successful, otherwise False.

classmethod model_validate(obj: Any, *, strict: bool | None = None, from_attributes: bool | None = None, context: dict[str, Any] | None = None) → Model

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

classmethod model_validate_json(json_data: str | bytes | bytearray, *, strict: bool | None = None, context: dict[str, Any] | None = None) → Model

Usage docs: https://docs.pydantic.dev/2.6/concepts/json/#json-parsing

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.

Returns:

The validated Pydantic model.

Raises:

ValueError – If json_data is not a JSON string.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: dict[str, Any] | None = None) → Model

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.

Returns:

The validated Pydantic model.

nav_to_bucket(bucket_name) → S3_Bucket

nav_to_s3_link(s3_uri: str) → S3_Bucket_Key

open(mode: str = 'r', encoding: str | None = None, errors: str | None = None) → IOBase

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Model

classmethod parse_obj(obj: Any) → Model

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Model

read_bytes() → bytes

read_text(encoding='utf-8', errors=None) → str

rename(target: str)

scan_files_from_s3(key_prefixes: List[str], local_folder: str | None = None) → List[File_Entry][source]

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(*, by_alias: bool = True, ref_template: str = '#/$defs/{model}', **dumps_kwargs: Any) → str

set_as_child(name: str, other_config_item: ConfigHierarchy)

set_session(session: Session)

static split_s3_uri(s3_uri: str) → Tuple[str, str]
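The signature suggests that an s3:// URI is split into two strings. A minimal sketch of such a split, using only the standard library, follows. The function name split_s3_uri_sketch, the (bucket, key) return order, and the ValueError on a non-S3 URI are assumptions; the library's own parsing may differ.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri_sketch(s3_uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical stand-in, not the library's implementation.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URI: {s3_uri!r}")
    # The URI path starts with '/', which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri_sketch("s3://example-bucket/exports/2022/report.csv")
print(bucket)  # example-bucket
print(key)     # exports/2022/report.csv
```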

sts_assume_role(role_arn: str, role_session_name: str, policy_arns: Sequence[PolicyDescriptorTypeTypeDef] = Ellipsis, policy: str = Ellipsis, duration_seconds: int = Ellipsis, tags: Sequence[TagTypeDef] = Ellipsis, transitive_tag_keys: Sequence[str] = Ellipsis, external_id: str = Ellipsis, serial_number: str = Ellipsis, token_code: str = Ellipsis, source_identity: str = Ellipsis, provided_contexts: Sequence[TagTypeDef] = Ellipsis) → Session

static translate_config_data(config_data: MutableMapping): Children classes can provide translation logic to allow older config files to be used with newer config class definitions.
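One way a child class might use this hook is to rename a setting that older config files spelled differently. The example below is purely illustrative: the old key name "bucket", the function name, and the choice to return the mapping are all hypothetical and are not taken from the library.

```python
from typing import MutableMapping


def translate_config_data_sketch(config_data: MutableMapping) -> MutableMapping:
    """Rename a legacy key so an older config file still validates.

    Hypothetical rename: 'bucket' (old) becomes 'bucket_name' (current field).
    """
    if "bucket" in config_data and "bucket_name" not in config_data:
        config_data["bucket_name"] = config_data.pop("bucket")
    return config_data


old_style = {"bucket": "example-bucket", "region_name": "us-east-1"}
print(translate_config_data_sketch(old_style))
```

A value already stored under the current name is left untouched, so translating a new-style file is a no-op.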

unlink(missing_ok=False)

unload_data(database: DatabaseMetadata, file_format: FileFormat = None, query: str = None, out_folder: str = None, filename: str = None, delimiter: str = None)[source]

classmethod update_forward_refs(**localns: Any) → None

upload_file(*, local_filename: str | Path, key: str | PurePosixPath | None = None, extra_args: dict | None = None, transfer_config: TransferConfig | None = None, overwrite_mode: OverwriteModes = OverwriteModes.ALWAYS_OVERWRITE)

upload_file_to_s3(file_key: str, local_file_path: str)[source]

classmethod validate(value: Any) → Model

with_name(name: str)

with_stem(stem: str)

with_suffix(suffix: str)

property client: S3Client

property has_session: bool

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns:

A dictionary of extra fields, or None if config.extra is not set to “allow”.

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

property name

property parent

property parents

property parts

property resource: S3ServiceResource

property session: Session

property stem

property suffix

property suffixes
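The property names above (name, parent, parents, parts, stem, suffix, suffixes) match those of pathlib's pure paths, and several methods in this class accept PurePosixPath keys. Assuming the S3 key objects follow the same semantics, which this page does not state, the standard-library behaviour for a sample key looks like this:

```python
from pathlib import PurePosixPath

# A sample object key, treated as a POSIX-style path.
key = PurePosixPath("exports/2022/report.csv.gz")

print(key.name)      # report.csv.gz
print(key.stem)      # report.csv
print(key.suffix)    # .gz
print(key.suffixes)  # ['.csv', '.gz']
print(key.parts)     # ('exports', '2022', 'report.csv.gz')
print(key.parent)    # exports/2022
print(key.with_suffix(".bz2"))  # exports/2022/report.csv.bz2
```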

class bi_etl.boto3_helper.s3.FileFormat(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

csv = 'C'

parquet = 'P'

class bi_etl.boto3_helper.s3.File_Entry(content_path: str, full_local_path: str, source_path: str = None, s3_last_modified: datetime = None, s3_file_size: int = None, bucket_object=None, local_last_modified: datetime = None, local_file_size: int = None)[source]

Bases: object

__init__(content_path: str, full_local_path: str, source_path: str = None, s3_last_modified: datetime = None, s3_file_size: int = None, bucket_object=None, local_last_modified: datetime = None, local_file_size: int = None)[source]

basename()[source]

full_path(base_path)[source]