bi_etl.boto3_helper.s3 module
Created on March 28, 2022
@author:
- pydantic model bi_etl.boto3_helper.s3.Boto3_S3
Bases:
S3_Bucket
- Config:
validate_default: bool = True
validate_assignment: bool = True
validate_credentials: bool = True
- Fields:
- Validators:
check_model » all fields
- field keepass: KeepassConfig | None = None
If the password_source is KEEPASS, then load a sub-section with the
config_wrangler.config_templates.keepass_config.KeepassConfig settings.
- Validated by:
check_model
- field keepass_config: str | None = 'keepass'
If the password_source is KEEPASS, the name of the root-level config item that contains the settings for Keepass (must be an instance of config_wrangler.config_templates.keepass_config.KeepassConfig).
- Validated by:
check_model
- field keepass_group: str | None = None
If the password_source is KEEPASS, the group in the Keepass database that should be searched for a matching entry.
If this is None, then the KeepassConfig.default_group value will be checked. If that is also None, then a ValueError will be raised.
- Validated by:
check_model
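The keepass_group fallback described above can be sketched as follows. This is only an illustration of the documented order of checks; resolve_keepass_group is a hypothetical helper, not part of bi_etl or config_wrangler.

```python
from typing import Optional


def resolve_keepass_group(keepass_group: Optional[str],
                          default_group: Optional[str]) -> str:
    """Pick the Keepass group to search (hypothetical helper, not library API)."""
    # Prefer the group set directly on this config item.
    if keepass_group is not None:
        return keepass_group
    # Otherwise fall back to the KeepassConfig-level default_group.
    if default_group is not None:
        return default_group
    # Neither is set: the documentation says a ValueError is raised.
    raise ValueError("No keepass_group set and no KeepassConfig.default_group")


print(resolve_keepass_group(None, "aws"))
```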
- field keepass_title: str | None = None
If the password_source is KEEPASS, this is an optional filter on the title of the keepass entries in the group.
- Validated by:
check_model
- field keyring_section: str | None = None
If the password_source is KEYRING, the section (AKA system) in which this module should look for the password.
See https://pypi.org/project/keyring/ or https://github.com/jaraco/keyring
- Validated by:
check_model
- field password_source: PasswordSourceValidated | None = None
The source to use when getting a password for the user. See PasswordSource for valid values.
- Validated by:
check_model
- field raw_password: str | None = None
This is only used for the extremely non-secure CONFIG_FILE password source. The password is stored directly in the config file, next to the user_id, under the setting name raw_password.
- Validated by:
check_model
- field validate_password_on_load: bool = True
Should config_wrangler query the password source for this password at load (startup) time? If so, it will raise an error if the password is None or an empty string. It does not actually connect or authenticate the user_id & password combination.
- Validated by:
check_model
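A minimal sketch of the kind of load-time check that validate_password_on_load describes. The function name, the callable argument, and the choice of ValueError are assumptions made for illustration; the source only says that "an error" is raised.

```python
from typing import Callable, Optional


def check_password_at_load(fetch_password: Callable[[], Optional[str]]) -> None:
    """Raise if the password source yields None or an empty string.

    Hypothetical helper; the real check lives inside config_wrangler.
    """
    password = fetch_password()
    if password is None or password == "":
        # Assumed exception type; the documentation does not name it.
        raise ValueError("Password source returned no password")
    # No connection is attempted: a wrong but non-empty password passes.


check_password_at_load(lambda: "not-a-real-password")
```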
- class CompareResult(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
- DIFFERENT_SIZE = 1
- LOCAL_NEWER = 2
- LOCAL_OLDER = 4
- SAME_TIMES = 3
- class OverwriteModes(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
- ALWAYS_OVERWRITE = 1
- NEVER_OVERWRITE = 3
- OVERWRITE_OLDER = 2
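The two enums above carry no descriptions, so the following is a sketch of one plausible reading: compare a local file with an S3 object by size and modification time, then let the overwrite mode decide whether to download. The enums are stand-ins that mirror the documented members; compare and should_download are hypothetical helpers, and the meaning assigned to OVERWRITE_OLDER is an assumption.

```python
from datetime import datetime, timezone
from enum import Enum


class CompareResult(Enum):  # stand-in mirroring the documented members
    DIFFERENT_SIZE = 1
    LOCAL_NEWER = 2
    SAME_TIMES = 3
    LOCAL_OLDER = 4


class OverwriteModes(Enum):  # stand-in mirroring the documented members
    ALWAYS_OVERWRITE = 1
    OVERWRITE_OLDER = 2
    NEVER_OVERWRITE = 3


def compare(local_size, local_mtime, remote_size, remote_mtime) -> CompareResult:
    """Classify a local file against a remote object (assumed rules)."""
    if local_size != remote_size:
        return CompareResult.DIFFERENT_SIZE
    if local_mtime > remote_mtime:
        return CompareResult.LOCAL_NEWER
    if local_mtime < remote_mtime:
        return CompareResult.LOCAL_OLDER
    return CompareResult.SAME_TIMES


def should_download(mode, local_exists, result=None) -> bool:
    """Decide whether a download should replace the local file."""
    if not local_exists or mode is OverwriteModes.ALWAYS_OVERWRITE:
        return True
    if mode is OverwriteModes.NEVER_OVERWRITE:
        return False
    # OVERWRITE_OLDER (assumed): replace only a stale or mismatched local copy.
    return result in (CompareResult.LOCAL_OLDER, CompareResult.DIFFERENT_SIZE)


old = datetime(2022, 3, 1, tzinfo=timezone.utc)
new = datetime(2022, 3, 28, tzinfo=timezone.utc)
result = compare(10, old, 10, new)
print(result, should_download(OverwriteModes.OVERWRITE_OLDER, True, result))
```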
- __init__(**data: Any) None
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
Uses something other than self as the first arg to allow “self” as a settable attribute.
- add_child(name: str, child_object: ConfigHierarchy)
Set this configuration as a child in the hierarchy of another config. For any programmatically created config objects this is required so that the new object ‘knows’ where it lives in the hierarchy – most importantly so that it can find the hierarchy's root object.
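Why a programmatically created object needs to be attached can be shown with a toy hierarchy. Node is a stand-in written for this sketch, not the config_wrangler ConfigHierarchy class, and the direction of the call (parent.add_child(name, child)) is an assumption.

```python
from typing import Dict, Optional


class Node:
    """Toy stand-in for a config object in a hierarchy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: Dict[str, "Node"] = {}

    def add_child(self, name: str, child: "Node") -> None:
        # Record the link in both directions so the child can walk upward.
        child.name = name
        child.parent = self
        self.children[name] = child

    def root(self) -> "Node":
        # Follow parent links until reaching the node that has no parent.
        node = self
        while node.parent is not None:
            node = node.parent
        return node


root = Node("root")
bucket = Node("unattached")
# Before add_child, the new object only knows about itself.
assert bucket.root() is bucket
root.add_child("s3_bucket", bucket)
print(bucket.root().name)
```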
- check_config_hierarchy(**kwargs)
- validator check_model » all fields
- copy(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, update: Dict[str, Any] | None = None, deep: bool = False) Model
Returns a copy of the model.
Warning
This method is now deprecated; use model_copy instead.
If you need include or exclude, use:
```py
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```
- Parameters:
include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.
- Returns:
A copy of the model with included, excluded and updated fields as specified.
- delete(key: str | PurePosixPath | None = None, version_id: str = None)
- delete_by_key(key: str | PurePosixPath, version_id: str = None)
- dict(*, include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) Dict[str, Any]
- download_file(*, local_filename: str | Path, key: str | PurePosixPath | None = None, extra_args: dict | None = None, transfer_config: TransferConfig | None = None, create_parents: bool = True, overwrite_mode: OverwriteModes = OverwriteModes.OVERWRITE_OLDER, _bucket_object_summary: ObjectSummary = None)
- download_file_from_s3(file_key: str, local_file_path: str, date_placeholder: str = None, date_pattern: str = '%Y-%m-%d') str
- static download_file_list_from_s3(file_list: List[File_Entry])
- download_files(*, local_path: str | Path, key: str | PurePosixPath | None = None, extra_args: dict | None = None, transfer_config: TransferConfig | None = None, create_parents: bool = True, overwrite_mode: OverwriteModes = OverwriteModes.OVERWRITE_OLDER) Iterable[Path]
- download_files_from_s3(key_prefixes: List[str], local_folder: str, changed_only=True, return_full_list=True) List[File_Entry]
- exists(key: str | PurePosixPath | None = None) bool
- static filter_list_changed_vs_local(file_list: List[File_Entry]) List[File_Entry]
- find_objects(key: str | PurePosixPath = None) BucketObjectsCollection
- classmethod from_orm(obj: Any) Model
- full_item_name(item_name: str = None, delimiter: str = ' -> ')
The fully qualified name of this config item in the config hierarchy.
- get(section, item, fallback=Ellipsis)
Used as a drop-in replacement for ConfigParser.get() with dynamic config field names (using a string variable for the section and item names instead of Python attribute access).
Warning
With this method Python code checkers (linters) will not warn about invalid config items. You can end up with runtime AttributeError errors.
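A minimal sketch of what a string-based lookup of this kind can look like, and why the warning applies. dynamic_get is a hypothetical function built on getattr, not the config_wrangler implementation; the config object and its names are made up for the example.

```python
from types import SimpleNamespace


def dynamic_get(config, section: str, item: str, fallback=...):
    """Look up config.<section>.<item> by name (illustrative only)."""
    try:
        return getattr(getattr(config, section), item)
    except AttributeError:
        # Ellipsis means "no fallback supplied", so None stays a valid fallback.
        if fallback is ...:
            raise
        return fallback


# Hypothetical config object with one section and one item.
config = SimpleNamespace(s3=SimpleNamespace(bucket_name="example-bucket"))

print(dynamic_get(config, "s3", "bucket_name"))
# A misspelled item name is only detected at runtime, not by a linter.
print(dynamic_get(config, "s3", "bucket_nmae", fallback=None))
```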
- get_boto3_bucket() Bucket
- get_bucket_region() str
Get the region_name from the actual S3 bucket definition.
Note
This can differ from the region_name attribute specified in the init call to this class or the config file that loads it. The region_name attribute is used for establishing the AWS session. get_bucket_region() is used to find out in which region the data is stored.
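For orientation, the S3 GetBucketLocation API reports a bucket's region as a LocationConstraint value, which is null for us-east-1 and may be the legacy value EU for eu-west-1. The sketch below normalizes that value; region_from_location_constraint is a hypothetical helper, and whether get_bucket_region() works this way internally is not stated in the source.

```python
from typing import Optional


def region_from_location_constraint(constraint: Optional[str]) -> str:
    """Normalize the LocationConstraint reported by S3 GetBucketLocation."""
    if constraint is None:
        # Buckets in us-east-1 report a null LocationConstraint.
        return "us-east-1"
    if constraint == "EU":
        # Legacy value that denotes eu-west-1.
        return "eu-west-1"
    return constraint


# With boto3 installed and credentials configured, the input could come from:
#   response = boto3.client("s3").get_bucket_location(Bucket="example-bucket")
#   region = region_from_location_constraint(response.get("LocationConstraint"))
print(region_from_location_constraint(None))
```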
- get_copy(copied_by: str = 'get_copy') AWS_Session
Copy this configuration. Useful when you need to programmatically modify a configuration without modifying the original base configuration.
- get_list(section, item, fallback=Ellipsis) list
Used as a drop-in replacement for ConfigParser.get() plus list parsing with dynamic config field names (using a string variable for the section and item names instead of Python attribute access); the value is then parsed as a list.
Warning
With this method Python code checkers (linters) will not warn about invalid config items. You can end up with runtime AttributeError errors.
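The "ConfigParser.get() + list parsing" pattern that this method replaces might look like the sketch below, written against the standard library configparser. The comma/newline delimiter and the prefixes setting are assumptions; the source does not say how the list is delimited.

```python
from configparser import ConfigParser
from typing import List


def get_list(parser: ConfigParser, section: str, item: str, fallback=...) -> List[str]:
    """Read a setting and split it into a list (assumed comma/newline delimited)."""
    if fallback is ...:
        # No fallback supplied: let ConfigParser raise for missing names.
        raw = parser.get(section, item)
    else:
        raw = parser.get(section, item, fallback=None)
        if raw is None:
            return fallback
    parts = raw.replace("\n", ",").split(",")
    return [part.strip() for part in parts if part.strip()]


parser = ConfigParser()
parser.read_string("[s3]\nprefixes = raw/, staged/ , curated/\n")
print(get_list(parser, "s3", "prefixes"))
```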
- get_object(key: str | PurePosixPath | None = None) Object
- get_object_uncached(key: str | PurePosixPath | None = None) Object
- get_password() str
Get the password for this resource. password_source controls where it looks for the password. If that is None, then the root-level passwords container is checked for its password_source value.
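The lookup order described above can be sketched as a two-step resolution followed by a dispatch on the chosen source. Everything here is hypothetical: lookup_password, the readers mapping, and the plain strings used in place of PasswordSource members.

```python
from typing import Callable, Dict, Optional


def lookup_password(item_source: Optional[str],
                    root_source: Optional[str],
                    readers: Dict[str, Callable[[], str]]) -> str:
    """Resolve the password source, then read the password from it."""
    # The item's own password_source wins; the root-level value is the fallback.
    source = item_source if item_source is not None else root_source
    if source is None:
        raise ValueError("No password_source set on the item or at root level")
    return readers[source]()


# Stand-in readers keyed by source name; real sources would query
# Keepass, keyring, or the config file.
readers = {
    "CONFIG_FILE": lambda: "from-config-file",
    "KEYRING": lambda: "from-keyring",
}

print(lookup_password(None, "KEYRING", readers))
```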
- get_secrets_manager()
- get_ssm()
- getboolean(section, item, fallback=Ellipsis) bool
Used as a drop-in replacement for ConfigParser.getboolean() with dynamic config field names (using a string variable for the section and item names instead of Python attribute access).
Warning
With this method Python code checkers (linters) will not warn about invalid config items. You can end up with runtime AttributeError errors.
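For reference, this is the standard library call pattern that getboolean() is meant to stand in for, with the item name held in a string variable. The example uses configparser only; the use_cache setting is made up for illustration.

```python
from configparser import ConfigParser

parser = ConfigParser()
parser.read_string("""
[s3]
validate_password_on_load = yes
use_cache = off
""")

# The item name is an ordinary string, so it can be computed at runtime.
flag_name = "validate_password_on_load"
enabled = parser.getboolean("s3", flag_name)

# ConfigParser accepts 1/yes/true/on and 0/no/false/off (case-insensitive).
cached = parser.getboolean("s3", "use_cache")

# A fallback avoids an exception when the item is absent.
missing = parser.getboolean("s3", "no_such_item", fallback=False)

print(enabled, cached, missing)
```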
- is_file()
- is_relative_to(other: S3_Bucket)
- iterdir() Iterable[S3_Bucket_Key]
Return the S3_Bucket_Key objects contained in/under this object.
- joinpath(*others) S3_Bucket_Key | S3_Bucket_Folder
- json(*, include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = PydanticUndefined, models_as_dict: bool = PydanticUndefined, **dumps_kwargs: Any) str
- key_exists(key: str | PurePosixPath) bool
- list_object_paths(key: str | PurePosixPath | None = None) List[PurePosixPath]
Return the relative paths of objects contained in/under this object or, if provided, under this object plus the provided key parameter.
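What "relative paths" means for S3 keys can be illustrated with pathlib alone. relative_object_paths is a hypothetical helper that works on a plain list of key strings; it does not call S3 and is not the library's implementation.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def relative_object_paths(keys: Iterable[str], base: str) -> List[PurePosixPath]:
    """Return each key's path relative to base, skipping keys outside it."""
    base_path = PurePosixPath(base)
    result = []
    for key in keys:
        try:
            result.append(PurePosixPath(key).relative_to(base_path))
        except ValueError:
            # relative_to raises ValueError when the key is not under base.
            continue
    return result


keys = ["data/2022/a.csv", "data/2022/sub/b.csv", "other/c.csv"]
for path in relative_object_paths(keys, "data/2022"):
    print(path)
```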
- list_objects(key: str | PurePosixPath | None = None) BucketObjectsCollection
- load_into_table_from_s3(database: DatabaseMetadata, table_name: str = None, filename: str = None, delimiter: str = None, region: str = None)
- classmethod model_construct(_fields_set: set[str] | None = None, **values: Any) Model
Creates a new instance of the Model class with validated data.
Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it adds all passed values.
- model_copy(*, update: dict[str, Any] | None = None, deep: bool = False) Model
Usage docs: https://docs.pydantic.dev/2.6/concepts/serialization/#model_copy
Returns a copy of the model.
- model_dump(*, mode: Literal['json', 'python'] | str = 'python', include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, round_trip: bool = False, warnings: bool = True) dict[str, Any]
Usage docs: https://docs.pydantic.dev/2.6/concepts/serialization/#modelmodel_dump
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- Parameters:
mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A list of fields to include in the output.
exclude – A list of fields to exclude from the output.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – Whether to log warnings when invalid fields are encountered.
- Returns:
A dictionary representation of the model.
- model_dump_json(*, indent: int | None = None, include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, round_trip: bool = False, warnings: bool = True) str
Usage docs: https://docs.pydantic.dev/2.6/concepts/serialization/#modelmodel_dump_json
Generates a JSON representation of the model using Pydantic’s to_json method.
- Parameters:
indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – Whether to log warnings when invalid fields are encountered.
- Returns:
A JSON string representation of the model.
- model_dump_non_private(*, mode: Literal['json', 'python'] | str = 'python', exclude: Set[str] = None) dict[str, Any]
- classmethod model_json_schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}', schema_generator: type[~pydantic.json_schema.GenerateJsonSchema] = <class 'pydantic.json_schema.GenerateJsonSchema'>, mode: ~typing.Literal['validation', 'serialization'] = 'validation') dict[str, Any]
Generates a JSON schema for a model class.
- Returns:
The JSON schema for the given model class.
- classmethod model_parametrized_name(params: tuple[type[Any], ...]) str
Compute the class name for parametrizations of generic classes.
This method can be overridden to achieve a custom naming scheme for generic BaseModels.
- Parameters:
params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
- Returns:
String representing the new class where params are passed to cls as type variables.
- Raises:
TypeError – Raised when trying to generate concrete names for non-generic models.
- model_post_init(_ModelMetaclass__context: Any) None
We need to both initialize private attributes and call the user-defined model_post_init method.
- classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: dict[str, Any] | None = None) bool | None
Try to rebuild the pydantic-core schema for the model.
This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.
- Returns:
Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.
- classmethod model_validate(obj: Any, *, strict: bool | None = None, from_attributes: bool | None = None, context: dict[str, Any] | None = None) Model
Validate a pydantic model instance.
- Raises:
ValidationError – If the object could not be validated.
- Returns:
The validated model instance.
- classmethod model_validate_json(json_data: str | bytes | bytearray, *, strict: bool | None = None, context: dict[str, Any] | None = None) Model
Usage docs: https://docs.pydantic.dev/2.6/concepts/json/#json-parsing
Validate the given JSON data against the Pydantic model.
- Returns:
The validated Pydantic model.
- Raises:
ValueError – If json_data is not a JSON string.
- classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: dict[str, Any] | None = None) Model
Validate the given object, which contains string data, against the Pydantic model.
- classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) Model
- classmethod parse_obj(obj: Any) Model
- classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) Model
- scan_files_from_s3(key_prefixes: List[str], local_folder: str | None = None) List[File_Entry]
- classmethod schema_json(*, by_alias: bool = True, ref_template: str = '#/$defs/{model}', **dumps_kwargs: Any) str
- set_session(session: Session)
- sts_assume_role(role_arn: str, role_session_name: str, policy_arns: Sequence[PolicyDescriptorTypeTypeDef] = Ellipsis, policy: str = Ellipsis, duration_seconds: int = Ellipsis, tags: Sequence[TagTypeDef] = Ellipsis, transitive_tag_keys: Sequence[str] = Ellipsis, external_id: str = Ellipsis, serial_number: str = Ellipsis, token_code: str = Ellipsis, source_identity: str = Ellipsis, provided_contexts: Sequence[TagTypeDef] = Ellipsis) Session
- static translate_config_data(config_data: MutableMapping)
Children classes can provide translation logic to allow older config files to be used with newer config class definitions.
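A child class's translation logic could be as small as renaming keys that an older config file still uses. The sketch below is a free function over a plain dict; the RENAMED mapping and its key names are hypothetical, and whether the real hook mutates in place or returns the mapping is an assumption.

```python
from typing import MutableMapping

# Hypothetical old -> new setting names for an older config file layout.
RENAMED = {"bucket": "bucket_name", "region": "region_name"}


def translate_config_data(config_data: MutableMapping) -> MutableMapping:
    """Rename legacy keys in place so newer field names are populated."""
    for old_name, new_name in RENAMED.items():
        # Never clobber a value already given under the new name.
        if old_name in config_data and new_name not in config_data:
            config_data[new_name] = config_data.pop(old_name)
    return config_data


legacy = {"bucket": "example-bucket", "region": "us-west-2", "user_id": "etl"}
print(translate_config_data(legacy))
```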
- unlink(missing_ok=False)
- unload_data(database: DatabaseMetadata, file_format: FileFormat = None, query: str = None, out_folder: str = None, filename: str = None, delimiter: str = None)
- upload_file(*, local_filename: str | Path, key: str | PurePosixPath | None = None, extra_args: dict | None = None, transfer_config: TransferConfig | None = None, overwrite_mode: OverwriteModes = OverwriteModes.ALWAYS_OVERWRITE)
- classmethod validate(value: Any) Model
- property client: S3Client
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- property model_extra: dict[str, Any] | None
Get extra fields set during validation.
- Returns:
A dictionary of extra fields, or None if config.extra is not set to “allow”.
- property model_fields_set: set[str]
Returns the set of fields that have been explicitly set on this model instance.
- Returns:
A set of strings representing the fields that have been set, i.e. that were not filled from defaults.
- property name
- property parent
- property parents
- property parts
- property resource: S3ServiceResource
- property session: Session
- property stem
- property suffix
- property suffixes
- class bi_etl.boto3_helper.s3.FileFormat(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
- csv = 'C'
- parquet = 'P'
- class bi_etl.boto3_helper.s3.File_Entry(content_path: str, full_local_path: str, source_path: str = None, s3_last_modified: datetime = None, s3_file_size: int = None, bucket_object=None, local_last_modified: datetime = None, local_file_size: int = None)
Bases:
object