monkey_wrench.chimp._models module

class monkey_wrench.chimp._models.ChimpRetrieval(*, temp_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], model_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)], sub_strings: str | list[str] | None = None, case_sensitive: bool = True, match_all: bool = True, parent_input_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], visitor_writer: ~monkey_wrench.input_output._models.Writer | None = None, visitor_callback: ~typing.Annotated[~typing.Callable[[...], ~typing.Any], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~typing.Any] | None = None, reverse: bool = False, recursive: bool = True, post_visit_transform_function: ~typing.Annotated[~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType] | None = None, end_datetime: ~typing.Annotated[~pydantic.types.AwareDatetime, ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.date_time.models._base.<lambda>)] | None = None, start_datetime: ~typing.Annotated[~pydantic.types.AwareDatetime, ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.date_time.models._base.<lambda>)] | None = None, 
parent_output_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], datetime_format_string: str = '%Y/%m/%d', reset_child_datetime_directory: bool = False, device: ~typing.Literal['cpu', 'cuda'] = 'cpu', sequence_length: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 16, temporal_overlap: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 0, tile_size: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 256, verbose: bool = True)[source]

Bases: DateTimeDirectory, DateTimePeriod, DirectoryVisitor, ModelFile, TempDirectory

Pydantic model for CHIMP retrievals.

device: Literal['cpu', 'cuda']

sequence_length: Annotated[int, Ge(ge=0)]

temporal_overlap: Annotated[int, Ge(ge=0)]

tile_size: Annotated[int, Gt(gt=0)]

verbose: bool

run_in_batches() → None[source]: Perform CHIMP retrievals in batches.

run_for_single_batch() → None[source]: Perform a single CHIMP retrieval for a single batch.

__input_filepaths_as_strings(batch: list[Annotated[Path, PathType(path_type=file)]]) → list[str]: Convert paths to strings and ensure that each batch contains exactly as many items as the sequence length.

__run_for_single_batch(batch: list[Annotated[Path, PathType(path_type=file)]], retrieve_function: Callable) → None: Helper function to perform a single CHIMP retrieval for a single batch.

datetime_format_string: str: The format string to create subdirectories from the datetime object. Defaults to "%Y/%m/%d".

reset_child_datetime_directory: bool

Whether to remove the (child) directory first if it already exists. Defaults to False.

Removing the directory first can prevent files from being overwritten or corrupted.

parent_output_directory_path: ExistingDirectoryPath

start_datetime: AwarePastDateTime | None

end_datetime: AwarePastDateTime | None

visitor_writer: Writer | None: If given, it will be used to write the list of visited files to a text file.

visitor_callback: TransformFunction[Any] | None: A function that will be called every time a match is found for a file. Defaults to None.

reverse: bool

A boolean to determine whether to sort the files in reverse order.

Defaults to False, which means the files are sorted in alphabetical order.

recursive: bool

Determines whether to recursively visit the directory tree or just visit the top-level directory.

Defaults to True.
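The effect of reverse and recursive can be approximated with pathlib alone. This is a minimal sketch under the assumption that files are collected and then sorted by path; the function name `visit_files` is hypothetical and this is not the DirectoryVisitor implementation.

```python
import tempfile
from pathlib import Path


def visit_files(directory: Path, recursive: bool = True, reverse: bool = False) -> list[Path]:
    """Hypothetical stand-in for a directory visitor."""
    # "**/*" walks the whole tree, "*" stays in the top-level directory.
    pattern = "**/*" if recursive else "*"
    files = [path for path in directory.glob(pattern) if path.is_file()]
    return sorted(files, reverse=reverse)


with tempfile.TemporaryDirectory() as root:
    root_path = Path(root)
    (root_path / "sub").mkdir()
    (root_path / "a.txt").touch()
    (root_path / "sub" / "b.txt").touch()

    top_level_only = [path.name for path in visit_files(root_path, recursive=False)]
    reversed_tree = [path.name for path in visit_files(root_path, reverse=True)]

print(top_level_only)  # ['a.txt']
print(reversed_tree)   # ['b.txt', 'a.txt']
```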

post_visit_transform_function: TransformFunction[ReturnType] | None

The transform function that will be applied on filepaths after visiting them.

Defaults to None, which means no transformation is applied.

Note

If it is provided, the result of the transformation will be returned instead of the filepaths.

parent_input_directory_path: ExistingDirectoryPath

sub_strings: str | list[str] | None

The sub-strings to look for. It can be either a single string, a list of strings, or None.

Defaults to None, in which case exists_in() always returns True.

case_sensitive: bool: A boolean indicating whether to perform a case-sensitive match. Defaults to True.

match_all: bool

A boolean indicating whether to match all or any of the sub-strings. Defaults to True.

When it is set to False, a single match suffices. With a single sub-string, this parameter has no effect.
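The interplay of sub_strings, case_sensitive and match_all can be sketched as follows. The function `matches` is a hypothetical illustration of the documented behaviour, not the library's exists_in(), and the filename is made up.

```python
def matches(text, sub_strings=None, case_sensitive=True, match_all=True) -> bool:
    """Hypothetical sketch of the documented matching rules."""
    if sub_strings is None:
        # No sub-strings given: everything counts as a match.
        return True
    if isinstance(sub_strings, str):
        sub_strings = [sub_strings]
    if not case_sensitive:
        text = text.lower()
        sub_strings = [sub_string.lower() for sub_string in sub_strings]
    # match_all=True requires every sub-string, match_all=False only one of them.
    combine = all if match_all else any
    return combine(sub_string in text for sub_string in sub_strings)


filename = "seviri_20220105.nc"
print(matches(filename))                                                  # True
print(matches(filename, ["seviri", ".nc"]))                               # True
print(matches(filename, ["seviri", ".h5"]))                               # False
print(matches(filename, ["SEVIRI", ".h5"], case_sensitive=False, match_all=False))  # True
```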

model_filepath: ExistingFilePath

temp_directory_path: ExistingDirectoryPath: The path to an existing directory, which will be used as the top-level temporary directory.

Note

This directory will be used as a parent directory for subsequent (child) temporary directories. As a result, it will not be removed or cleaned up. However, the child temporary directories will always be removed and cleaned up.

Note

If it is not set (i.e. it is None), it takes on a value according to the following order of priority:

1- The value of the TMPDIR environment variable.

2- /tmp/.
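The two notes above can be sketched with the standard library. The helper `resolve_temp_directory` is hypothetical and only mirrors the stated order of priority; the second part assumes child directories behave like `tempfile.TemporaryDirectory` objects created inside the top-level directory, which may differ from what TempDirectory actually does.

```python
import os
import tempfile
from pathlib import Path


def resolve_temp_directory(temp_directory_path=None, environ=None) -> Path:
    """Hypothetical helper mirroring the documented order of priority."""
    environ = os.environ if environ is None else environ
    if temp_directory_path is not None:
        return Path(temp_directory_path)
    # 1- TMPDIR if it is set, 2- /tmp/ otherwise.
    return Path(environ.get("TMPDIR", "/tmp/"))


# Child temporary directories are cleaned up; the top-level one is left alone.
with tempfile.TemporaryDirectory() as top_level:  # stands in for an existing directory
    with tempfile.TemporaryDirectory(dir=top_level) as child:
        child_path = Path(child)
    child_removed = not child_path.exists()
    parent_kept = Path(top_level).exists()
```

Passing the environment as an argument keeps the helper easy to test without modifying the real TMPDIR variable.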