monkey_wrench.chimp._models module

class monkey_wrench.chimp._models.ChimpRetrieval(*, temp_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], model_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)], sub_strings: str | list[str] | None = None, case_sensitive: bool = True, match_all: bool = True, parent_input_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], visitor_writer: ~monkey_wrench.input_output._models.Writer | None = None, visitor_callback: ~typing.Annotated[~typing.Callable[[...], ~typing.Any], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~typing.Any] | None = None, reverse: bool = False, recursive: bool = True, post_visit_transform_function: ~typing.Annotated[~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType] | None = None, end_datetime: ~typing.Annotated[~pydantic.types.AwareDatetime, ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.date_time.models._base.<lambda>)] | None = None, start_datetime: ~typing.Annotated[~pydantic.types.AwareDatetime, ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.date_time.models._base.<lambda>)] | None = None, 
parent_output_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], datetime_format_string: str = '%Y/%m/%d', reset_child_datetime_directory: bool = False, device: ~typing.Literal['cpu', 'cuda'] = 'cpu', sequence_length: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 16, temporal_overlap: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 0, tile_size: ~typing.Annotated[int, ~annotated_types.Gt(gt=0)] = 256, verbose: bool = True)

Bases: DateTimeDirectory, DateTimePeriod, DirectoryVisitor, ModelFile, TempDirectory

Pydantic model for CHIMP retrievals.

device: Literal['cpu', 'cuda']
sequence_length: Annotated[int, Ge(ge=0)]
temporal_overlap: Annotated[int, Ge(ge=0)]
tile_size: Annotated[int, Gt(gt=0)]
verbose: bool
run_in_batches() → None

Perform CHIMP retrievals in batches.
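How the input files are grouped into batches is not documented here. The following is a minimal sketch, assuming that the sorted input files are split into windows of sequence_length files and that consecutive windows share temporal_overlap files. make_batches is a hypothetical helper, not part of monkey_wrench.

```python
from pathlib import Path


def make_batches(
    filepaths: list[Path], sequence_length: int, temporal_overlap: int
) -> list[list[Path]]:
    """Hypothetical helper: split sorted filepaths into overlapping windows.

    Assumption: each window holds `sequence_length` files and shares
    `temporal_overlap` files with the previous window.
    """
    step = sequence_length - temporal_overlap
    if step <= 0:
        raise ValueError("temporal_overlap must be smaller than sequence_length")
    batches = []
    for start in range(0, len(filepaths), step):
        batches.append(filepaths[start:start + sequence_length])
        # Stop once the current window already reaches the last file.
        if start + sequence_length >= len(filepaths):
            break
    return batches
```

Under this assumption the last window can be shorter than sequence_length, which is the situation a length-normalising helper such as __input_filepaths_as_strings() has to handle.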

run_for_single_batch() → None

Perform the CHIMP retrieval for a single batch.

__input_filepaths_as_strings(batch: list[Annotated[Path, PathType(path_type=file)]]) → list[str]

Convert the paths to strings and ensure that each batch contains the same number of items as sequence_length.
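The description does not say how a batch with the wrong number of items is handled. The sketch below assumes that a short batch is padded by repeating its last file and that an over-long batch is rejected; input_filepaths_as_strings is a hypothetical public stand-in for the private method.

```python
from pathlib import Path


def input_filepaths_as_strings(batch: list[Path], sequence_length: int) -> list[str]:
    """Hypothetical stand-in for the private helper: stringify and pad a batch."""
    if not batch:
        raise ValueError("batch must not be empty")
    filepaths = [str(path) for path in batch]
    # Assumption: a short batch is padded by repeating its last file.
    # (A negative repeat count yields an empty list, so nothing is added.)
    filepaths += [filepaths[-1]] * (sequence_length - len(filepaths))
    # Assumption: an over-long batch is an error rather than being truncated.
    if len(filepaths) != sequence_length:
        raise ValueError(f"expected {sequence_length} files, got {len(batch)}")
    return filepaths
```

For example, a batch of two files with sequence_length=4 becomes ["a.nc", "b.nc", "b.nc", "b.nc"] under this assumption.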

__run_for_single_batch(batch: list[Annotated[Path, PathType(path_type=file)]], retrieve_function: Callable) → None

Helper method that performs the CHIMP retrieval for a single batch using the given retrieve_function.

datetime_format_string: str

The format string to create subdirectories from the datetime object. Defaults to "%Y/%m/%d".
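The default format string relies on standard strftime codes, so a datetime maps to a nested relative path. The sketch below only illustrates that mapping; the parent directory /data/output is an arbitrary example, and the library's own path-building code is not shown.

```python
from datetime import datetime, timezone
from pathlib import Path

datetime_format_string = "%Y/%m/%d"
moment = datetime(2022, 1, 15, 12, 30, tzinfo=timezone.utc)

# strftime renders the date as "2022/01/15", which pathlib treats as nested directories.
child = Path("/data/output") / moment.strftime(datetime_format_string)
print(child.as_posix())  # /data/output/2022/01/15
```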

reset_child_datetime_directory: bool

Whether to remove the (child) directory first if it already exists. Defaults to False.

Setting it to True can prevent issues with files being overwritten or corrupted.

parent_output_directory_path: ExistingDirectoryPath
start_datetime: AwarePastDateTime | None
end_datetime: AwarePastDateTime | None
visitor_writer: Writer | None

If given, it will be used to write the list of visited files to a text file.

visitor_callback: TransformFunction[Any] | None

A function that is called every time a matching file is found. Defaults to None.

reverse: bool

Whether to sort the files in reverse order.

Defaults to False, which means the files are sorted in alphabetical order.

recursive: bool

Determines whether to recursively visit the whole directory tree or only the top-level directory.

Defaults to True.

post_visit_transform_function: TransformFunction[ReturnType] | None

The transform function to apply to the filepaths after visiting them.

Defaults to None, which means no transformation is applied.

Note

If provided, the result of the transformation is returned instead of the filepaths.

parent_input_directory_path: ExistingDirectoryPath
sub_strings: str | list[str] | None

The sub-strings to look for. It can be either a single string, a list of strings, or None.

Defaults to None, which means exists_in() always returns True.

case_sensitive: bool

A boolean indicating whether to perform a case-sensitive match. Defaults to True.

match_all: bool

A boolean indicating whether to match all or any of the sub-strings. Defaults to True.

When it is set to False, a single match suffices. For a single sub-string, this parameter has no effect.

model_filepath: ExistingFilePath
temp_directory_path: ExistingDirectoryPath

The path to an existing directory, which will be used as the top-level temporary directory.

Note

This directory serves as the parent of subsequent (child) temporary directories. As a result, it is never removed or cleaned up, whereas the child temporary directories are always removed and cleaned up.

Note

If it is not set (i.e., it is None), its value is determined by the following order of priority:

1- The value of the TMPDIR environment variable.

2- /tmp/.