monkey_wrench.input_output package

The package providing utilities for input and output operations.

class monkey_wrench.input_output.DateTimeDirectory(*, parent_output_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], datetime_format_string: str = '%Y/%m/%d', reset_child_datetime_directory: bool = False)[source]

Bases: ParentOutputDirectory

Pydantic model for datetime directories needed to store products and the input/output of CHIMP.

create_datetime_directory(datetime_object: datetime) Path[source]

Create a directory based on the datetime object.

Parameters:

datetime_object – The datetime object to create the directory for.

Returns:

The full path of the (created) directory.

Example

>>> path = DateTimeDirectory(
...  datetime_format_string="%Y/%m/%d",
...  parent_output_directory_path=Path.home()
... ).create_datetime_directory(
...  datetime(2022, 3, 12)
... )
>>> expected_path = Path.home() / Path("2022/03/12")
>>> expected_path.exists()
True
>>> expected_path == path
True
get_datetime_directory(datetime_object: datetime) Path[source]

Get the full path to the datetime directory (given the datetime object). This does not create the directory.

Parameters:

datetime_object – The datetime object for which the full directory path will be returned.

Returns:

The full path of the datetime directory.

Example

>>> path = DateTimeDirectory(
...  datetime_format_string="%Y/%m/%d",
...  parent_output_directory_path=Path.home()
... ).get_datetime_directory(
...  datetime(2022, 3, 12)
... )
>>> expected_path = Path.home() / Path("2022/03/12")
>>> expected_path == path
True
datetime_format_string: str

The format string to create subdirectories from the datetime object. Defaults to "%Y/%m/%d".

reset_child_datetime_directory: bool

Whether to remove the (child) directory first if it already exists. Defaults to False.

This might save us from issues regarding files being overwritten and corrupted.
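How the two fields above might interact can be sketched with the standard library alone. This is not the package's implementation: the helper names `datetime_directory` and `make_datetime_directory` are hypothetical, and treating a "reset" as removal of the whole existing directory tree is an assumption.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def datetime_directory(parent: Path, when: datetime, fmt: str = "%Y/%m/%d") -> Path:
    """Return parent/<formatted datetime> without touching the filesystem."""
    return parent / when.strftime(fmt)


def make_datetime_directory(
    parent: Path, when: datetime, fmt: str = "%Y/%m/%d", reset: bool = False
) -> Path:
    """Create the datetime directory, optionally removing an existing one first."""
    path = datetime_directory(parent, when, fmt)
    if reset and path.exists():
        # Assumption: a reset deletes the existing child directory and its contents.
        shutil.rmtree(path)
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp)
    created = make_datetime_directory(parent, datetime(2022, 3, 12))
    (created / "stale.nc").touch()
    # With reset=True the stale file does not survive.
    recreated = make_datetime_directory(parent, datetime(2022, 3, 12), reset=True)
    leftover_files = list(recreated.iterdir())
    relative = recreated.relative_to(parent).as_posix()
```

Each `/` in the format string yields one directory level, so the default format produces a year/month/day hierarchy.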

class monkey_wrench.input_output.DatasetSaveOptions(*, dataset_save_options: dict[str, bool | str | int] = {'include_lonlats': False, 'writer': 'cf'})[source]

Bases: Model

Pydantic model for the storage options using which the dataset is to be saved. This is dataset-dependent.

dataset_save_options: dict[str, bool | str | int]

A dictionary which includes the actual storage options.

The default behaviour is to use cf as the writer and exclude longitude and latitude values, i.e. dataset_save_options = dict(writer="cf", include_lonlats=False)

class monkey_wrench.input_output.DirectoryVisitor(*, sub_strings: str | list[str] | None = None, case_sensitive: bool = True, match_all: bool = True, parent_input_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], visitor_writer: ~monkey_wrench.input_output._models.Writer | None = None, visitor_callback: ~typing.Annotated[~typing.Callable[[...], ~typing.Any], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~typing.Any] | None = None, reverse: bool = False, recursive: bool = True, post_visit_transform_function: ~typing.Annotated[~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType] | None = None)[source]

Bases: ParentInputDirectory, Pattern

Pydantic model for visiting files in a directory tree.

__collect_files() list[Path]
visit() list[ReturnType] | list[Path][source]

Visit all files in the directory, either recursively or just the top-level files.

Returns:

A sorted flat list of all file paths in the given directory that match the given pattern and have been treated according to the visitor_callback function. If the post_visit_transform_function is provided, a list of transformed filepaths will be returned instead.

visitor_writer: Writer | None

If given, it will be used to write the list of visited files to a text file.

visitor_callback: Annotated[Callable[[...], Any], BeforeValidator(func=validate_function_path, json_schema_input_type=PydanticUndefined)] | Callable[[...], Any] | None

A function that will be called every time a match is found for a file. Defaults to None.

reverse: bool

A boolean to determine whether to sort the files in reverse order.

Defaults to False, which means sorting is in the alphabetical order.

recursive: bool

Determines whether to recursively visit the directory tree or just visit the top-level directory.

Defaults to True.

post_visit_transform_function: Annotated[Callable[[...], ReturnType], BeforeValidator(func=validate_function_path, json_schema_input_type=PydanticUndefined)] | Callable[[...], ReturnType] | None

The transform function that will be applied on filepaths after visiting them.

Defaults to None, which means no transformation is applied.

Note

If it is provided, the result of transformation will be returned instead of filepaths.
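The interplay of the options above can be illustrated with a rough standard-library sketch. It does not use DirectoryVisitor itself: `visit_files` and its parameters are hypothetical stand-ins, and matching sub-strings against the file name only (rather than the full path) is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def visit_files(
    top: Path,
    sub_strings: list[str],
    match_all: bool = True,
    recursive: bool = True,
    reverse: bool = False,
    callback: Optional[Callable[[Path], Any]] = None,
    transform: Optional[Callable[[Path], Any]] = None,
) -> list:
    """Collect matching files, sort them, call back, then optionally transform."""
    candidates = top.rglob("*") if recursive else top.iterdir()
    combine = all if match_all else any
    matched = sorted(
        (p for p in candidates if p.is_file() and combine(s in p.name for s in sub_strings)),
        reverse=reverse,
    )
    if callback is not None:
        for path in matched:
            callback(path)
    # If a transform is given, its results replace the file paths in the output.
    return [transform(p) for p in matched] if transform is not None else matched


with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    (top / "sub").mkdir()
    for name in ("a_seviri.nc", "b_seviri.txt", "sub/c_seviri.nc"):
        (top / name).touch()
    seen = []
    names = visit_files(top, ["seviri", ".nc"], callback=seen.append, transform=lambda p: p.name)
    top_level_only = visit_files(top, ["seviri", ".nc"], recursive=False, transform=lambda p: p.name)
```

In this sketch the callback always receives the untransformed path, while the caller receives the transformed values.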

class monkey_wrench.input_output.ExistingInputFile(*, input_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)])[source]

Bases: Model

Pydantic model for an input file which must exist.

Example

A text file which includes the list of product IDs which have already been fetched. This file will be used to fetch the product files.

input_filepath: Annotated[Path, PathType(path_type=file), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)]
class monkey_wrench.input_output.FilesIntegrityValidator(*, number_of_processes: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 1, nominal_file_size: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] | None = None, file_size_relative_tolerance: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.01, filepath_transform_function: ~typing.Annotated[~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType] | None = None, reference_transform_function: ~typing.Annotated[~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType] | None = None, reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None)[source]

Bases: MultiProcess

Pydantic model to verify files integrity by checking their size and comparing their list against a reference.

Note

This class performs two main verifications to detect corrupted and missing files:

1- Checking that the file sizes are within some threshold of a nominal file size.

2- Checking the list of filepaths against a reference list.

__get_reference_items(reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None) Any
file_is_corrupted(file_size: Annotated[int, Ge(ge=0)]) bool[source]
find_corrupted_files(filepaths: list[Path] | set[Path] | tuple[Path, ...]) set[Path] | None[source]
find_missing_files(filepaths: list[~pathlib.Path] | set[~pathlib.Path] | tuple[~pathlib.Path, ...], reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None) set[Path] | None[source]
static get_reference_items(reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None) list[InputType] | set[InputType] | tuple[InputType, ...] | None[source]

Return the reference items.

transform_files(filepaths: list[Path] | set[Path] | tuple[Path, ...]) set[Path] | None[source]
verify_files(filepaths: list[~pathlib.Path] | set[~pathlib.Path] | tuple[~pathlib.Path, ...], reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None) tuple[set[InputType] | set[ReturnType] | None, set[Path] | None][source]

Check for missing and corrupted files.

nominal_file_size: Annotated[int, Ge(ge=0)] | None

The nominal size of files in bytes. This is used to check for corrupted files.

Defaults to None, which means the search for corrupted files will not be performed.

file_size_relative_tolerance: Annotated[float, Ge(ge=0)]

The maximum relative difference in the size of a file, before it can be marked as corrupted.

Defaults to 0.01, i.e. any file whose size differs by more than one percent from the nominal size will be marked as corrupted.
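The size check described above amounts to a relative-difference test. The following is a minimal sketch, not the package's code: the standalone functions and the `sizes` mapping are hypothetical, and the exact comparison used by file_is_corrupted() may differ (for example at the boundary).

```python
def file_is_corrupted(file_size: int, nominal_size: int, relative_tolerance: float = 0.01) -> bool:
    """Flag a file whose size deviates from the nominal size by more than the tolerance."""
    # Multiplying instead of dividing avoids a division by zero for nominal_size == 0.
    return abs(file_size - nominal_size) > relative_tolerance * nominal_size


def find_corrupted(sizes: dict[str, int], nominal_size: int, relative_tolerance: float = 0.01) -> set[str]:
    """Return the names of all files that fail the size check."""
    return {
        name
        for name, size in sizes.items()
        if file_is_corrupted(size, nominal_size, relative_tolerance)
    }


# Hypothetical file sizes in bytes, checked against a nominal size of 1000 bytes.
sizes = {"a.nc": 1000, "b.nc": 1005, "c.nc": 700}
corrupted = find_corrupted(sizes, nominal_size=1000)
```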

filepath_transform_function: Annotated[Callable[[...], ReturnType], BeforeValidator(func=validate_function_path, json_schema_input_type=PydanticUndefined)] | Callable[[...], ReturnType] | None

A function to transform the file paths into other types of objects before comparing them against the reference.

This can be e.g. a parse() function to make datetime objects out of file paths. Defaults to None which means no transformation is performed on the file paths and they will be used as they are.

reference_transform_function: Annotated[Callable[[...], ReturnType], BeforeValidator(func=validate_function_path, json_schema_input_type=PydanticUndefined)] | Callable[[...], ReturnType] | None

A function to transform the reference items into other types of objects before using them for comparison.

This can be e.g. parse() to make datetime objects out of SEVIRI product IDs. Defaults to None which means no transformation is performed on the reference items and they will be used as they are.

reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None

Reference items to compare against, used in finding the missing files.

It can be a list/set/tuple of items, or a filepath from which the reference items can be read, or a directory visitor which can collect the reference files.

Defaults to None which means the search for missing files will not be performed.
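Conceptually, the search for missing files is a set difference taken after both sides have been mapped to comparable objects. The sketch below assumes exactly that; `find_missing`, the file naming scheme, and the two parsing helpers are hypothetical and only stand in for filepath_transform_function and reference_transform_function.

```python
from datetime import datetime
from pathlib import Path


def find_missing(filepaths, reference, filepath_transform=None, reference_transform=None) -> set:
    """Return the (transformed) reference items that have no matching file."""
    have = {filepath_transform(p) if filepath_transform else p for p in filepaths}
    expected = {reference_transform(r) if reference_transform else r for r in reference}
    return expected - have


def parse_filepath(path: Path) -> datetime:
    # Hypothetical naming scheme: "<prefix>_<YYYYMMDD>.nc"
    return datetime.strptime(path.stem.split("_")[1], "%Y%m%d")


def parse_reference(item: str) -> datetime:
    # Hypothetical reference items: plain "YYYYMMDD" strings.
    return datetime.strptime(item, "%Y%m%d")


filepaths = [Path("seviri_20220312.nc")]
reference = ["20220312", "20220313"]
missing = find_missing(filepaths, reference, parse_filepath, parse_reference)
```

Transforming both sides to datetime objects lets file paths and product IDs be compared even though their textual forms differ.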

class monkey_wrench.input_output.FsSpecCache(*, fsspec_cache: Literal['filecache', 'blockcache'] | None = None)[source]

Bases: Model

Pydantic model for the caching scheme of fsspec.

Note

See fsspec cache to learn more about buffering and random access in fsspec.

property fsspec_cache_str

Return the cache string with a leading :: if it is not None. Otherwise, return an empty string.

fsspec_cache: Literal['filecache', 'blockcache'] | None

How to buffer, e.g. "filecache", "blockcache", or None. Defaults to None.

Warning

None might cause too many requests to be sent to the server!
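The behaviour documented for fsspec_cache_str can be mirrored by a one-line helper. This is a sketch of the described behaviour only, written as a free function rather than a property; how the resulting string is combined with a URL or protocol is not covered here.

```python
from typing import Literal, Optional


def fsspec_cache_str(fsspec_cache: Optional[Literal["filecache", "blockcache"]] = None) -> str:
    """Return "::<cache>" if a cache is set, otherwise an empty string."""
    return f"::{fsspec_cache}" if fsspec_cache is not None else ""
```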

class monkey_wrench.input_output.ExistingInputDirectory(*, input_directory: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]

Bases: Model

Pydantic model for an input directory which must exist.

Note

This model is to be solely used for a flat structure. If you have a hierarchical tree structure, use ParentInputDirectory instead to be more clear about the directory structure.

input_directory: Annotated[Path, PathType(path_type=dir), AfterValidator(func=<lambda>)]
class monkey_wrench.input_output.InputFile(*, input_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=new), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | None = None)[source]

Bases: Model

Pydantic model for an input file which does not necessarily exist during the model validation.

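input_filepath: Annotated[Path, PathType(path_type=file), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)] | Annotated[Path, PathType(path_type=new), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)] | None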
class monkey_wrench.input_output.ModelFile(*, model_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)])[source]

Bases: Model

Pydantic model for a model file which must exist.

Example

A *.pt file used by CHIMP, as the model, to perform a retrieval.

model_filepath: Annotated[Path, PathType(path_type=file), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)]
class monkey_wrench.input_output.NewOutputFile(*, output_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=new), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)])[source]

Bases: Model

Pydantic model for an output file which must not already exist.

Example

A text file to store the result of visiting a directory, i.e. collected files that match the determined pattern.

output_filepath: Annotated[Path, PathType(path_type=new), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)]
class monkey_wrench.input_output.ExistingOutputDirectory(*, output_directory: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]

Bases: Model

Pydantic model for an output directory which must exist.

Note

This model is to be solely used for a flat structure. If you have a hierarchical tree structure, use ParentOutputDirectory instead to be more clear about the directory structure.

output_directory: Annotated[Path, PathType(path_type=dir), AfterValidator(func=<lambda>)]
class monkey_wrench.input_output.OutputFile(*, output_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=new), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | None = None)[source]

Bases: Model

Pydantic model for an output file which does not necessarily exist during the model validation.

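output_filepath: Annotated[Path, PathType(path_type=file), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)] | Annotated[Path, PathType(path_type=new), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)] | None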
class monkey_wrench.input_output.Reader(*, input_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)], post_reading_transformation: ~monkey_wrench.generic.models._pattern.StringTransformation = StringTransformation(trim=True, transform_function=None))[source]

Bases: ExistingInputFile

Pydantic model for an ASCII file (text mode) reader.

read() list[Any][source]

Read items from a text file, assuming each line corresponds to a single item.

Examples of items are product IDs.

Warning

This function does not check whether the items are valid or not. It is a simple convenience function for reading items from a text file.

Returns:

A list of (transformed) items, where each item corresponds to a single line in the given file.

post_reading_transformation: StringTransformation

The transformation after reading items from the file and before returning them.

Defaults to StringTransformation(), which means the items will be only trimmed.

Note

The items will be first trimmed and then transformed according to post_reading_transformation.transform_function.
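The trim-then-transform order described in the note can be sketched as follows. `read_items` is a hypothetical helper written with the standard library, not the Reader model itself, and how the real reader treats blank lines is not specified here.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def read_items(
    filepath: Path,
    transform: Optional[Callable[[str], Any]] = None,
    trim: bool = True,
) -> list:
    """Read one item per line, trimming first and transforming afterwards."""
    items = []
    with open(filepath, "r") as handle:
        for line in handle:
            item = line.strip() if trim else line
            items.append(transform(item) if transform is not None else item)
    return items


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "product_ids.txt"
    path.write_text("  id_1 \nid_2\n")
    trimmed = read_items(path)
    upper = read_items(path, transform=str.upper)
```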

class monkey_wrench.input_output.TempDirectory(*, temp_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]

Bases: Model

Pydantic model for a temporary directory, including a context manager.

temp_dir_context_manager() Generator[Path, None, None][source]

Context manager to create a temporary directory and also set the global temporary directory to the same path.

Note

The temporary directory created by this context manager will reside inside TempDirectory.temp_directory_path.

Note

The reason to set the global temporary directory is to ensure that any other inner functions or context managers that might invoke tempfile.TemporaryDirectory() also use the given global temporary directory.

Yields:

The full path of the (created) temporary directory.
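A context manager with the behaviour described above could look roughly like this. It is a sketch under assumptions, not the package's code: `temp_dir_in` is a hypothetical name, and restoring the previous value of `tempfile.tempdir` on exit is a design choice made for this example.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Generator


@contextmanager
def temp_dir_in(parent: Path) -> Generator[Path, None, None]:
    """Create a child temporary directory and make it the global default."""
    previous = tempfile.tempdir
    with tempfile.TemporaryDirectory(dir=parent) as child:
        # Code that calls tempfile functions without `dir=` now lands inside `child`.
        tempfile.tempdir = child
        try:
            yield Path(child)
        finally:
            tempfile.tempdir = previous


with tempfile.TemporaryDirectory() as top:
    with temp_dir_in(Path(top)) as child:
        nested = Path(tempfile.mkdtemp())
        nested_is_inside_child = nested.parent == child
        child_is_inside_top = child.parent == Path(top)
    child_removed = not child.exists()
    top_kept = Path(top).exists()
```

The child directory is removed when the block exits, while the top-level directory is left in place.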

classmethod validate_temporary_directory(data: Any) Any[source]

Return the path to the top-level temporary directory according to the priority rules.

temp_directory_path: Annotated[Path, PathType(path_type=dir), AfterValidator(func=<lambda>)]

The path to an existing directory, which will be used as the top-level temporary directory.

Note

This directory will be used as a parent directory for subsequent (child) temporary directories. As a result, it will not be removed or cleaned up. However, the child temporary directories will always be removed and cleaned up.

Note

If it is not set (i.e. it is None), it takes on a value according to the following order of priority:

1- The value of the TMPDIR environment variable.

2- /tmp/.
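The priority order listed above can be written as a small resolver. `resolve_temp_directory` is a hypothetical function used only to illustrate the rules; the `environ` parameter exists so that the example does not depend on the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_temp_directory(
    explicit: Optional[Path] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the top-level temporary directory: explicit value, then TMPDIR, then /tmp/."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        return Path(explicit)
    return Path(environ.get("TMPDIR", "/tmp/"))
```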

class monkey_wrench.input_output.ParentInputDirectory(*, parent_input_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]

Bases: Model

Pydantic model for the top-level directory where the child (input) directories reside. The directory must exist.

Example

A directory which includes all SEVIRI files that have to be reprocessed using CHIMP.

parent_input_directory_path: Annotated[Path, PathType(path_type=dir), AfterValidator(func=<lambda>)]
class monkey_wrench.input_output.ParentOutputDirectory(*, parent_output_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]

Bases: Model

Pydantic model for the top-level directory where the child (output) directories reside. The directory must exist.

Example

A directory in which the output of CHIMP will be saved.

parent_output_directory_path: Annotated[Path, PathType(path_type=dir), AfterValidator(func=<lambda>)]
class monkey_wrench.input_output.Writer(*, output_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=new), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | None = None, open_mode: ~typing.Literal['w', 'a'] = 'w', pre_writing_transformation: ~monkey_wrench.generic.models._pattern.StringTransformation = StringTransformation(trim=True, transform_function=None), on_write_catch_exceptions: tuple[type[Exception], ...] | None = None)[source]

Bases: OutputFile

Pydantic model for an ASCII file (text mode) writer.

__get_open_mode(open_mode: Literal['w', 'a'] | None = None)

Return the open_mode if it is not None. Otherwise return the value of self.open_mode.

prepare_output_file_for_writing(open_mode: Literal['w', 'a'] | None = None) None[source]

Prepare the output file depending on whether the file exists and the value of open_mode.

Note

  • If the file exists and open_mode is "a", the file will be left as it is.

  • If the file exists and open_mode is "w", the file will be overwritten.

  • If the file does not exist, the file will be created, regardless of open_mode.

Parameters:

open_mode – Defaults to None, which means the value from self.open_mode will be used.
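The three cases in the note above can be expressed in a few lines. This is a sketch with a hypothetical `prepare_output_file` helper; it assumes that "overwritten" means the existing file is emptied and that a newly created file starts out empty.

```python
import tempfile
from pathlib import Path
from typing import Literal


def prepare_output_file(path: Path, open_mode: Literal["w", "a"] = "w") -> None:
    """Leave an existing file alone in append mode; otherwise create or empty it."""
    if path.exists() and open_mode == "a":
        return
    # Covers both remaining cases: a missing file (any mode) and an existing file with "w".
    path.write_text("")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ids.txt"
    prepare_output_file(path, "a")  # missing file: created regardless of mode
    created = path.exists()
    path.write_text("id_1\n")
    prepare_output_file(path, "a")  # existing file, append mode: left as it is
    kept = path.read_text()
    prepare_output_file(path, "w")  # existing file, write mode: emptied
    emptied = path.read_text()
```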

write(items: list[ElementType] | set[ElementType] | tuple[ElementType, ...] | Generator[Any, None, None], open_mode: Literal['w', 'a'] | None = None) Annotated[int, Ge(ge=0)][source]

Write items from a list/set/tuple or a generator to a text file, with one item per line.

Examples of items are product IDs.

This function opens a text file in the (over)write or append mode. It then writes each item from the provided iterable to the file. It catches specified exceptions during the writing process, and logs them as warnings.

Parameters:
  • items – An iterable of items to be written to the file.

  • open_mode – Defaults to None, which means the value from self.open_mode will be used.

Returns:

The number of items that are written to the file successfully.

write_in_batches(batches: Generator[tuple[T, int], None, None], open_mode: Literal['w', 'a'] | None = None) Annotated[int, Ge(ge=0)][source]

Similar to Writer.write(), but assumes that the input is in batches.

open_mode: Literal['w', 'a']

The mode using which the text file will be opened. Defaults to "w".

pre_writing_transformation: StringTransformation

The transformation before writing items to the file.

Defaults to StringTransformation(), which means the items will be only trimmed.

Note

The items will first be transformed according to the pre_writing_transformation.transform_function, and then trimmed.

on_write_catch_exceptions: tuple[type[Exception], ...] | None

Exceptions which will be caught and logged as warnings.

Defaults to None, which means all exceptions will be caught. If it is a tuple, only the given exceptions will be caught. As a result, in the case of an empty tuple, no exceptions will be caught.
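The combination of transformation, trimming, and exception handling described for the writer can be sketched as below. `write_items` and `reject_empty` are hypothetical helpers, not the Writer model; mapping None to `(Exception,)` is an assumption about what "all exceptions" means.

```python
import logging
import tempfile
from pathlib import Path


def write_items(path, items, open_mode="w", transform=None, catch=None) -> int:
    """Write one item per line and return how many items were written."""
    # None means "catch everything"; an empty tuple catches nothing at all.
    exceptions = (Exception,) if catch is None else catch
    written = 0
    with open(path, open_mode) as handle:
        for item in items:
            try:
                # Transform first, then trim, as described for pre_writing_transformation.
                text = transform(item) if transform is not None else item
                handle.write(f"{str(text).strip()}\n")
                written += 1
            except exceptions as error:
                logging.warning("Could not write %r: %s", item, error)
    return written


def reject_empty(item: str) -> str:
    """Hypothetical transform that refuses empty items."""
    if not item.strip():
        raise ValueError("empty item")
    return item


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "ids.txt"
    count = write_items(out, ["id_1 ", "", "id_2"], transform=reject_empty)
    content = out.read_text()
    try:
        write_items(out, ["id_1", ""], transform=reject_empty, catch=())
        propagated = False
    except ValueError:
        propagated = True
```

With the default setting the failing item is skipped and logged, so the returned count can be smaller than the number of input items; with an empty tuple the same failure propagates to the caller.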

monkey_wrench.input_output.copy_files_between_directories(source_directory: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], destination_directory: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], pattern: ~monkey_wrench.generic.models._pattern.Pattern | None = None) None[source]

Copy (top-level) files whose names include the pattern from one directory to another.

Warning

The copying is not performed recursively. Only the top-level files are copied.

Parameters:
  • source_directory – The source directory to copy files from.

  • destination_directory – The destination directory to copy files to.

  • pattern – The pattern to filter the files.
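A non-recursive, pattern-filtered copy of this kind can be sketched with shutil. `copy_top_level_files` is a hypothetical stand-in that takes a single sub-string instead of a Pattern model and, unlike the documented function, returns the copied names so the result can be inspected.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def copy_top_level_files(source: Path, destination: Path, sub_string: Optional[str] = None) -> list[str]:
    """Copy top-level files whose names contain sub_string; subdirectories are ignored."""
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and (sub_string is None or sub_string in path.name):
            shutil.copy(path, destination / path.name)
            copied.append(path.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    source, destination = Path(tmp) / "src", Path(tmp) / "dst"
    (source / "nested").mkdir(parents=True)
    destination.mkdir()
    for name in ("chimp_a.nc", "other.nc", "nested/chimp_b.nc"):
        (source / name).touch()
    copied = copy_top_level_files(source, destination, "chimp")
    in_destination = sorted(p.name for p in destination.iterdir())
```

The matching file inside `nested` is not copied, which mirrors the warning above about top-level files only.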

monkey_wrench.input_output.copy_single_file_to_directory(destination_directory: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)]) None[source]

Copy a single file with the given path to another destination directory.

Parameters:
  • destination_directory – The destination directory to copy the given file to.

  • filepath – The path of the file that needs to be copied.

Subpackages

Submodules