monkey_wrench.input_output._models module
- class monkey_wrench.input_output._models.ExistingInputFile(*, input_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)])[source]
Bases: Model
Pydantic model for an input file which must exist.
Example
A text file listing the product IDs that have already been fetched. This file is then used to fetch the product files.
- input_filepath: ExistingFilePath
- class monkey_wrench.input_output._models.InputFile(*, input_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=new), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | None = None)[source]
Bases: Model
Pydantic model for an input file which does not necessarily exist during the model validation.
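- input_filepath: Annotated[Path, PathType(path_type=file), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)] | Annotated[Path, PathType(path_type=new), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)] | None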
- class monkey_wrench.input_output._models.NewOutputFile(*, output_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=new), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)])[source]
Bases: Model
Pydantic model for an output file which must not already exist.
Example
A text file to store the result of visiting a directory, i.e. collected files that match the determined pattern.
- output_filepath: Annotated[Path, PathType(path_type=new), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)]
- class monkey_wrench.input_output._models.OutputFile(*, output_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=new), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)] | None = None)[source]
Bases: Model
Pydantic model for an output file which does not necessarily exist during the model validation.
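- output_filepath: Annotated[Path, PathType(path_type=file), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)] | Annotated[Path, PathType(path_type=new), AfterValidator(func=<lambda>), AfterValidator(func=ensure_path_does_not_end_with_slash)] | None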
- class monkey_wrench.input_output._models.ModelFile(*, model_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)])[source]
Bases: Model
Pydantic model for a model file which must exist.
Example
A *.pt file used by CHIMP, as the model, to perform a retrieval.
- model_filepath: ExistingFilePath
- class monkey_wrench.input_output._models.ParentInputDirectory(*, parent_input_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]
Bases: Model
Pydantic model for the top-level directory where the child (input) directories reside. The directory must exist.
Example
A directory which includes all SEVIRI files that have to be reprocessed using CHIMP.
- parent_input_directory_path: ExistingDirectoryPath
- class monkey_wrench.input_output._models.ParentOutputDirectory(*, parent_output_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]
Bases: Model
Pydantic model for the top-level directory where the child (output) directories reside. The directory must exist.
Example
A directory in which the output of CHIMP will be saved.
- parent_output_directory_path: ExistingDirectoryPath
- class monkey_wrench.input_output._models.ExistingInputDirectory(*, input_directory: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]
Bases: Model
Pydantic model for an input directory which must exist.
Note
This model is to be used solely for a flat structure. If you have a hierarchical tree structure, use ParentInputDirectory instead to make the directory structure clearer.
- input_directory: ExistingDirectoryPath
- class monkey_wrench.input_output._models.ExistingOutputDirectory(*, output_directory: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)])[source]
Bases: Model
Pydantic model for an output directory which must exist.
Note
This model is to be used solely for a flat structure. If you have a hierarchical tree structure, use ParentOutputDirectory instead to make the directory structure clearer.
- output_directory: ExistingDirectoryPath
- class monkey_wrench.input_output._models.FsSpecCache(*, fsspec_cache: Literal['filecache', 'blockcache'] | None = None)[source]
Bases: Model
Pydantic model for the caching scheme of fsspec.
Note
See fsspec cache, to learn more about buffering and random access in fsspec.
- fsspec_cache: Literal['filecache', 'blockcache'] | None
How to buffer, e.g. "filecache", "blockcache", or None. Defaults to None.
Warning
None might cause too many requests to be sent to the server!
- property fsspec_cache_str
Return the cache string with a leading :: if it is not None. Otherwise, return an empty string.
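The behaviour described for fsspec_cache_str can be pictured with a minimal stand-alone sketch. It does not import monkey_wrench; the helper name cache_suffix and the type alias CacheScheme are hypothetical, and only the leading :: and the empty-string fallback are taken from the description above.

```python
from typing import Literal, Optional

# Hypothetical alias mirroring the documented field type.
CacheScheme = Optional[Literal["filecache", "blockcache"]]


def cache_suffix(fsspec_cache: CacheScheme = None) -> str:
    """Return "::<scheme>" if a caching scheme is set, otherwise an empty string."""
    return f"::{fsspec_cache}" if fsspec_cache is not None else ""


with_cache = cache_suffix("filecache")
without_cache = cache_suffix(None)
print(repr(with_cache), repr(without_cache))
```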
- class monkey_wrench.input_output._models.DatasetSaveOptions(*, dataset_save_options: dict[str, bool | str | int] = {'include_lonlats': False, 'writer': 'cf'})[source]
Bases: Model
Pydantic model for the storage options with which the dataset is to be saved. This is dataset-dependent.
- dataset_save_options: dict[str, bool | str | int]
A dictionary which includes the actual storage options.
The default behaviour is to use cf as the writer and exclude longitude and latitude values, i.e. dataset_save_options = dict(writer="cf", include_lonlats=False)
- class monkey_wrench.input_output._models.Reader(*, input_filepath: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.ensure_path_does_not_end_with_slash)], post_reading_transformation: ~monkey_wrench.generic.models._pattern.StringTransformation = StringTransformation(trim=True, transform_function=None))[source]
Bases: ExistingInputFile
Pydantic model for an ASCII file (text mode) reader.
- post_reading_transformation: StringTransformation
The transformation after reading items from the file and before returning them.
Defaults to StringTransformation(), which means the items will only be trimmed.
Note
The items will first be trimmed and then transformed according to post_reading_transformation.transform_function.
- read() list[Any][source]
Read items from a text file, assuming each line corresponds to a single item.
Examples of items are product IDs.
Warning
This function does not check whether the items are valid or not. It is a simple convenience function for reading items from a text file.
- Returns:
A list of (transformed) items, where each item corresponds to a single line in the given file.
- input_filepath: ExistingFilePath
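The trim-then-transform order described for read() can be illustrated with a stand-alone sketch that uses only the standard library. It is not the Reader implementation: the function read_items and its parameters are hypothetical stand-ins for the trim and transform_function settings of StringTransformation, and how the real reader treats blank lines is not covered here.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def read_items(
    filepath: Path,
    trim: bool = True,
    transform_function: Optional[Callable[[str], Any]] = None,
) -> list[Any]:
    """Read one item per line; trim first, then apply the optional transform."""
    items = []
    with open(filepath, "r") as text_file:
        for line in text_file:
            item = line.rstrip("\n")
            if trim:
                item = item.strip()
            if transform_function is not None:
                item = transform_function(item)
            items.append(item)
    return items


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "product_ids.txt"
    path.write_text("  id_001 \nid_002\n")
    raw_items = read_items(path)
    upper_items = read_items(path, transform_function=str.upper)

print(raw_items, upper_items)
```

As in the documented warning, nothing here checks whether the items are valid; the sketch only mirrors the reading order.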
- class monkey_wrench.input_output._models.DirectoryVisitor(*, sub_strings: str | list[str] | None = None, case_sensitive: bool = True, match_all: bool = True, parent_input_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], visitor_writer: ~monkey_wrench.input_output._models.Writer | None = None, visitor_callback: ~typing.Annotated[~typing.Callable[[...], ~typing.Any], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~typing.Any] | None = None, reverse: bool = False, recursive: bool = True, post_visit_transform_function: ~typing.Annotated[~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType] | None = None)[source]
Bases: ParentInputDirectory, Pattern
Pydantic model for visiting files in a directory tree.
- visitor_writer: Writer | None
If given, it will be used to write the list of visited files to a text file.
- visitor_callback: Annotated[Callable[[...], Any], BeforeValidator(func=validate_function_path, json_schema_input_type=PydanticUndefined)] | Callable[[...], Any] | None
A function that will be called every time a match is found for a file. Defaults to None.
- reverse: bool
A boolean to determine whether to sort the files in reverse order.
Defaults to False, which means the files are sorted in alphabetical order.
- recursive: bool
Determines whether to recursively visit the directory tree or just visit the top-level directory.
Defaults to True.
- post_visit_transform_function: Annotated[Callable[[...], ReturnType], BeforeValidator(func=validate_function_path, json_schema_input_type=PydanticUndefined)] | Callable[[...], ReturnType] | None
The transform function that will be applied on filepaths after visiting them.
Defaults to None, which means no transformation is applied.
Note
If it is provided, the result of the transformation will be returned instead of the filepaths.
- __collect_files() list[Path]
- visit() list[ReturnType] | list[Path][source]
Visit all files in the directory, either recursively or just the top-level files.
- Returns:
A sorted flat list of all file paths in the given directory that match the given pattern and have been treated according to the visitor_callback function. If the post_visit_transform_function is provided, a list of transformed filepaths will be returned instead.
- parent_input_directory_path: ExistingDirectoryPath
- sub_strings: str | list[str] | None
The sub-strings to look for. It can be either a single string, a list of strings, or None.
Defaults to None, which means exists_in() returns True.
- case_sensitive: bool
A boolean indicating whether to perform a case-sensitive match. Defaults to True.
- match_all: bool
A boolean indicating whether to match all or any of the sub-strings. Defaults to True.
When it is set to False, only one match suffices. In the case of a single sub-string this parameter does not have any effect.
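To make the interplay of sub_strings, case_sensitive, match_all, recursive and reverse more concrete, here is a stand-alone sketch built on pathlib only. It is not the DirectoryVisitor implementation: the functions matches and visit are hypothetical, and matching against the file name (rather than the full path) is an assumption made for this illustration.

```python
import tempfile
from pathlib import Path
from typing import Union


def matches(
    name: str,
    sub_strings: Union[str, list[str], None] = None,
    case_sensitive: bool = True,
    match_all: bool = True,
) -> bool:
    """Return True if the name contains all (or any) of the sub-strings."""
    if sub_strings is None:
        return True  # no pattern given: everything matches
    if isinstance(sub_strings, str):
        sub_strings = [sub_strings]
    if not case_sensitive:
        name = name.lower()
        sub_strings = [s.lower() for s in sub_strings]
    found = [s in name for s in sub_strings]
    return all(found) if match_all else any(found)


def visit(
    directory: Path,
    sub_strings: Union[str, list[str], None] = None,
    case_sensitive: bool = True,
    match_all: bool = True,
    recursive: bool = True,
    reverse: bool = False,
) -> list[Path]:
    """Collect matching files, recursively or top-level only, as a sorted flat list."""
    candidates = directory.rglob("*") if recursive else directory.glob("*")
    files = [
        p for p in candidates
        if p.is_file() and matches(p.name, sub_strings, case_sensitive, match_all)
    ]
    return sorted(files, reverse=reverse)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for relative in ("a_seviri.nc", "b_other.txt", "sub/c_SEVIRI.nc"):
        (root / relative).touch()
    top_level = [p.name for p in visit(root, "seviri", recursive=False)]
    any_case = [p.name for p in visit(root, ["seviri", ".nc"], case_sensitive=False)]

print(top_level, any_case)
```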
- class monkey_wrench.input_output._models.FilesIntegrityValidator(*, number_of_processes: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 1, nominal_file_size: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] | None = None, file_size_relative_tolerance: ~typing.Annotated[float, ~annotated_types.Ge(ge=0)] = 0.01, filepath_transform_function: ~typing.Annotated[~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType] | None = None, reference_transform_function: ~typing.Annotated[~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType], ~pydantic.functional_validators.BeforeValidator(func=~monkey_wrench.generic.models._function.validate_function_path, json_schema_input_type=PydanticUndefined)] | ~typing.Callable[[...], ~monkey_wrench.input_output._models.ReturnType] | None = None, reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None)[source]
Bases: MultiProcess
Pydantic model to verify the integrity of files by checking their size and comparing their list against a reference.
Note
This class performs two main verifications, namely checking for corrupted and missing files, as follows:
1- Checking that the file sizes are within some threshold from a nominal file size.
2- Checking the list of filepaths against a reference list.
- nominal_file_size: Annotated[int, Ge(ge=0)] | None
The nominal size of files in bytes. This is used to check for corrupted files.
Defaults to None, which means the search for corrupted files will not be performed.
- file_size_relative_tolerance: Annotated[float, Ge(ge=0)]
The maximum relative difference in the size of a file, before it can be marked as corrupted.
Defaults to 0.01, i.e. any file whose size differs by more than one percent from the nominal size will be marked as corrupted.
- filepath_transform_function: Annotated[Callable[[...], ReturnType], BeforeValidator(func=validate_function_path, json_schema_input_type=PydanticUndefined)] | Callable[[...], ReturnType] | None
A function to transform the file paths into other types of objects before comparing them against the reference.
This can be e.g. a parse() function to make datetime objects out of file paths. Defaults to None, which means no transformation is performed on the file paths and they will be used as they are.
- number_of_processes: NonNegativeInt
Number of processes to use. Defaults to 1.
A value of 1 disables multiprocessing. This is useful for e.g. testing purposes.
- reference_transform_function: Annotated[Callable[[...], ReturnType], BeforeValidator(func=validate_function_path, json_schema_input_type=PydanticUndefined)] | Callable[[...], ReturnType] | None
A function to transform the reference items into other types of objects before using them for comparison.
This can be e.g. parse() to make datetime objects out of SEVIRI product IDs. Defaults to None, which means no transformation is performed on the reference items and they will be used as they are.
- reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None
Reference items to compare against, used in finding the missing files.
It can be a list/set/tuple of items, or a filepath from which the reference items can be read, or a directory visitor which can collect the reference files.
Defaults to None, which means the search for missing files will not be performed.
- static get_reference_items(reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None) list[InputType] | set[InputType] | tuple[InputType, ...] | None[source]
Return the reference items.
- __get_reference_items(reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None) Any
- find_corrupted_files(filepaths: list[Path] | set[Path] | tuple[Path, ...]) set[Path] | None[source]
- find_missing_files(filepaths: list[~pathlib.Path] | set[~pathlib.Path] | tuple[~pathlib.Path, ...], reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None) set[Path] | None[source]
- verify_files(filepaths: list[~pathlib.Path] | set[~pathlib.Path] | tuple[~pathlib.Path, ...], reference: list[~monkey_wrench.input_output._models.InputType] | set[~monkey_wrench.input_output._models.InputType] | tuple[~monkey_wrench.input_output._models.InputType, ...] | ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=file), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)] | ~monkey_wrench.input_output._models.DirectoryVisitor | None = None) tuple[set[InputType] | set[ReturnType] | None, set[Path] | None][source]
Check for missing and corrupted files.
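The two verifications described above (size within a relative tolerance of a nominal size, and comparison against a reference after optional transformations) can be sketched without the library as follows. The function names find_corrupted and find_missing, their parameters, and the choice to return the missing reference items are assumptions for illustration only; the sketch also runs in a single process.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Iterable, Optional


def find_corrupted(
    filepaths: Iterable[Path],
    nominal_size: Optional[int],
    relative_tolerance: float = 0.01,
) -> Optional[set[Path]]:
    """Return files whose size deviates from the nominal size by more than the tolerance."""
    if nominal_size is None:
        return None  # the check is skipped, as with nominal_file_size=None
    return {
        p for p in filepaths
        if abs(p.stat().st_size - nominal_size) > relative_tolerance * nominal_size
    }


def find_missing(
    filepaths: Iterable[Path],
    reference: Optional[Iterable[Any]],
    filepath_transform: Optional[Callable[[Path], Any]] = None,
    reference_transform: Optional[Callable[[Any], Any]] = None,
) -> Optional[set[Any]]:
    """Return the (transformed) reference items that have no matching file."""
    if reference is None:
        return None  # the check is skipped, as with reference=None
    present = {filepath_transform(p) if filepath_transform else p for p in filepaths}
    expected = {reference_transform(r) if reference_transform else r for r in reference}
    return expected - present


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    good, bad = root / "20220312.bin", root / "20220313.bin"
    good.write_bytes(b"x" * 1000)
    bad.write_bytes(b"x" * 900)  # 10 % smaller than the nominal size
    corrupted = find_corrupted([good, bad], nominal_size=1000)
    corrupted_names = {p.name for p in corrupted}
    missing = find_missing(
        [good, bad],
        reference=["20220312", "20220313", "20220314"],
        filepath_transform=lambda p: p.stem,
    )

print(corrupted_names, missing)
```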
- class monkey_wrench.input_output._models.DateTimeDirectory(*, parent_output_directory_path: ~typing.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir), ~pydantic.functional_validators.AfterValidator(func=~monkey_wrench.input_output._types.<lambda>)], datetime_format_string: str = '%Y/%m/%d', reset_child_datetime_directory: bool = False)[source]
Bases: ParentOutputDirectory
Pydantic model for datetime directories needed to store products and the input/output of CHIMP.
- parent_output_directory_path: ExistingDirectoryPath
- datetime_format_string: str
The format string to create subdirectories from the datetime object. Defaults to "%Y/%m/%d".
- reset_child_datetime_directory: bool
Whether to remove the (child) directory first if it already exists. Defaults to False.
Removing it first can help avoid issues with files being overwritten and corrupted.
- get_datetime_directory(datetime_object: datetime) Path[source]
Get the full path to the datetime directory (given the datetime object). This does not create the directory.
- Parameters:
datetime_object – The datetime object for which the full directory path will be returned.
- Returns:
The full path of the datetime directory.
Example
>>> path = DateTimeDirectory(
...     datetime_format_string="%Y/%m/%d",
...     parent_output_directory_path=Path.home()
... ).get_datetime_directory(
...     datetime(2022, 3, 12)
... )
>>> expected_path = Path.home() / Path("2022/03/12")
>>> expected_path == path
True
- create_datetime_directory(datetime_object: datetime) Path[source]
Create a directory based on the datetime object.
- Parameters:
datetime_object – The datetime object to create the directory for.
- Returns:
The full path of the (created) directory.
Example
>>> path = DateTimeDirectory(
...     datetime_format_string="%Y/%m/%d",
...     parent_output_directory_path=Path.home()
... ).create_datetime_directory(
...     datetime(2022, 3, 12)
... )
>>> expected_path = Path.home() / Path("2022/03/12")
>>> expected_path.exists()
True
>>> expected_path == path
True
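For readers who want to see the idea without the library, the following stand-alone sketch shows one way a datetime directory path could be derived with strftime, and what resetting an existing child directory might look like. The helper names and the use of shutil.rmtree are assumptions, not the DateTimeDirectory implementation, and a temporary directory is used in place of Path.home().

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def datetime_directory(parent: Path, when: datetime, format_string: str = "%Y/%m/%d") -> Path:
    """Build the directory path from the datetime; nothing is created on disk."""
    return parent / when.strftime(format_string)


def create_datetime_directory(
    parent: Path,
    when: datetime,
    format_string: str = "%Y/%m/%d",
    reset: bool = False,
) -> Path:
    """Create the directory, optionally removing an existing one first."""
    directory = datetime_directory(parent, when, format_string)
    if reset and directory.exists():
        shutil.rmtree(directory)  # assumed reset behaviour: drop old contents
    directory.mkdir(parents=True, exist_ok=True)
    return directory


with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp)
    when = datetime(2022, 3, 12)
    created = create_datetime_directory(parent, when)
    (created / "stale.txt").touch()
    recreated = create_datetime_directory(parent, when, reset=True)
    relative = created.relative_to(parent).as_posix()
    leftovers = list(recreated.iterdir())

print(relative, leftovers)
```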