monkey_wrench.query package
The package providing all utilities for querying items.
- class monkey_wrench.query.CollectionMeta(*, query_string: str, snapshot_minutes: list[Annotated[int, Ge(ge=0), FieldInfo(annotation=NoneType, required=True, metadata=[Lt(lt=60)])]] | None = None)[source]
Bases: Model
Named tuple to gather the collection metadata.
- query_string: str
A colon (:) delimited string which represents the query string for the collection on the EUMETSAT API.
Example
For SEVIRI we have: "EO:EUM:DAT:MSG:HRSEVIRI".
- snapshot_minutes: list[Annotated[int, Ge(ge=0), FieldInfo(annotation=NoneType, required=True, metadata=[Lt(lt=60)])]] | None
The minutes for which we have data in an hour.
Warning
For collections to which this does not apply, use the default value, i.e. None.
Example
For SEVIRI we have one snapshot every 15 minutes, starting from the 12th minute. As a result, we have [12, 27, 42, 57] for SEVIRI snapshots in an hour.
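As a rough illustration of how snapshot_minutes maps to concrete times, the sketch below expands the SEVIRI minutes into the datetimes expected within one hour. The helper snapshot_datetimes_in_hour is hypothetical and not part of monkey_wrench; only the list [12, 27, 42, 57] and the [0, 60) constraint come from the documentation above.

```python
from datetime import datetime, timedelta, timezone

# Snapshot minutes for SEVIRI, as listed in the documentation above.
SEVIRI_SNAPSHOT_MINUTES = [12, 27, 42, 57]


def snapshot_datetimes_in_hour(hour_start: datetime, snapshot_minutes: list[int]) -> list[datetime]:
    """Hypothetical helper: expected snapshot datetimes within the hour of `hour_start`."""
    # Mirror the documented constraint that each minute lies in [0, 60).
    if any(not 0 <= minute < 60 for minute in snapshot_minutes):
        raise ValueError("snapshot minutes must be in the range [0, 60)")
    # Truncate to the start of the hour before adding the minute offsets.
    hour_start = hour_start.replace(minute=0, second=0, microsecond=0)
    return [hour_start + timedelta(minutes=minute) for minute in snapshot_minutes]


snapshots = snapshot_datetimes_in_hour(
    datetime(2022, 1, 1, 10, tzinfo=timezone.utc), SEVIRI_SNAPSHOT_MINUTES
)
```

Consecutive entries in snapshots are 15 minutes apart, matching the SEVIRI cadence described above.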
- class monkey_wrench.query.EumetsatAPI[source]
Bases: object
Static class for EUMETSAT API functionalities.
- api_base_url = HttpUrl('https://api.eumetsat.int/')
The root URL of the EUMETSAT API.
- credentials_env_vars: ClassVar[dict[str, str]] = {'login': 'EUMETSAT_API_LOGIN', 'password': 'EUMETSAT_API_PASSWORD'}
The names of the environment variables used to authenticate the EUMETSAT API calls.
Example
On Linux, you can use the export command to set the credentials in a terminal:
export EUMETSAT_API_LOGIN=<login>; export EUMETSAT_API_PASSWORD=<password>;
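The following is a minimal sketch of how credentials could be read from these environment variables. The helper read_credentials and its error message are assumptions for illustration; only the variable names come from credentials_env_vars.

```python
import os

# Names of the environment variables, as given by `credentials_env_vars` above.
CREDENTIALS_ENV_VARS = {"login": "EUMETSAT_API_LOGIN", "password": "EUMETSAT_API_PASSWORD"}


def read_credentials(environ=None) -> tuple[str, str]:
    """Hypothetical helper: return (login, password) or raise KeyError if one is unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in CREDENTIALS_ENV_VARS.values() if name not in environ]
    if missing:
        raise KeyError(f"Missing environment variable(s): {', '.join(missing)}")
    return environ[CREDENTIALS_ENV_VARS["login"]], environ[CREDENTIALS_ENV_VARS["password"]]


# Passing a mapping explicitly avoids depending on the real environment.
login, password = read_credentials(
    {"EUMETSAT_API_LOGIN": "user", "EUMETSAT_API_PASSWORD": "secret"}
)
```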
- download_path_template = '{base}/data/download/1.0.0/collections/{collection}/products'
The URL template for downloading collections.
- classmethod get_token() AccessToken[source]
Get a token using the credentials_env_vars.
This method returns the same token if it is still valid and issues a new one otherwise.
- Returns:
A token using which the datastore can be accessed.
Note
See API key management on the eumdac website for more information.
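The reuse-or-renew behaviour described for get_token() can be sketched as a small cache. This is only an illustration under stated assumptions: FakeToken, TokenCache, and the expires_at field are hypothetical and do not reflect the eumdac AccessToken API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class FakeToken:
    """Hypothetical stand-in for an access token with an expiry time."""
    value: str
    expires_at: datetime


class TokenCache:
    """Hypothetical cache: reuse the token while valid, issue a new one otherwise."""

    def __init__(self, issue_token: Callable[[], FakeToken]):
        self._issue_token = issue_token
        self._token: Optional[FakeToken] = None

    def get_token(self, now: Optional[datetime] = None) -> FakeToken:
        now = now or datetime.now(timezone.utc)
        # Issue a token on first use or once the cached one has expired.
        if self._token is None or self._token.expires_at <= now:
            self._token = self._issue_token()
        return self._token


issued = []


def issue() -> FakeToken:
    # Every issued token expires at a fixed time to keep the example deterministic.
    token = FakeToken(f"token-{len(issued)}", datetime(2022, 1, 1, 1, tzinfo=timezone.utc))
    issued.append(token)
    return token


cache = TokenCache(issue)
before_expiry = datetime(2022, 1, 1, 0, 30, tzinfo=timezone.utc)
first = cache.get_token(now=before_expiry)
second = cache.get_token(now=before_expiry)
```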
- static make_collection_url(collection: EumetsatCollection) HttpUrl[source]
Make a complete collection URL from the API base URL and the given collection (query string).
- Parameters:
collection – A collection of type EumetsatCollection, e.g. for SEVIRI we have EumetsatCollection.seviri.
- Returns:
The full collection URL from which the files can be fetched.
Example
>>> EumetsatAPI.make_collection_url(EumetsatCollection.seviri)
HttpUrl('https://api.eumetsat.int/data/download/1.0.0/collections/EO%3AEUM%3ADAT%3AMSG%3AHRSEVIRI/products')
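A URL like the one above can be reproduced with the standard library alone. The sketch below is not the library's implementation: it assumes that the trailing slash of the base URL is stripped and that the query string is percent-encoded, and it works on plain strings instead of HttpUrl and EumetsatCollection.

```python
from urllib.parse import quote

# Values copied from `api_base_url` and `download_path_template` above.
API_BASE_URL = "https://api.eumetsat.int/"
DOWNLOAD_PATH_TEMPLATE = "{base}/data/download/1.0.0/collections/{collection}/products"


def make_collection_url(query_string: str) -> str:
    """Hypothetical re-implementation operating on plain strings."""
    # Assumption: drop the trailing slash so the template does not yield "//".
    base = API_BASE_URL.rstrip("/")
    # safe="" makes quote() encode ":" as "%3A".
    return DOWNLOAD_PATH_TEMPLATE.format(base=base, collection=quote(query_string, safe=""))


url = make_collection_url("EO:EUM:DAT:MSG:HRSEVIRI")
```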
- class monkey_wrench.query.EumetsatCollection(*values)[source]
Bases: Enum
Enum class that defines the collections for the EUMETSAT datastore.
- amsu = CollectionMeta(query_string='EO:EUM:DAT:METOP:AMSUL1', snapshot_minutes=None)
- avhrr = CollectionMeta(query_string='EO:EUM:DAT:METOP:AVHRRL1', snapshot_minutes=None)
- mhs = CollectionMeta(query_string='EO:EUM:DAT:METOP:MHSL1', snapshot_minutes=None)
- seviri = CollectionMeta(query_string='EO:EUM:DAT:MSG:HRSEVIRI', snapshot_minutes=[12, 27, 42, 57])
- class monkey_wrench.query.EumetsatQuery(collection: EumetsatCollection = EumetsatCollection.seviri, log_context: str = 'EUMETSAT Query')[source]
Bases: Query
- __init__(collection: EumetsatCollection = EumetsatCollection.seviri, log_context: str = 'EUMETSAT Query') None[source]
Initialize an instance of the class with API credentials read from the environment variables.
This constructor sets up a private eumdac datastore by obtaining an authentication token using the API login and password, which are read from the environment variables.
- Parameters:
collection – The collection. Defaults to seviri for SEVIRI.
log_context – A string that will be used in log messages to determine the context. Defaults to 'EUMETSAT Query'.
- fetch_product(product: Product, chain: Chain, output_directory: Path, sleep_time: Annotated[int, Gt(gt=0)]) Path | None[source]
Fetch the file for a single product and write the product file to disk.
- Parameters:
product – The Product whose corresponding file will be fetched.
chain – Chain to apply for customization of the output file.
output_directory – The directory to save the file in.
sleep_time – Sleep time, in seconds, between requests.
- Returns:
The path of the saved file on disk, or None in case of a failure.
- fetch_products(search_results: SearchResults, output_directory: Path, bounding_box: BoundingBox | None = None, output_file_format: str = 'netcdf4', sleep_time: Annotated[int, Gt(gt=0)] = 10) list[Path | None][source]
Fetch all products from search results and write product files to disk.
- Parameters:
search_results – Search results for which the files will be fetched.
output_directory – The directory to save the files in.
bounding_box – Bounding box, i.e. (north, south, west, east) limits. Defaults to None, which means BoundingBox(90., -90, -180., 180) will be used.
output_file_format – Desired format of the output file(s). Defaults to netcdf4.
sleep_time – Sleep time, in seconds, between requests. Defaults to 10 seconds.
- Returns:
A list of paths for the fetched files.
- query(datetime_period: DateTimePeriodStrict, polygon: Polygon | None = None) SearchResults[source]
Query product IDs in a single batch.
This method wraps around the eumdac.Collection().search() method to perform a search for product IDs within a specified time range and polygon.
Note
For a given SEVIRI collection, an example product ID is "MSG3-SEVI-MSG15-0100-NA-20150731221240.036000000Z-NA".
Note
start_time and end_time are treated respectively as inclusive and exclusive when querying the IDs. For example, to obtain all the data up to and including 2022/12/31, we must set end_time=datetime(2023, 1, 1).
- Parameters:
datetime_period – The datetime period to query for.
polygon – An object of type Polygon.
- Returns:
The results of the search, containing the product IDs found within the specified period and the polygon.
- Raises:
ValueError – Refer to assert_start_time_is_before_end_time().
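The inclusive start and exclusive end described for query() amount to a half-open interval test. The sketch below illustrates it with plain datetime objects. The helper in_period is hypothetical, and the exact condition under which the library raises ValueError is an assumption inferred from the name assert_start_time_is_before_end_time().

```python
from datetime import datetime


def in_period(moment: datetime, start_time: datetime, end_time: datetime) -> bool:
    """Hypothetical helper: True if `moment` lies in [start_time, end_time)."""
    # Assumption: the start must be strictly before the end.
    if start_time >= end_time:
        raise ValueError("start_time must be before end_time")
    return start_time <= moment < end_time


start_time = datetime(2022, 12, 31)
end_time = datetime(2023, 1, 1)  # exclusive, so all of 2022/12/31 is covered
last_snapshot_of_2022 = datetime(2022, 12, 31, 23, 57)
```

With these values, last_snapshot_of_2022 falls inside the period, whereas a product stamped exactly at end_time would not.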
- query_in_batches(datetime_range_in_batches: DateTimeRangeInBatches) Generator[tuple[SearchResults, int], None, None][source]
Retrieve all the product IDs, given a time range and a batch interval, fetching one batch at a time.
- Parameters:
datetime_range_in_batches – The datetime range to query for.
Note
As an example, for SEVIRI, we expect to have one file (product ID) per 15 minutes, i.e. 4 files per hour or 96 files per day. If our re-analysis period is 2022/01/01 (inclusive) to 2023/01/01 (exclusive), i.e. 365 days, this results in a maximum of 35040 files.
If we split our datetime range into intervals of 30 days and fetch product IDs in batches, there is a maximum of 2880 = 96 x 30 IDs in each batch retrieved by a single request. One might need to adapt this value to avoid sending too many requests to the server.
- Yields:
A generator of 2-tuples. The first element of each tuple is the collection of products retrieved in that batch. The second element is the number of products retrieved in that batch. The search results can in turn be iterated over to retrieve individual products.
Example
>>> from datetime import datetime, timedelta, UTC
>>>
>>> range_in_batches = DateTimeRangeInBatches(
...     start_datetime=datetime(2022, 1, 1, tzinfo=UTC),
...     end_datetime=datetime(2022, 1, 3, tzinfo=UTC),
...     batch_interval=timedelta(days=1)
... )
>>>
>>> try:
...     api = EumetsatQuery()
...     for batch, retrieved_count in api.query_in_batches(range_in_batches):
...         assert retrieved_count == batch.total_results
...         for product in batch:
...             pass
... except KeyError as e:  # If the API credentials are not set!
...     assert "environment variable" in str(e)
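The arithmetic in the note above can be checked with a short sketch that also splits the one-year period into 30-day intervals. The helper split_into_batches is hypothetical. It walks forward in time, whereas the order in which DateTimeRangeInBatches produces its batches is not stated here.

```python
from datetime import datetime, timedelta, timezone

FILES_PER_HOUR = 60 // 15  # one SEVIRI file every 15 minutes
FILES_PER_DAY = FILES_PER_HOUR * 24


def split_into_batches(start: datetime, end: datetime, batch_interval: timedelta):
    """Hypothetical helper: yield consecutive [batch_start, batch_end) intervals over [start, end)."""
    batch_start = start
    while batch_start < end:
        # The last interval is clipped so that it never passes `end`.
        batch_end = min(batch_start + batch_interval, end)
        yield batch_start, batch_end
        batch_start = batch_end


start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 1, tzinfo=timezone.utc)
batches = list(split_into_batches(start, end, timedelta(days=30)))

total_days = (end - start).days
max_total_files = FILES_PER_DAY * total_days
max_files_per_batch = FILES_PER_DAY * 30
```

Under these assumptions the year is covered by twelve full 30-day intervals plus one shorter interval for the remaining five days.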
- class monkey_wrench.query.List(items: list, datetime_parser: SeviriIDParser | FilePathParser, log_context: str = 'List')[source]
Bases: Query
A class to provide generic functionalities to query lists.
Note
This class is meant to behave as an immutable list.
Note
This class utilizes numpy.ndarray objects under the hood.
- __get_indices(datetime_period: DateTimePeriod) array
Similar to query_indices(), but returns the numpy indices instead.
- __init__(items: list, datetime_parser: SeviriIDParser | FilePathParser, log_context: str = 'List') None[source]
Make an instance of the class.
- Parameters:
items – The complete list of items to query.
datetime_parser – A class of type DateTimeParser to enable parsing items into datetime objects.
log_context – A string that will be used in log messages to determine the context. Defaults to 'List'.
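To give an idea of what a datetime parser does, the sketch below extracts the timestamp from a SEVIRI product ID such as the one shown for EumetsatQuery.query(). The function parse_seviri_product_id is hypothetical and is not the SeviriIDParser implementation. The position of the timestamp field and the UTC interpretation of the trailing Z are assumptions based on that single example ID.

```python
from datetime import datetime, timezone


def parse_seviri_product_id(product_id: str) -> datetime:
    """Hypothetical parser: extract the timestamp from a SEVIRI product ID."""
    # Assumption: the sixth dash-separated field holds "YYYYmmddHHMMSS.<fraction>Z".
    timestamp_field = product_id.split("-")[5]
    # Keep only the first 14 characters, i.e. the part down to whole seconds.
    parsed = datetime.strptime(timestamp_field[:14], "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


parsed = parse_seviri_product_id("MSG3-SEVI-MSG15-0100-NA-20150731221240.036000000Z-NA")
```

The parsed minute is 12, which agrees with the first entry of the SEVIRI snapshot_minutes.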
- generate_k_sized_batches_by_index(k: Annotated[int, Gt(gt=0)], index_start: int = 0, index_end: int = -1, batches_as_python_lists: bool = True) Generator[source]
Generate batches (sub-lists) of size k and move forward by 1 index each time.
A batch consists of the item at the current index, as well as k-1 preceding items. In other words, a batch includes k adjacent items, with the item at the current index being the last item of the batch. The next batch is retrieved by incrementing the current index by +1. As a result, two consecutive batches have k-1 common items.
Note
Both index_start and index_end are considered as inclusive. They can be negative as well.
Note
The indices are zero-based. If index_start is less than or equal to k-1, the first batch includes items from index 0 to index k-1 (inclusive). The next batch includes indices [1, k].
- Parameters:
k – The size of the batches. Each batch includes the current item as well as k-1 preceding items.
index_start – The zero-based index of the first item to start generating the batches from. Defaults to 0 and can be negative as well.
index_end – The zero-based index of the last item (inclusive) up to which the batches are generated. Defaults to -1, meaning the last item of the list makes the last item of the final batch.
batches_as_python_lists – A boolean determining whether to return each batch as a Python built-in list or as a List object. Defaults to True.
- Yields:
A generator that yields batches of size k. Adjacent batches overlap by k-1 items.
- Raises:
ValueError – If index_start is greater than index_end.
ValueError – If k exceeds the size of the list.
IndexError – If normalized indices exceed the size of the List object. Refer to normalize_index().
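The sliding-window behaviour described above can be sketched on a plain Python list. The function k_sized_batches is a hypothetical stand-in that follows the documented semantics (inclusive indices, current item last in the batch). The order of the checks and the error messages are assumptions.

```python
from typing import Iterator


def k_sized_batches(items: list, k: int, index_start: int = 0, index_end: int = -1) -> Iterator[list]:
    """Hypothetical sketch: yield windows of k adjacent items, advancing by one index."""
    size = len(items)
    if k > size:
        raise ValueError("k exceeds the size of the list")
    # Convert negative indices into their positive equivalents.
    start = index_start + size if index_start < 0 else index_start
    end = index_end + size if index_end < 0 else index_end
    if not (0 <= start < size and 0 <= end < size):
        raise IndexError("index out of range")
    if start > end:
        raise ValueError("index_start is greater than index_end")
    # A batch needs k-1 preceding items, so the first batch cannot end before index k-1.
    for current in range(max(start, k - 1), end + 1):
        yield items[current - k + 1 : current + 1]


batches = list(k_sized_batches(list(range(6)), k=3))
```

For six items and k=3 this produces four batches, and each pair of consecutive batches shares two items.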
- normalize_index(index: int) int[source]
Convert a negative index into its positive equivalent, or return the original index if it is non-negative.
- Raises:
IndexError – If the index, or its positive equivalent, exceeds the size of the List object.
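A minimal sketch of this normalization, written as a free function that takes the size explicitly, is shown below. The signature and the error message are assumptions, since the actual method operates on the List object itself.

```python
def normalize_index(index: int, size: int) -> int:
    """Hypothetical sketch: map a possibly negative index to its non-negative equivalent."""
    # Negative indices count from the end, as with Python lists.
    normalized = index + size if index < 0 else index
    if not 0 <= normalized < size:
        raise IndexError(f"index {index} is out of range for a list of size {size}")
    return normalized


last_index = normalize_index(-1, size=5)
```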
- property parsed_items: ndarray
- partition_in_k_sized_batches_by_index(k: Annotated[int, Gt(gt=0)], index_start: int = 0, index_end: int = -1, batches_as_python_lists: bool = True) Generator[source]
Partition the list, where the batches are of size k or less.
Note
The partition is given for all items that are in [index_start, index_end] (both inclusive).
Note
This is similar to generate_k_sized_batches_by_index(), but there are differences. First, this method generates partitions, i.e. sub-lists do not have any common items. Second, there could be one sub-list whose size is less than k. This happens when the number of items left to partition is less than k.
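The difference from the sliding-window method can be seen in a short sketch on a plain list. The function partition_in_batches is hypothetical, and it assumes that the shorter sub-list, if any, comes last.

```python
from typing import Iterator


def partition_in_batches(items: list, k: int, index_start: int = 0, index_end: int = -1) -> Iterator[list]:
    """Hypothetical sketch: yield non-overlapping sub-lists of size k or less."""
    size = len(items)
    # Convert negative indices into their positive equivalents.
    start = index_start + size if index_start < 0 else index_start
    end = index_end + size if index_end < 0 else index_end
    # Both indices are inclusive, hence the +1 on the upper bound.
    selected = items[start : end + 1]
    for offset in range(0, len(selected), k):
        yield selected[offset : offset + k]


partitions = list(partition_in_batches(list(range(7)), k=3))
```

Seven items with k=3 give two full sub-lists and one sub-list holding the single remaining item.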
- query(datetime_period: DateTimePeriod) Self[source]
Query items from the List object, given a start datetime and an end datetime.
- Parameters:
datetime_period – The datetime period to query the items from.
- Returns:
A new List object including items that match the given query.
- Raises:
ValueError – Refer to assert_start_time_is_before_end_time().
- class monkey_wrench.query.LogMixin(log_context: str = '')[source]
Bases: object
- __init__(log_context: str = '') None[source]
Make an instance of the class.
- Parameters:
log_context – A string that will be used in log messages to determine the context. Defaults to an empty string.
- property log_context: str
Get the log context as a string.
- class monkey_wrench.query.Query(*args, **kwargs)[source]
Bases: ABC, LogMixin
Abstract base class for queries.
- abstractmethod static len(items: Any) Annotated[int, Ge(ge=0)][source]
Get the size (number) of items, e.g. the Python built-in len() function in case of a list.
- query_in_batches(datetime_range_in_batches: DateTimeRangeInBatches, expected_total_count: Annotated[int, Ge(ge=0)] | None = None) Generator[tuple[T, int], None, None][source]
Divide the specified time range into smaller intervals (batches) and perform queries on them.
The arguments are the same as DateTimeRangeInBatches. If expected_total_count is given, it will be compared with total_retrieved_count and if they are not equal, a warning will be logged. It defaults to None, which means no comparison is made.
- Yields:
The result of queries, in the form of 2-tuples in which the first element is the items retrieved by the query() function in each batch and the second element is the number of items in the batch.
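The pattern described above (an abstract query() and len(), plus a generic batching loop that compares the running total with an expected count) can be sketched as follows. BatchedQuery and ListQuery are hypothetical classes and do not reproduce the actual Query implementation. In particular, the batch order and the plain start/end arguments are assumptions.

```python
import logging
from abc import ABC, abstractmethod
from datetime import datetime, timedelta
from typing import Any, Iterator, Optional

logger = logging.getLogger(__name__)


class BatchedQuery(ABC):
    """Hypothetical base class; not the actual `Query` implementation."""

    @staticmethod
    @abstractmethod
    def len(items: Any) -> int:
        """Return the number of items in a query result."""

    @abstractmethod
    def query(self, start: datetime, end: datetime) -> Any:
        """Return the items in [start, end)."""

    def query_in_batches(self, start: datetime, end: datetime, batch_interval: timedelta,
                         expected_total_count: Optional[int] = None) -> Iterator[tuple]:
        total_retrieved_count = 0
        batch_start = start
        while batch_start < end:
            batch_end = min(batch_start + batch_interval, end)
            items = self.query(batch_start, batch_end)
            count = self.len(items)
            total_retrieved_count += count
            yield items, count
            batch_start = batch_end
        # Only compare once all batches have been consumed.
        if expected_total_count is not None and expected_total_count != total_retrieved_count:
            logger.warning("Expected %d items but retrieved %d.",
                           expected_total_count, total_retrieved_count)


class ListQuery(BatchedQuery):
    """Hypothetical concrete query over a list of datetime objects."""

    def __init__(self, items: list):
        self._items = sorted(items)

    @staticmethod
    def len(items: Any) -> int:
        return len(items)  # resolves to the built-in len(), not this method

    def query(self, start: datetime, end: datetime) -> list:
        return [item for item in self._items if start <= item < end]


hourly = [datetime(2022, 1, 1) + timedelta(hours=h) for h in range(48)]
results = list(ListQuery(hourly).query_in_batches(
    datetime(2022, 1, 1), datetime(2022, 1, 3), timedelta(days=1), expected_total_count=48
))
```

Here two daily batches of 24 items each are produced, so the total matches expected_total_count and no warning is logged.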