monkey_wrench.query._api module

class monkey_wrench.query._api.EumetsatQuery(collection: EumetsatCollection = EumetsatCollection.seviri, log_context: str = 'EUMETSAT Query')[source]

Bases: Query

__init__(collection: EumetsatCollection = EumetsatCollection.seviri, log_context: str = 'EUMETSAT Query') None[source]

Initialize an instance of the class with API credentials read from the environment variables.

The constructor sets up a private eumdac datastore by obtaining an authentication token with the API login and password, both of which are read from environment variables.

Parameters:
  • collection – The collection to query. Defaults to EumetsatCollection.seviri (SEVIRI).

  • log_context – A string that will be used in log messages to determine the context. Defaults to 'EUMETSAT Query'.

static len(product_ids: SearchResults) int[source]

Return the number of product IDs.

query(datetime_period: DateTimePeriodStrict, polygon: Polygon | None = None) SearchResults[source]

Query product IDs in a single batch.

This method wraps the eumdac.Collection().search() method to search for product IDs within a specified time range and, optionally, a polygon.

Note

For a given SEVIRI collection, an example product ID is "MSG3-SEVI-MSG15-0100-NA-20150731221240.036000000Z-NA".

Note

start_time and end_time are treated respectively as inclusive and exclusive when querying the IDs. For example, to obtain all the data up to and including 2022/12/31, we must set end_time=datetime(2023, 1, 1).
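The inclusive/exclusive convention in the note above can be illustrated with plain datetime comparisons. This is only a sketch of the half-open interval semantics; in_period is a hypothetical helper and not part of monkey_wrench or eumdac.

```python
from datetime import datetime


def in_period(timestamp, start_time, end_time):
    """Half-open membership test: start is inclusive, end is exclusive."""
    return start_time <= timestamp < end_time


start_time = datetime(2022, 1, 1)
end_time = datetime(2023, 1, 1)

# The last 15-minute slot of 2022 lies inside the period ...
last_slot_2022 = datetime(2022, 12, 31, 23, 45)
# ... whereas a timestamp equal to end_time lies outside it.
first_slot_2023 = datetime(2023, 1, 1, 0, 0)

covered = in_period(last_slot_2022, start_time, end_time)
excluded = not in_period(first_slot_2023, start_time, end_time)
```

Setting end_time to datetime(2022, 12, 31) instead would drop every timestamp on the last day of the year.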

Parameters:
  • datetime_period – The datetime period to query for.

  • polygon – An optional object of type Polygon to search within. Defaults to None.

Returns:

The results of the search, containing the product IDs found within the specified period and the polygon.

Raises:

ValueError – Refer to assert_start_time_is_before_end_time().

query_in_batches(datetime_range_in_batches: DateTimeRangeInBatches) Generator[tuple[SearchResults, int], None, None][source]

Retrieve all the product IDs, given a time range and a batch interval, fetching one batch at a time.

Parameters:

datetime_range_in_batches – The datetime range to query for.

Note

As an example, for SEVIRI, we expect to have one file (product ID) per 15 minutes, i.e. 4 files per hour or 96 files per day. If our re-analysis period is 2022/01/01 (inclusive) to 2023/01/01 (exclusive), i.e. 365 days, this results in a maximum of 35040 = 96 x 365 files.

If we split our datetime range into intervals of 30 days and fetch product IDs in batches, there is a maximum of 2880 = 96 x 30 IDs in each batch retrieved by a single request. One might need to adapt the batch interval to avoid sending too many requests to the server.
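The arithmetic in this note can be checked with a short sketch. split_into_batches is a hypothetical helper written for illustration only; it is not the implementation behind DateTimeRangeInBatches, and the order in which the library actually visits batches is not assumed here.

```python
from datetime import datetime, timedelta

FILES_PER_DAY = 24 * 4  # one SEVIRI file per 15 minutes

start = datetime(2022, 1, 1)   # inclusive
end = datetime(2023, 1, 1)     # exclusive
batch_interval = timedelta(days=30)

total_days = (end - start).days
max_total_files = total_days * FILES_PER_DAY
max_files_per_batch = batch_interval.days * FILES_PER_DAY


def split_into_batches(start, end, interval):
    """Split [start, end) into consecutive half-open sub-intervals."""
    batches = []
    batch_start = start
    while batch_start < end:
        # The last batch is shortened so that it never exceeds `end`.
        batch_end = min(batch_start + interval, end)
        batches.append((batch_start, batch_end))
        batch_start = batch_end
    return batches


batches = split_into_batches(start, end, batch_interval)
```

With these values the range splits into 12 full batches of 30 days plus one shorter batch covering the remaining 5 days, so 13 requests in total.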

Yields:

2-tuples, one per batch. The first element of each tuple is the collection of products retrieved in that batch. The second element is the number of products retrieved in that batch. The search results can in turn be iterated over to retrieve individual products.

Example

>>> from datetime import datetime, timedelta, UTC
>>>
>>> range_in_batches = DateTimeRangeInBatches(
...  start_datetime=datetime(2022, 1, 1, tzinfo=UTC),
...  end_datetime=datetime(2022, 1, 3, tzinfo=UTC),
...  batch_interval=timedelta(days=1)
... )
>>>
>>> try:
...  api = EumetsatQuery()
...  for batch, retrieved_count in api.query_in_batches(range_in_batches):
...     assert retrieved_count == batch.total_results
...     for product in batch:
...         pass
... except KeyError as e:  # If the API credentials are not set!
...  assert "environment variable" in str(e)

fetch_products(search_results: SearchResults, output_directory: Path, bounding_box: BoundingBox | None = None, output_file_format: str = 'netcdf4', sleep_time: Annotated[int, Gt(gt=0)] = 10) list[Path | None][source]

Fetch all products from search results and write product files to disk.

Parameters:
  • search_results – Search results for which the files will be fetched.

  • output_directory – The directory to save the files in.

  • bounding_box – Bounding box, i.e. (north, south, west, east) limits. Defaults to None which means BoundingBox(90., -90, -180., 180) will be used.

  • output_file_format – Desired format of the output file(s). Defaults to netcdf4.

  • sleep_time – Sleep time, in seconds, between requests. Defaults to 10 seconds.

Returns:

A list of paths for the fetched files.
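Because the return type is list[Path | None], callers may want to separate saved files from missing entries. The following is a minimal sketch, assuming that a None entry marks a product whose file was not saved; the paths are made-up placeholders and not real output of fetch_products().

```python
from pathlib import Path

# Hypothetical stand-in for the list returned by fetch_products().
fetched = [Path("out/product_a.nc"), None, Path("out/product_b.nc")]

# Keep only the entries that point to a file.
saved_paths = [path for path in fetched if path is not None]

# Count the entries for which no file path was returned.
missing_count = sum(path is None for path in fetched)
```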

fetch_product(product: Product, chain: Chain, output_directory: Path, sleep_time: Annotated[int, Gt(gt=0)]) Path | None[source]

Fetch the file for a single product and write the product file to disk.

Parameters:
  • product – The Product whose corresponding file will be fetched.

  • chain – Chain to apply for customization of the output file.

  • output_directory – The directory to save the file in.

  • sleep_time – Sleep time, in seconds, between requests.

Returns:

The path of the saved file on disk, or None in case of a failure.