Tasks
The following are specifications and examples of different keys and values in task.yaml files, depending on
different scenarios. The YAML files are parsed and first validated by Pydantic and any errors are logged.
Every task is a YAML document which consists of the following parts, or the so-called nodes:
context: literal stringaction: literal stringspecifications: mapping
The context determines the general category of the task. For example, context: ids means we are dealing
with a task related to product IDs. The action is a verb which describes what needs to be done. For example in the
context of ids, action: fetch means product IDs need to be fetched from the EUMETSAT Datastore. Finally,
specifications is a dictionary which gives details of the action, such as start_datetime.
Note
It is possible to have several tasks in the same file. Since each task is a YAML document, tasks can be separated by
three dashes ---.
The following is the content of a task file which obtains all product IDs for SEVIRI native data between 2015/06/01
(inclusive) and 2015/08/01 (exclusive) and saves them in seviri_product_ids.txt
context: ids
action: fetch
specifications:
start_datetime: "2015-06-01T00:00:00+00:00"
end_datetime: "2015-08-01T00:00:00+00:00"
batch_interval:
days: 30
output_filepath: seviri_product_ids.txt
Specifications depend on the context and the action as explained below.
Examples
Product IDs
Fetch SEVIRI product IDs
context: ids
action: fetch
specifications:
batch_interval: <required>
end_datetime: <required>
output_filepath: <required>
start_datetime: <required>
Files
Download and resample SEVIRI native files
context: files
action: fetch
specifications:
area: <required>
end_datetime: <required>
input_filepath: <required>
parent_output_directory_path: <required>
start_datetime: <required>
channel_names: <optional>
dataset_save_options: <optional>
fsspec_cache: <optional>
number_of_processes: <optional>
output_filename_generator: <optional>
radius_of_influence: <optional>
remove_file_if_exists: <optional>
temp_directory_path: <optional>
Verify that files exist according to the expected product IDs, given datetime range, and the patterns
context: files
action: verify
specifications:
end_datetime: <required>
nominal_file_size: <required>
parent_input_directory_path: <required>
reference: <required>
start_datetime: <required>
file_size_relative_tolerance: <optional>
number_of_processes: <optional>
pattern: <optional>
verbose: <optional>
CHIMP Retrieval
Process SEVIRI files with CHIMP model to retrieve cloud parameters
context: chimp
action: retrieve
specifications:
end_datetime: <required>
model_filepath: <required>
parent_input_directory_path: <required>
parent_output_directory_path: <required>
start_datetime: <required>
device: <optional>
sequence_length: <optional>
temporal_overlap: <optional>
temp_directory_path: <optional>
tile_size: <optional>
verbose: <optional>
Specifications
Datetime instances
Keys:
start_datetime
end_datetime
Description:
String indicating the datetime in ISO format with timezone information.
Python type:
datetime.datetime
Supported in:
ids:
fetch
files:
fetch
verify
chimp:
retrieve
Example
start_datetime: "2022-08-12T00:00:00+00:00"
Datetime intervals
Keys:
batch_interval
Description:
Dictionary indicating the interval for each batch when fetching product IDs.
Values:
dictionary:
keys:
weeks
days
hours
minutes
seconds
values:
non-negative integers
Python type:
datetime.timedelta
Supported in:
ids:
fetch
Example:
batch_interval:
days: 30
hours: 10
Paths
Keys:
input_filepath # must point to an existing file
output_filepath # must be a new path as overwriting an existing file is not allowed!
reference # must point to an existing file containing reference product IDs
parent_input_directory_path # must point to an existing directory
parent_output_directory_path # must point to an existing directory
temp_directory_path # must point to an existing directory for temporary files
model_filepath # must point to an existing model file
Description:
A string which can be interpreted as a valid path. It can point to either relative or absolute paths.
Internally, it will be parsed into an absolute path for consistency.
Python type:
pathlib.Path
Supported in:
ids:
fetch:
output_filepath
files:
fetch:
input_filepath
parent_output_directory_path
temp_directory_path
verify:
reference
parent_input_directory_path
chimp:
retrieve:
model_filepath
parent_output_directory_path
parent_input_directory_path
temp_directory_path
Example
input_filepath: products_ids.txt
parent_output_directory_path: /path/to/output/directory
reference: /path/to/product_ids.txt
Pattern
Keys:
pattern
Description:
A dictionary containing the pattern to filter filenames.
Values:
sub_strings:
A single literal string or a list of literal strings used to filter filenames. This is optional,
and if absent from the task file, no filtering will be performed on the filenames.
The pattern does not support wildcards or regex, only literals.
case_sensitive:
A boolean indicating whether the pattern should be case-sensitive or not.
match_all:
A boolean indicating whether all or any of the sub_strings should be present in the filename.
Default:
pattern: null
case_sensitive: true
match_all: true
Python type:
sub_strings: str | list[str] | None
case_sensitive: bool
match_all: bool
Supported in:
files:
verify
pattern:
sub_strings:
- "seviri"
- "nc"
case_sensitive: false
match_all: false
Numbers
Keys:
number_of_processes
nominal_file_size # in bytes
file_size_relative_tolerance
radius_of_influence
sequence_length
temporal_overlap
tile_size
Description:
Positive integers for counts and sizes, float values for tolerance parameters.
Default:
number_of_processes: 1
file_size_relative_tolerance: 0.01
radius_of_influence: 20000
sequence_length: 16
temporal_overlap: 0
tile_size: 256
Python type:
int | float
Supported in:
files:
fetch:
number_of_processes
radius_of_influence
verify:
number_of_processes
nominal_file_size
file_size_relative_tolerance
chimp:
retrieve:
sequence_length
temporal_overlap
tile_size
Device Selection
Keys:
device
Description:
String indicating computation device to use.
Values:
Literal["cpu", "cuda"]
Default:
"cpu"
Python type:
str
Supported in:
chimp:
retrieve
Verbose Options
Keys:
verbose
Description:
Boolean indicating whether to enable verbose output
or a list of string options specifying which verbose information to show.
Values:
true | false | list[Literal["files", "reference", "corrupted", "missing"]]
Default:
false
Python type:
bool | list[Literal["files", "reference", "corrupted", "missing"]]
Supported in:
files:
verify
chimp:
retrieve
Examples
verbose: true
or
verbose:
- corrupted
- missing
Function References
Keys:
output_filename_generator
Description:
String containing the fully qualified function path of Monkey Wrench excluding the leading `monkey_wrench.`
Default:
input_output.seviri.input_filename_from_product_id
Python type:
str
Supported in:
files:
fetch
Cache Options
Keys:
fsspec_cache
Description:
String indicating the cache type to use with fsspec.
Values:
Literal["filecache", "blockcache"] | null
Default:
null
Python type:
str | None
Supported in:
files:
fetch
Example
fsspec_cache: filecache
Channel Configuration
Keys:
channel_names
Description:
List of strings representing channel names to process.
Default:
- "VIS006"
- "VIS008"
- "IR_016"
- "IR_039"
- "WV_062"
- "WV_073"
- "IR_087"
- "IR_097"
- "IR_108"
- "IR_120"
- "IR_134"
- "HRV"
Python type:
list[str]
Supported in:
files:
fetch
Dataset Save Options
Keys:
dataset_save_options
Description:
Dictionary with configuration options for saving datasets.
Default:
writer: "cf" # "cf" for "netcdf", i.e. the format for writing the data
include_lonlats: false
Python type:
dict
Supported in:
files:
fetch
Area Definitions
Keys:
area
Description:
Dictionary with area definitions conforming to the `Pyresample` package.
Python type:
dict
Supported in:
files:
fetch
Example
area:
CHIMP_NORDIC_4:
description: "CHIMP region of interest over the nordic countries"
projection:
proj: "stere"
lat_0: 90
lat_ts: 60
lon_0: 14
x_0: 0
y_0: 0
datum: "WGS84"
no_defs: null
type: "crs"
shape:
height: 564
width: 452
area_extent:
lower_left_xy: [-745322.8983833211, -3996217.269197446]
upper_right_xy: [1062901.0232376591, -1747948.2287755085]
units: "m"
Remove File Options
Keys:
remove_file_if_exists
Description:
Boolean indicating whether to remove existing files before processing.
Default:
true
Python type:
bool
Supported in:
files:
fetch