analysis-ready hierarchical model output
Analysis scripts are forced to load way too much data.
optimize output for analysis
(not write throughput)
Grid | Cells |
---|---|
1° by 1° | 0.06M |
10 km | 5.1M |
5 km | 20M |
1 km | 510M |
200 m | 12750M |
Screen | Pixels |
---|---|
VGA | 0.3M |
Full HD | 2.1M |
MacBook 13" | 4.1M |
4K | 8.8M |
8K | 35.4M |
It’s impossible to look at the entire globe in full resolution.
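The cell counts in the table follow from dividing Earth's surface area by the area of one grid cell. A rough sketch of that arithmetic, assuming a spherical Earth with a mean radius of 6371 km and square cells (both simplifications that are not stated on the slide); the function names are made up for illustration:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth
EARTH_AREA_KM2 = 4 * math.pi * EARTH_RADIUS_KM**2  # roughly 510 million km^2

PIXELS_4K = 4096 * 2160  # pixel count of a 4K screen, about 8.8M


def cells_for_spacing(dx_km):
    """Approximate number of square cells with edge length dx_km on the globe."""
    return EARTH_AREA_KM2 / dx_km**2


def healpix_cells(nside):
    """Number of cells of a HEALPix grid with the given nside."""
    return 12 * nside**2


if __name__ == "__main__":
    for dx_km in (10, 5, 1, 0.2):
        n = cells_for_spacing(dx_km)
        print(f"{dx_km:>4} km: {n / 1e6:9.1f}M cells, "
              f"{n / PIXELS_4K:7.1f} times a 4K screen")
    print(f"HEALPix nside=2**10: {healpix_cells(2**10) / 1e6:.1f}M cells")
```

Already at 10 km spacing the globe has fewer cells than a 4K screen has pixels, while at 1 km it has well over fifty times as many.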
should work well below km-scale
(for now)
yac = YAC(xml, xsd)
sgd = HealPixSubgridDefinition(2**10, nchunks, ichunk)
dataset = zarr.open_consolidated(output_folder, mode="r+")
comp_id = yac.def_comp("healpix_io")
point_id, grid = make_yac_grid(sgd)
fields = {varname: Field.create(varname, comp_id, point_id)
          for varname in varnames}
put_fields = ...
yac.search()
steps = compute_nsteps(yac, fields)
for i in range(steps):
    for varname, field in fields.items():
        buffer = field.get()
        put_fields[varname].put(coarsen(buffer))
        dataset[varname][i, sgd.cell_chunk_slice] = buffer
(real code at GWDG gitlab)
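The loop calls a `coarsen` helper that is not shown on the slide. One plausible sketch, assuming the buffer holds whole groups of sibling cells in HEALPix nested ordering (where the four children of cell `p` are cells `4p` to `4p + 3`) and that a plain mean is the desired aggregation; the real implementation may differ, for example in how it treats missing values:

```python
import numpy as np


def coarsen(buffer):
    """Average every 4 consecutive cells into one parent cell.

    Assumption: the last axis is the cell dimension in HEALPix nested
    ordering and its length is a multiple of 4.
    """
    buffer = np.asarray(buffer)
    if buffer.shape[-1] % 4 != 0:
        raise ValueError("cell dimension must be a multiple of 4")
    # Split the cell axis into (parents, 4 children) and average the children.
    return buffer.reshape(*buffer.shape[:-1], -1, 4).mean(axis=-1)


fine = np.arange(16, dtype=float)  # 16 cells at the finest level
coarse = coarsen(fine)             # 4 parent cells
coarser = coarsen(coarse)          # 1 grandparent cell
```

Because each call reduces the cell count by a factor of four, chaining the helper level by level yields the whole hierarchy from a single pass over the high-resolution data.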
It’s about 1.6 PiB.
(if it weren't compressed)
This code selects ICON model output at all dropsonde locations during EUREC4A.
sonde_pix = healpy.ang2pix(
    icon.crs.healpix_nside, joanne.flight_lon, joanne.flight_lat, lonlat=True, nest=True
)
icon_sondes = (
    icon[["ua", "va", "ta", "hus"]]
    .sel(time=joanne.launch_time, method="nearest")
    .isel(cell=sonde_pix)
    .compute()
)
(55 s, 1 GB, single thread, full code at easy.gems)
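Since the selection uses `nest=True`, the same pixel indices can be mapped to a coarser level of the hierarchy with a bit shift instead of a second `ang2pix` call. A minimal sketch, assuming the coarser datasets also use nested ordering; the helper names are hypothetical and not part of healpy:

```python
def nside_to_level(nside):
    """Return the refinement level for a power-of-two nside (nside = 2**level)."""
    if nside < 1 or nside & (nside - 1):
        raise ValueError("nside must be a positive power of two")
    return nside.bit_length() - 1


def to_coarser_level(pix, nside_fine, nside_coarse):
    """Map a nested HEALPix index to the index of its ancestor cell.

    Each level up merges 4 sibling cells, which drops the 2 lowest bits.
    """
    shift = 2 * (nside_to_level(nside_fine) - nside_to_level(nside_coarse))
    if shift < 0:
        raise ValueError("nside_coarse must not exceed nside_fine")
    return pix >> shift


# Example: the ancestor of cell 1_000_000 at nside=1024, two levels up.
parent = to_coarser_level(1_000_000, 1024, 256)
```

The shift also works elementwise on NumPy integer arrays, so an array like `sonde_pix` could be passed in unchanged.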
(code on GWDG gitlab)
ZEP0002 proposes a standard way to pack multiple Zarr chunks into one object.
ZEP0005 proposes a standard way to represent aggregated hierarchies.
natESM Training, 15 Nov 2023