xyzpy.manage

Manage datasets — loading, saving, merging etc.

Module Contents

Functions

cache_to_disk([fn, cachedir])

Cache this function to disk, using joblib.

auto_add_extension(file_name, engine)

Make sure a file name has an extension that reflects its file type.

save_ds(ds, file_name[, engine])

Saves an xarray dataset.

load_ds(file_name[, engine, load_to_mem, create_new, ...])

Loads an xarray dataset. Essentially xarray.open_dataset, with some different defaults and more convenient behaviour.

save_merge_ds(ds, fname[, overwrite])

Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.

trimna(obj)

Drop values across all dimensions for which all values are NaN.

sort_dims(ds)

Make sure all the dimensions on all the variables appear in the same order.

post_fix(ds, postfix)

Add postfix to the name of every data variable in ds.

check_runs(obj[, dim, var, sel])

Print out information about the range and any missing values for an integer dimension.

auto_xyz_ds(x[, y_z])

Automatically turn an array into an xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.

merge_sync_conflict_datasets(base_name[, engine, ...])

Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.

save_df(df, name[, engine, key])

Save a dataframe to disk.

load_df(name[, engine, key])

Load a dataframe from disk.

Attributes

xyzpy.manage._DEFAULT_FN_CACHE_PATH = '__xyz_cache__'
xyzpy.manage.cache_to_disk(fn=None, *, cachedir=_DEFAULT_FN_CACHE_PATH, **kwargs)

Cache this function to disk, using joblib.
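The idea behind disk caching can be sketched with the standard library alone: hash the call arguments and store each result as a pickle file under a cache directory. This is a hypothetical stand-in (disk_cache is not part of xyzpy) that swaps joblib for a plain pickle store; it assumes arguments have a stable repr and results are picklable. The fn=None pattern mirrors the documented signature, so the decorator works both bare and with cachedir=..., and the default directory matches _DEFAULT_FN_CACHE_PATH.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(fn=None, *, cachedir="__xyz_cache__"):
    """Hypothetical stdlib sketch of a disk-caching decorator (not joblib)."""
    if fn is None:
        # Support both @disk_cache and @disk_cache(cachedir=...)
        return functools.partial(disk_cache, cachedir=cachedir)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Key the cache on the function name plus a repr of its arguments
        key = repr((fn.__qualname__, args, sorted(kwargs.items())))
        digest = hashlib.sha256(key.encode()).hexdigest()
        path = os.path.join(cachedir, digest + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        result = fn(*args, **kwargs)
        os.makedirs(cachedir, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result

    return wrapper


# Usage: cache into a temporary directory and record real evaluations
calls = []
cache_dir = tempfile.mkdtemp()


@disk_cache(cachedir=cache_dir)
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)
second = slow_square(4)  # read back from disk; the function body does not run
```

joblib itself offers this through joblib.Memory, which also handles numpy arrays efficiently and invalidates entries when the function's code changes; the sketch does neither.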

xyzpy.manage._engine_extensions
xyzpy.manage.auto_add_extension(file_name, engine)

Make sure a file name has an extension that reflects its file type.
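A minimal sketch of the extension rule, assuming a lookup table from engine name to extension. The table below is hypothetical: xyzpy keeps its own mapping in _engine_extensions, whose exact contents may differ. The sketch also assumes an extension the caller already supplied is left alone.

```python
import os

# Hypothetical engine -> extension table (may not match _engine_extensions)
ENGINE_EXTENSIONS = {
    "h5netcdf": ".h5",
    "netcdf4": ".nc",
    "joblib": ".jbl",
    "zarr": ".zarr",
}


def add_extension(file_name, engine):
    """Append the engine's extension unless file_name already has one."""
    _, ext = os.path.splitext(file_name)
    if ext:
        # Assumption: respect any extension the caller chose explicitly
        return file_name
    return file_name + ENGINE_EXTENSIONS[engine]
```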

xyzpy.manage.save_ds(ds, file_name, engine='h5netcdf', **kwargs)

Saves an xarray dataset.

Parameters:
  • ds (xarray.Dataset) – The dataset to save.

  • file_name (str) – Name of the file to save to.

  • engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to save file with.

Return type:

None

xyzpy.manage.load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs)

Loads an xarray dataset. Essentially xarray.open_dataset, with some different defaults and more convenient behaviour.

Parameters:
  • file_name (str) – Name of file to open.

  • engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to load file.

  • load_to_mem (bool, optional) – Once opened, load the data from disk into memory. Defaults to True if chunks=None.

  • create_new (bool, optional) – If no file exists make a blank one.

  • chunks (int or dict, optional) – Passed to xarray.open_dataset so that data is stored using dask.array.

Returns:

ds – Loaded Dataset.

Return type:

xarray.Dataset
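The load_to_mem default depends on chunks. A hypothetical helper that spells the rule out, assuming (as the wording implies but does not state outright) that the default is False whenever chunks is given, so that the data stays lazily backed by dask:

```python
def default_load_to_mem(load_to_mem=None, chunks=None):
    """Hypothetical helper reproducing the documented load_to_mem default."""
    if load_to_mem is None:
        # No chunks: load eagerly into memory.
        # Chunks given: assumed to stay lazy (dask-backed).
        return chunks is None
    # An explicit choice by the caller always wins
    return load_to_mem
```

In plain xarray terms, the eager branch corresponds to calling Dataset.load() on the result of xarray.open_dataset.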

xyzpy.manage.save_merge_ds(ds, fname, overwrite=None, **kwargs)

Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.

Parameters:
  • ds (xarray.Dataset) – The dataset to save.

  • fname (str) – The file name.

  • overwrite ({None, False, True}, optional) –

    How to merge the dataset with the existing dataset.

    • None: the datasets will be merged if there are no conflicts

    • False: data will be taken from old dataset if conflicting

    • True: data will be taken from new dataset if conflicting
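The three overwrite policies can be illustrated on plain dictionaries, with keys standing in for coordinate positions and values for data. merge_records is hypothetical. In xarray terms the policies resemble xarray.merge(..., compat='no_conflicts') for None, old.combine_first(new) for False and new.combine_first(old) for True, though that mapping is an assumption about xyzpy rather than something stated here.

```python
def merge_records(old, new, overwrite=None):
    """Hypothetical dict analogue of the save_merge_ds overwrite policies."""
    merged = dict(old)
    for key, value in new.items():
        if key in old and old[key] != value:
            if overwrite is None:
                # Default policy: refuse to merge conflicting data
                raise ValueError(f"conflicting values for {key!r}")
            if overwrite is False:
                continue  # keep the old value
        # New key, identical value, or overwrite=True: take the new value
        merged[key] = value
    return merged


old = {"a": 1, "b": 2}
new = {"b": 3, "c": 4}

keep_old = merge_records(old, new, overwrite=False)
take_new = merge_records(old, new, overwrite=True)
try:
    merge_records(old, new)
    conflict_raised = False
except ValueError:
    conflict_raised = True
```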

xyzpy.manage.trimna(obj)

Drop values across all dimensions for which all values are NaN.
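With xarray, this kind of trimming is commonly written as one dropna(dim, how='all') call per dimension; whether trimna does exactly that is an assumption. The NumPy sketch below applies the same rule to a bare array (trimna_array is hypothetical): along each axis, any index whose slice is entirely NaN is dropped.

```python
import numpy as np


def trimna_array(a):
    """Drop every index, along every axis, whose slice is entirely NaN."""
    a = np.asarray(a, dtype=float)
    for axis in range(a.ndim):
        nan = np.isnan(a)
        other = tuple(i for i in range(a.ndim) if i != axis)
        # keep[i] is True if slice i along `axis` holds at least one real value
        keep = ~nan.all(axis=other) if other else ~nan
        a = np.compress(keep, a, axis=axis)
    return a


a = np.array([[np.nan, np.nan, np.nan],
              [1.0, np.nan, 2.0],
              [np.nan, np.nan, np.nan]])
trimmed = trimna_array(a)  # rows 0 and 2 and column 1 are all NaN
```

The order of the axes does not matter: removing an all-NaN slice never turns a partly filled slice along another axis into an all-NaN one.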

xyzpy.manage.sort_dims(ds)

Make sure all the dimensions on all the variables appear in the same order.
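In xarray, Dataset.transpose(*dims) reorders the dimensions of every variable at once, which is one plausible way to get this result. The NumPy sketch below shows the underlying operation on a hypothetical name -> (dims, array) mapping standing in for a dataset's data variables: each array is transposed so that its dimensions follow one shared order.

```python
import numpy as np


def sort_dims_sketch(variables, dim_order):
    """Transpose each (dims, data) pair so its dims follow dim_order.

    Assumes dim_order lists every dimension that appears in `variables`.
    """
    out = {}
    for name, (dims, data) in variables.items():
        # Keep only the shared-order dims this variable actually has
        new_dims = tuple(d for d in dim_order if d in dims)
        axes = [dims.index(d) for d in new_dims]
        out[name] = (new_dims, np.transpose(data, axes))
    return out


variables = {
    "a": (("x", "y"), np.zeros((2, 3))),
    "b": (("y", "x"), np.ones((3, 2))),
}
sorted_vars = sort_dims_sketch(variables, ["x", "y"])
```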

xyzpy.manage.post_fix(ds, postfix)

Add postfix to the name of every data variable in ds.
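For an xarray Dataset, the same renaming can be written as ds.rename({k: k + postfix for k in ds.data_vars}); whether post_fix does exactly this is an assumption. The stand-in below uses a plain dict so it runs without xarray, and assumes the postfix is appended verbatim with no separator.

```python
def add_postfix(data_vars, postfix):
    """Rename every key of a name -> data mapping by appending postfix."""
    return {name + postfix: value for name, value in data_vars.items()}


renamed = add_postfix({"energy": [1.0, 2.0], "error": [0.1, 0.2]}, "_run1")
```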

xyzpy.manage.check_runs(obj, dim='run', var=None, sel=())

Print out information about the range and any missing values for an integer dimension.

Parameters:
  • obj (xarray object) – Data to check.

  • dim (str, optional) – Dimension to check, defaults to ‘run’.

  • var (str, optional) – Subselect this data variable first.

  • sel (mapping, optional) – Subselect these other coordinates first.
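A sketch of the kind of report check_runs produces, applied to a plain list of integer coordinate values. summarize_runs and its printed format are hypothetical, and it assumes "missing" means integers absent between the smallest and largest value present.

```python
def summarize_runs(values):
    """Report the range of an integer coordinate and any gaps in it."""
    present = sorted(set(int(v) for v in values))
    lo, hi = present[0], present[-1]
    # Every integer in [lo, hi] that never appears counts as missing
    missing = sorted(set(range(lo, hi + 1)) - set(present))
    print(f"RANGE: {lo} -> {hi}, {len(present)} present")
    if missing:
        print(f"MISSING: {missing}")
    return lo, hi, missing


lo, hi, missing = summarize_runs([0, 1, 2, 5, 6, 9])
```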

xyzpy.manage.auto_xyz_ds(x, y_z=None)

Automatically turn an array into an xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.

Parameters:
  • x (array_like) – The x-coordinates.

  • y_z (array_like, optional) – The y-data, possibly varying with coordinate z.
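The transposing rule can be sketched with NumPy. match_y_to_x is hypothetical, and it assumes the x-axis should end up as the leading axis of y_z; auto_xyz_ds itself may orient the data differently.

```python
import numpy as np


def match_y_to_x(x, y_z):
    """Orient y_z so that its first axis runs along x (hypothetical helper)."""
    x = np.asarray(x)
    y_z = np.asarray(y_z)
    if y_z.ndim == 2 and y_z.shape[0] != x.size and y_z.shape[1] == x.size:
        # y_z was given as (z, x); transpose it to (x, z)
        y_z = y_z.T
    if y_z.shape[0] != x.size:
        raise ValueError("no axis of y_z matches the length of x")
    return y_z


x = np.linspace(0, 1, 4)
y_z = np.zeros((3, 4))  # three curves of four points each, stored as (z, x)
oriented = match_y_to_x(x, y_z)
```

A square y_z (both axes the length of x) is ambiguous, and the sketch leaves it untouched.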

xyzpy.manage.merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False)

Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.

Parameters:
  • base_name (str) – Base file name to glob on; it should include ‘*’.

  • engine (str, optional) – Load and save engine used by xarray.
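The glob, merge, save and clean-up cycle can be sketched on JSON files with the standard library. Everything here is hypothetical: the function name, the assumption that conflict copies carry Syncthing-style "sync-conflict" names, and the merge rule (entries already in the main file win, new keys from conflict copies are added). The sketch does not model the real combine_first option.

```python
import glob
import json
import os
import tempfile


def merge_sync_conflicts_json(base_name, marker="sync-conflict"):
    """Hypothetical JSON analogue: fold conflict copies into the main file."""
    paths = sorted(glob.glob(base_name))
    main = [p for p in paths if marker not in os.path.basename(p)]
    conflicts = [p for p in paths if marker in os.path.basename(p)]
    if len(main) != 1:
        raise ValueError(f"expected exactly one main file, found {main}")
    with open(main[0]) as f:
        merged = json.load(f)
    original = dict(merged)
    for path in conflicts:
        with open(path) as f:
            for key, value in json.load(f).items():
                merged.setdefault(key, value)  # keep existing, add new keys
    if merged != original:
        # Only rewrite the main file if the conflicts brought new info
        with open(main[0], "w") as f:
            json.dump(merged, f)
    for path in conflicts:
        os.remove(path)  # clean up the conflict copies
    return merged


# Usage: one main file plus one conflict copy in a scratch directory
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "data.json"), "w") as f:
    json.dump({"a": 1}, f)
with open(os.path.join(tmp, "data.sync-conflict-1.json"), "w") as f:
    json.dump({"a": 1, "b": 2}, f)
merged = merge_sync_conflicts_json(os.path.join(tmp, "data*.json"))
```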

xyzpy.manage.save_df(df, name, engine='pickle', key='df', **kwargs)

Save a dataframe to disk.

xyzpy.manage.load_df(name, engine='pickle', key='df', **kwargs)

Load a dataframe from disk.