`xyzpy.manage`

Manage datasets: loading, saving, merging, etc.

Module Contents

Functions

`cache_to_disk`([fn, cachedir])	Cache this function to disk, using joblib.
`auto_add_extension`(file_name, engine)	Make sure a file name has an extension that reflects its file type.
`save_ds`(ds, file_name[, engine])	Save an xarray dataset.
`load_ds`(file_name[, engine, load_to_mem, create_new, ...])	Load an xarray dataset. Essentially `xarray.open_dataset` with some different defaults and more convenient behaviour.
`save_merge_ds`(ds, fname[, overwrite])	Save dataset `ds`, but check for an existing dataset with that name first, and if it exists, merge the two before saving.
`trimna`(obj)	Drop values across all dimensions for which all values are NaN.
`sort_dims`(ds)	Make sure all the dimensions on all the variables appear in the same order.
`post_fix`(ds, postfix)	Add `postfix` to the name of every data variable in `ds`.
`check_runs`(obj[, dim, var, sel])	Print out information about the range and any missing values for an integer dimension.
`auto_xyz_ds`(x[, y_z])	Automatically turn an array into an xarray dataset. Transpose `y_z` if necessary to automatically match dimension sizes.
`merge_sync_conflict_datasets`(base_name[, engine, ...])	Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.
`save_df`(df, name[, engine, key])	Save a dataframe to disk.
`load_df`(name[, engine, key])	Load a dataframe from disk.

Attributes

`_DEFAULT_FN_CACHE_PATH`
`_engine_extensions`

xyzpy.manage._DEFAULT_FN_CACHE_PATH = '__xyz_cache__'

xyzpy.manage.cache_to_disk(fn=None, *, cachedir=_DEFAULT_FN_CACHE_PATH, **kwargs)

Cache this function to disk, using joblib.
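
The docstring only says that results are cached with joblib. As a rough, dependency-free illustration of the idea, the sketch below memoizes a function on disk with `pickle` and `hashlib` instead of joblib. `disk_cache` is a hypothetical name, not part of xyzpy, and the sketch assumes the arguments pickle deterministically.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cachedir):
    """Hypothetical disk cache: pickle each result to a file named by a hash of the call."""
    os.makedirs(cachedir, exist_ok=True)

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Key on the function name plus its (pickled) arguments.
            key_bytes = pickle.dumps((fn.__name__, args, sorted(kwargs.items())))
            path = os.path.join(cachedir, hashlib.sha256(key_bytes).hexdigest() + ".pkl")
            if os.path.exists(path):
                # Cache hit: read the stored result instead of recomputing.
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = fn(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


calls = []  # records every real evaluation of the wrapped function


@disk_cache(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)
    return x * x


print(slow_square(4), slow_square(4))  # → 16 16
print(len(calls))  # → 1
```

The second call reads the pickled result back rather than re-running the function. joblib's `Memory.cache` follows the same pattern with more robust argument hashing.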

xyzpy.manage._engine_extensions

xyzpy.manage.auto_add_extension(file_name, engine)

Make sure a file name has an extension that reflects its file type.
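
A minimal sketch of the extension logic. It assumes a name that already has an extension is left alone, and that otherwise an extension is looked up from the engine. The mapping below is an assumption standing in for `_engine_extensions`, whose actual contents are not listed here. `add_extension` is a hypothetical name.

```python
import os

# Assumed engine -> extension mapping; the real _engine_extensions may differ.
ENGINE_EXTENSIONS = {
    "h5netcdf": ".h5",
    "netcdf4": ".nc",
    "joblib": ".jbl",
    "zarr": ".zarr",
}


def add_extension(file_name, engine):
    """Append the engine's extension unless the file name already has one."""
    _, ext = os.path.splitext(file_name)
    if ext:
        return file_name  # keep any extension the caller chose explicitly
    return file_name + ENGINE_EXTENSIONS[engine]


print(add_extension("results", "h5netcdf"))     # → results.h5
print(add_extension("results.nc", "h5netcdf"))  # → results.nc
```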

xyzpy.manage.save_ds(ds, file_name, engine='h5netcdf', **kwargs)

Save an xarray dataset.

Parameters:

ds (xarray.Dataset) – The dataset to save.
file_name (str) – Name of the file to save to.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to save the file.

Return type:

None

xyzpy.manage.load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs)

Load an xarray dataset. This is essentially `xarray.open_dataset` with some different defaults and more convenient behaviour.

Parameters:

file_name (str) – Name of file to open.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to load the file.
load_to_mem (bool, optional) – Once opened, load the data from disk into memory. Defaults to True if chunks=None.
create_new (bool, optional) – If no file exists, create a blank one.
chunks (int or dict, optional) – Passed to xarray.open_dataset so that the data is stored using dask.array.

Returns:

ds – Loaded Dataset.

Return type:

xarray.Dataset
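
Only the `chunks=None` case is documented for the `load_to_mem` default. The hypothetical helper below spells out one reading of that rule, assuming that requesting `chunks` leaves the data lazily loaded unless `load_to_mem=True` is passed explicitly.

```python
def resolve_load_to_mem(load_to_mem=None, chunks=None):
    """Hypothetical helper: resolve the load_to_mem default described for load_ds.

    An explicit True/False is kept as given; None means "load into memory"
    only when no dask chunks were requested (assumed reading of the docs).
    """
    if load_to_mem is None:
        return chunks is None
    return load_to_mem


print(resolve_load_to_mem())                   # → True
print(resolve_load_to_mem(chunks={"x": 100}))  # → False
print(resolve_load_to_mem(True, chunks=100))   # → True
```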

xyzpy.manage.save_merge_ds(ds, fname, overwrite=None, **kwargs)

Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.

Parameters:

ds (xarray.Dataset) – The dataset to save.
fname (str) – The file name.
overwrite ({None, False, True}, optional) –
How to merge the dataset with the existing dataset.
- None: the datasets will be merged if there are no conflicts
- False: data will be taken from old dataset if conflicting
- True: data will be taken from new dataset if conflicting
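
The three `overwrite` policies can be sketched with plain dictionaries standing in for datasets, where a conflict means a key present in both with different values. This assumes the `None` policy fails loudly on a conflict (here with `ValueError`). `merge_records` is a hypothetical name, and xyzpy itself operates on `xarray.Dataset` objects rather than dicts.

```python
def merge_records(old, new, overwrite=None):
    """Merge two {key: value} mappings following the three overwrite policies."""
    conflicts = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    if overwrite is None and conflicts:
        raise ValueError(f"conflicting values for keys {conflicts}")
    merged = dict(old)
    for key, value in new.items():
        # True: new data wins; False or None: only fill keys the old data lacks.
        if overwrite or key not in merged:
            merged[key] = value
    return merged


old = {"a": 1, "b": 2}
new = {"b": 3, "c": 4}
print(merge_records(old, new, overwrite=True))   # → {'a': 1, 'b': 3, 'c': 4}
print(merge_records(old, new, overwrite=False))  # → {'a': 1, 'b': 2, 'c': 4}
try:
    merge_records(old, new)
except ValueError as err:
    print(err)  # → conflicting values for keys ['b']
```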

xyzpy.manage.trimna(obj)

Drop values across all dimensions for which all values are NaN.
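
For a 2D grid, dropping values "for which all values are NaN" across all dimensions means removing every row and every column that holds nothing but NaN. The sketch below does that on nested lists. With xarray the per-dimension step would be `dropna(dim, how="all")`, but `trim_all_nan` is a hypothetical stand-in, not xyzpy's code.

```python
import math


def trim_all_nan(grid):
    """Drop rows and columns of a 2D nested list that contain only NaN."""

    def all_nan(values):
        return all(isinstance(v, float) and math.isnan(v) for v in values)

    keep_rows = [i for i, row in enumerate(grid) if not all_nan(row)]
    n_cols = len(grid[0]) if grid else 0
    # A column is dropped only if it is NaN in every row.
    keep_cols = [j for j in range(n_cols) if not all_nan(row[j] for row in grid)]
    return [[grid[i][j] for j in keep_cols] for i in keep_rows]


nan = float("nan")
grid = [
    [1.0, nan, 2.0],
    [nan, nan, nan],
    [3.0, nan, 4.0],
]
print(trim_all_nan(grid))  # → [[1.0, 2.0], [3.0, 4.0]]
```

Rows or columns that are only partly NaN are kept, so no real data is lost.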

xyzpy.manage.sort_dims(ds)

Make sure all the dimensions on all the variables appear in the same order.
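
The sketch below shows the reordering on plain tuples of dimension names. It assumes the shared order is a dataset-level one passed in as `dataset_dims`; the rule xyzpy actually uses to choose the order is not stated here, and in xarray the reordering itself would be done with `transpose`. `sort_var_dims` is a hypothetical name.

```python
def sort_var_dims(dataset_dims, var_dims):
    """Reorder each variable's dimensions to follow one shared order.

    dataset_dims: tuple giving the shared dimension order (assumed rule).
    var_dims: {variable name: tuple of its dimension names}.
    """
    rank = {d: i for i, d in enumerate(dataset_dims)}
    return {name: tuple(sorted(dims, key=rank.__getitem__)) for name, dims in var_dims.items()}


print(sort_var_dims(("x", "y", "z"), {"a": ("y", "x"), "b": ("z", "x", "y")}))
# → {'a': ('x', 'y'), 'b': ('x', 'y', 'z')}
```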

xyzpy.manage.post_fix(ds, postfix)

Add `postfix` to the name of every data variable in `ds`.
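
Renaming every data variable amounts to building an old-name to new-name mapping, which in xarray could be handed to `Dataset.rename`. A minimal sketch of that mapping, with `add_postfix` as a hypothetical helper:

```python
def add_postfix(var_names, postfix):
    """Map each data-variable name to the same name with postfix appended."""
    return {name: name + postfix for name in var_names}


print(add_postfix(["energy", "error"], "_run2"))
# → {'energy': 'energy_run2', 'error': 'error_run2'}
```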

xyzpy.manage.check_runs(obj, dim='run', var=None, sel=())

Print out information about the range and any missing values for an integer dimension.

Parameters:

obj (xarray object) – Data to check.
dim (str, optional) – Dimension to check, defaults to ‘run’.
var (str, optional) – Subselect this data variable first.
sel (mapping, optional) – Subselect these other coordinates first.
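
A rough sketch of the kind of summary this produces. It assumes a run counts as missing when its index is absent from the observed range or its value is NaN. A plain dict stands in for the xarray object, and `summarize_runs` is a hypothetical name.

```python
import math


def summarize_runs(data):
    """Print the range of an integer 'run' index and the runs that are missing.

    data: {run index: value}; a run is treated as missing if it is absent
    from the range or its value is NaN (assumed definition).
    """
    runs = sorted(data)
    first, last = runs[0], runs[-1]
    missing = [
        r for r in range(first, last + 1)
        if r not in data or (isinstance(data[r], float) and math.isnan(data[r]))
    ]
    print(f"range: {first} to {last}")
    print(f"missing: {missing}")
    return first, last, missing


summarize_runs({0: 1.0, 1: float("nan"), 3: 2.5, 4: 0.1})
# prints "range: 0 to 4" then "missing: [1, 2]"
```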

xyzpy.manage.auto_xyz_ds(x, y_z=None)

Automatically turn an array into an xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.

Parameters:

x (array_like) – The x-coordinates.
y_z (array_like, optional) – The y-data, possibly varying with coordinate z.
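
The transposition rule can be illustrated with nested lists: if the first axis of `y_z` does not match the length of `x` but the second one does, swap the axes. That `x` should line up with the first axis is an assumption of this sketch, and `align_y_z` is a hypothetical name.

```python
def align_y_z(x, y_z):
    """Return y_z (a 2D nested list) with its first axis matching len(x).

    Assumed rule: keep y_z if its rows already line up with x,
    otherwise transpose it when its columns do.
    """
    if len(y_z) == len(x):
        return y_z
    if y_z and len(y_z[0]) == len(x):
        # zip(*rows) yields the columns, i.e. the rows of the transpose.
        return [list(col) for col in zip(*y_z)]
    raise ValueError("no axis of y_z matches the length of x")


x = [0.0, 0.5, 1.0]
y_z = [[1, 2, 3], [4, 5, 6]]  # shape (2, 3): the columns line up with x
print(align_y_z(x, y_z))      # → [[1, 4], [2, 5], [3, 6]]
```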

xyzpy.manage.merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False)

Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.

Parameters:

base_name (str) – Base file name to glob on; it should include a ‘*’ wildcard.
engine (str, optional) – Engine used by xarray to load and save the files.
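
The globbing step can be sketched with the standard library. The file names below, including the conflict-copy naming, are hypothetical, and the merge and clean-up steps that follow in xyzpy are not reproduced here.

```python
import glob
import os
import pathlib
import tempfile

# Hypothetical files: an original, a conflict copy of it, and an unrelated file.
names = ["data.h5", "data.sync-conflict-20200101.h5", "other.h5"]

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        pathlib.Path(tmp, name).touch()
    # base_name includes a '*' so that every copy of "data" is picked up.
    base_name = os.path.join(tmp, "data*.h5")
    matches = sorted(os.path.basename(p) for p in glob.glob(base_name))

print(matches)  # → ['data.h5', 'data.sync-conflict-20200101.h5']
```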

xyzpy.manage.save_df(df, name, engine='pickle', key='df', **kwargs)

Save a dataframe to disk.

xyzpy.manage.load_df(name, engine='pickle', key='df', **kwargs)

Load a dataframe from disk.
