xyzpy.manage

Manage datasets — loading, saving, merging etc.

Attributes

`_DEFAULT_FN_CACHE_PATH`
`_engine_extensions`

Functions

`cache_to_disk`([fn, cachedir])	Cache this function to disk, using joblib.
`auto_add_extension`(file_name, engine)	Ensure `file_name` has an extension matching `engine`.
`save_ds`(ds, file_name[, engine])	Saves an xarray dataset.
`load_ds`(file_name[, engine, load_to_mem, create_new, ...])	Loads an xarray dataset. Basically `xarray.open_dataset` with some different defaults and convenient behaviour.
`save_merge_ds`(ds, fname[, overwrite])	Save dataset `ds`, but check for an existing dataset with that name first, and if it exists, merge the two before saving.
`trimna`(obj)	Drop values across dims where all values are NaN.
`sort_dims`(ds)	Reorder variable dimensions to match `ds.dims`. This is an in-place operation.
`post_fix`(ds, postfix)	Append `"_{postfix}"` to each data variable name.
`check_runs`(obj[, dim, var, sel])	Print out information about the range and any missing values for an integer dimension.
`auto_xyz_ds`(x[, y_z])	Automatically turn an array into an xarray dataset. Transpose `y_z` if necessary to automatically match dimension sizes.
`merge_sync_conflict_datasets`(base_name[, engine, ...])	Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.
`save_df`(df, name[, engine, key])	Save a dataframe to disk.
`load_df`(name[, engine, key])	Load a dataframe from disk.

Module Contents

xyzpy.manage._DEFAULT_FN_CACHE_PATH = '__xyz_cache__'

xyzpy.manage.cache_to_disk(fn=None, *, cachedir=_DEFAULT_FN_CACHE_PATH, **kwargs)

Cache this function to disk, using joblib.
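The `fn=None` plus keyword-only `cachedir` shape of the signature suggests a decorator that can be applied either bare or with arguments. The sketch below illustrates that pattern only. It is not the library's implementation: it swaps joblib for plain `pickle` files, and `cache_to_disk_sketch`, `slow_square`, and the cache directory are hypothetical names.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def cache_to_disk_sketch(fn=None, *, cachedir="__sketch_cache__"):
    """Hypothetical stand-in: cache results as pickle files in `cachedir`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            os.makedirs(cachedir, exist_ok=True)
            # Key the cache entry on the function name and its arguments.
            raw = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(cachedir, hashlib.sha256(raw).hexdigest() + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    # Bare use (@cache_to_disk_sketch) passes the function directly;
    # parametrised use (@cache_to_disk_sketch(cachedir=...)) passes fn=None.
    return decorator if fn is None else decorator(fn)


calls = []
demo_dir = tempfile.mkdtemp()


@cache_to_disk_sketch(cachedir=demo_dir)
def slow_square(x):
    calls.append(x)  # record each real evaluation
    return x * x


first = slow_square(4)
second = slow_square(4)  # read back from disk; the body is not re-run
```

With joblib the hashing and storage are handled by the library, so a real cache is likely more robust than this stand-in for large or array-valued arguments.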

xyzpy.manage._engine_extensions

xyzpy.manage.auto_add_extension(file_name, engine)

Ensure file_name has an extension matching engine.

Parameters:

file_name (str) – File name to normalize.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}) – Engine determining the extension.

Returns:

File name with an appropriate extension appended.

Return type:

str
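As a rough illustration of the extension handling, the stdlib-only sketch below appends an engine-specific suffix when the name has none. Both the mapping values and the rule "leave names that already have an extension untouched" are assumptions made for the example. The real table is `_engine_extensions`, whose contents are not shown on this page.

```python
import os

# Assumed mapping, for illustration only; the real table lives in
# xyzpy.manage._engine_extensions and may differ.
ASSUMED_EXTENSIONS = {
    "h5netcdf": ".h5",
    "netcdf4": ".nc",
    "joblib": ".dmp",
    "zarr": ".zarr",
}


def add_extension_sketch(file_name, engine):
    """Append the engine's extension unless the name already has one."""
    _, ext = os.path.splitext(os.path.basename(file_name))
    if not ext:
        file_name += ASSUMED_EXTENSIONS[engine]
    return file_name


with_ext = add_extension_sketch("results/run1", "h5netcdf")
unchanged = add_extension_sketch("results/run1.h5", "h5netcdf")
```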

xyzpy.manage.save_ds(ds, file_name, engine='h5netcdf', **kwargs)

Saves an xarray dataset.

Parameters:

ds (xarray.Dataset) – The dataset to save.
file_name (str) – Name of the file to save to.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to save file with.

Return type:

None

xyzpy.manage.load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs)

Loads an xarray dataset. Basically xarray.open_dataset with some different defaults and convenient behaviour.

Parameters:

file_name (str) – Name of file to open.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to load file.
load_to_mem (bool, optional) – Once opened, load from disk into memory. Defaults to True if chunks=None.
create_new (bool, optional) – If no file exists, make a blank one.
chunks (int or dict, optional) – Passed to xarray.open_dataset so that data is stored using dask.array.

Returns:

ds – Loaded Dataset.

Return type:

xarray.Dataset

xyzpy.manage.save_merge_ds(ds, fname, overwrite=None, **kwargs)

Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.

Parameters:

ds (xarray.Dataset) – The dataset to save.
fname (str) – The file name.
overwrite ({None, False, True}, optional) – How to merge the dataset with the existing dataset:
- None: the datasets will be merged if there are no conflicts
- False: data will be taken from the old dataset if conflicting
- True: data will be taken from the new dataset if conflicting
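The three `overwrite` modes can be pictured with plain dictionaries standing in for datasets. This is only an analogy, not how xyzpy merges xarray objects. `merge_sketch` is a hypothetical helper, and raising an error on conflict in the `None` mode is an assumption about what "merged if there are no conflicts" implies.

```python
def merge_sketch(old, new, overwrite=None):
    """Merge two dicts following the three `overwrite` modes described above."""
    if overwrite is None:
        # Strict mode: refuse to merge if any shared key disagrees (assumed).
        conflicts = [k for k in old.keys() & new.keys() if old[k] != new[k]]
        if conflicts:
            raise ValueError(f"conflicting keys: {sorted(conflicts)}")
        return {**old, **new}
    if overwrite:
        return {**old, **new}  # values from the new data win
    return {**new, **old}  # values from the old data win


old = {"a": 1, "b": 2}
new = {"b": 20, "c": 3}

keep_old = merge_sketch(old, new, overwrite=False)
keep_new = merge_sketch(old, new, overwrite=True)

try:
    merge_sketch(old, new, overwrite=None)
    strict_failed = False
except ValueError:
    strict_failed = True
```

Here `"b"` is the only conflicting key, so it is the only entry whose value depends on the chosen mode.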

xyzpy.manage.trimna(obj)

Drop values across dims where all values are NaN.

Parameters:

obj (xarray.Dataset or xarray.DataArray) – Object to trim.

Returns:

Trimmed object.

Return type:

same type as obj
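To make "all values are NaN" concrete, here is a pure-Python sketch on a 2D list, treating rows and columns as the two dims. `trim_all_nan` is a hypothetical helper, not part of xyzpy. On real xarray objects a comparable effect can presumably be had by calling `dropna(dim, how='all')` for each dimension.

```python
import math


def trim_all_nan(grid):
    """Drop rows, then columns, that contain only NaN from a 2D list."""
    rows = [r for r in grid if not all(math.isnan(v) for v in r)]
    if not rows:
        return []
    keep = [j for j in range(len(rows[0]))
            if not all(math.isnan(r[j]) for r in rows)]
    return [[r[j] for j in keep] for r in rows]


nan = float("nan")
grid = [
    [1.0, nan, 2.0],
    [nan, nan, nan],  # all-NaN row: dropped
    [3.0, nan, nan],  # partly NaN: kept
]
trimmed = trim_all_nan(grid)  # the all-NaN middle column is dropped too
```

Entries that are NaN but share a row and column with real data survive, so the result can still contain NaN.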

xyzpy.manage.sort_dims(ds)

Reorder variable dimensions to match ds.dims. This is an in-place operation.

Parameters:

ds (xarray.Dataset) – Dataset to reorder in place.

Return type:

None

xyzpy.manage.post_fix(ds, postfix)

Append "_{postfix}" to each data variable name.

Parameters:

ds (xarray.Dataset) – Dataset to rename.
postfix (str) – Suffix to append.

Returns:

Renamed dataset.

Return type:

xarray.Dataset
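The renaming rule itself is simple enough to show on a plain mapping of variable names to values. This is a sketch of the naming convention only: `post_fix_sketch` and the variable names are hypothetical, and a dict stands in for the dataset.

```python
def post_fix_sketch(data_vars, postfix):
    """Return a copy of the mapping with `_{postfix}` appended to each key."""
    return {f"{name}_{postfix}": values for name, values in data_vars.items()}


renamed = post_fix_sketch({"energy": [1.0, 2.0], "error": [0.1, 0.2]}, "dmrg")
```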

xyzpy.manage.check_runs(obj, dim='run', var=None, sel=())

Print out information about the range and any missing values for an integer dimension.

Parameters:

obj (xarray object) – Data to check.
dim (str, optional) – Dimension to check, defaults to 'run'.
var (str, optional) – Subselect this data variable first.
sel (mapping, optional) – Subselect these other coordinates first.
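The kind of report described here, a range plus any gaps in an integer dimension, can be sketched on a bare list of run labels. `summarize_runs` is a hypothetical helper. The actual function works on xarray objects and its printed format is not reproduced here.

```python
def summarize_runs(runs):
    """Return (lowest, highest, missing) for a collection of integer run labels."""
    present = set(runs)
    lo, hi = min(present), max(present)
    missing = [r for r in range(lo, hi + 1) if r not in present]
    return lo, hi, missing


lo, hi, missing = summarize_runs([0, 1, 2, 5, 6, 9])
print(f"runs span {lo}..{hi}; missing: {missing}")
```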

xyzpy.manage.auto_xyz_ds(x, y_z=None)

Automatically turn an array into an xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.

Parameters:

x (array_like) – The x-coordinates.
y_z (array_like, optional) – The y-data, possibly varying with coordinate z.

xyzpy.manage.merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False)

Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.

Parameters:

base_name (str) – Base file name to glob on; should include '*'.
engine (str, optional) – Load and save engine used by xarray.
combine_first (bool, optional) – If True, combine datasets sequentially using combine_first, preferring the first dataset in the list, which is assumed to be the original. If False, merge all datasets together using xr.merge, which will raise an error if there are any conflicts.

xyzpy.manage.save_df(df, name, engine='pickle', key='df', **kwargs)

Save a dataframe to disk.

Parameters:

df (pandas.DataFrame) – DataFrame to save.
name (str) – File name to save to.
engine ({'pickle', 'csv', 'hdf'}, optional) – Storage backend.
key (str, optional) – HDF key when engine='hdf'.
**kwargs – Passed through to the pandas writer.

xyzpy.manage.load_df(name, engine='pickle', key='df', **kwargs)

Load a dataframe from disk.

Parameters:

name (str) – File name to read from.
engine ({'pickle', 'csv', 'hdf'}, optional) – Storage backend.
key (str, optional) – HDF key when engine='hdf'.
**kwargs – Passed through to the pandas reader.

Returns:

Loaded dataframe.

Return type:

pandas.DataFrame