xyzpy.manage
Manage datasets — loading, saving, merging etc.
Module Contents
Functions
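- cache_to_disk: Cache this function to disk, using joblib.
- auto_add_extension: Make sure a file name has an extension that reflects its file type.
- save_ds: Saves an xarray dataset.
- load_ds: Loads an xarray dataset. Basically xarray.open_dataset with some different defaults and convenient behaviour.
- save_merge_ds: Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.
- Drop values across all dimensions for which all values are NaN.
- sort_dims: Make sure all the dimensions on all the variables appear in the same order.
- Add ...
- check_runs: Print out information about the range and any missing values for an integer dimension.
- auto_xyz_ds: Automatically turn an array into an xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.
- merge_sync_conflict_datasets: Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.
- Save a dataframe to disk.
- Load a dataframe from disk.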
Attributes
- xyzpy.manage._DEFAULT_FN_CACHE_PATH = '__xyz_cache__'
- xyzpy.manage.cache_to_disk(fn=None, *, cachedir=_DEFAULT_FN_CACHE_PATH, **kwargs)
Cache this function to disk, using joblib.
- xyzpy.manage._engine_extensions
- xyzpy.manage.auto_add_extension(file_name, engine)
Make sure a file name has an extension that reflects its file type.
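The idea can be sketched in plain Python as below. This is only an illustration: the mapping ENGINE_EXTENSIONS and the helper add_extension_sketch are hypothetical names, and the extensions shown are assumptions, since the actual contents of _engine_extensions are not listed here.

```python
import os

# Hypothetical mapping; the real xyzpy.manage._engine_extensions may differ.
ENGINE_EXTENSIONS = {
    "h5netcdf": ".h5",
    "netcdf4": ".nc",
    "joblib": ".pkl",
    "zarr": ".zarr",
}


def add_extension_sketch(file_name, engine):
    """Return file_name with the engine's extension appended if it has none."""
    _, ext = os.path.splitext(file_name)
    if ext:
        # An extension is already present, so leave the name untouched.
        return file_name
    return file_name + ENGINE_EXTENSIONS[engine]


print(add_extension_sketch("results", "h5netcdf"))    # results.h5
print(add_extension_sketch("results.nc", "netcdf4"))  # results.nc
```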
- xyzpy.manage.save_ds(ds, file_name, engine='h5netcdf', **kwargs)
Saves an xarray dataset.
- Parameters:
ds (xarray.Dataset) – The dataset to save.
file_name (str) – Name of the file to save to.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to save file with.
- Return type:
None
- xyzpy.manage.load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs)
Loads an xarray dataset. Basically xarray.open_dataset with some different defaults and convenient behaviour.
- Parameters:
file_name (str) – Name of file to open.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to load file.
load_to_mem (bool, optional) – Once opened, load from disk into memory. Defaults to True if chunks=None.
create_new (bool, optional) – If no file exists make a blank one.
chunks (int or dict) – Passed to xarray.open_dataset so that data is stored using dask.array.
- Returns:
ds – Loaded Dataset.
- Return type:
xarray.Dataset
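The interaction between load_to_mem and chunks can be sketched as follows. The helper resolve_load_to_mem is a hypothetical name used only for illustration, and the behaviour when chunks is given (defaulting to lazy loading) is an assumption inferred from the parameter description, not confirmed from the library source.

```python
def resolve_load_to_mem(load_to_mem=None, chunks=None):
    """Pick an effective load_to_mem value when the caller left it as None."""
    if load_to_mem is None:
        # Assumption: eager loading only when no dask chunking was requested.
        return chunks is None
    # An explicit True/False from the caller is always respected.
    return load_to_mem


print(resolve_load_to_mem())                    # True
print(resolve_load_to_mem(chunks={"x": 100}))   # False
print(resolve_load_to_mem(load_to_mem=True, chunks=10))  # True
```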
- xyzpy.manage.save_merge_ds(ds, fname, overwrite=None, **kwargs)
Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.
- Parameters:
ds (xarray.Dataset) – The dataset to save.
fname (str) – The file name.
overwrite ({None, False, True}, optional) –
How to merge the dataset with the existing dataset.
None: the datasets will be merged if there are no conflicts
False: data will be taken from old dataset if conflicting
True: data will be taken from new dataset if conflicting
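The three overwrite modes can be illustrated with ordinary dictionaries standing in for datasets. This is a simplified analogy, not the library's implementation: merge_sketch is a hypothetical helper, and raising an error on conflict when overwrite is None is an assumption, since the description only states what happens when there are no conflicts.

```python
def merge_sketch(old, new, overwrite=None):
    """Merge two flat dicts following the three overwrite modes."""
    # Keys present in both inputs whose values disagree.
    conflicts = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    if overwrite is None:
        if conflicts:
            # Assumed behaviour: refuse to merge rather than pick a side.
            raise ValueError(f"conflicting keys: {sorted(conflicts)}")
        return {**old, **new}
    if overwrite:
        # New data wins wherever the two disagree.
        return {**old, **new}
    # overwrite is False: old data wins wherever the two disagree.
    return {**new, **old}


old = {"a": 1, "b": 2}
new = {"b": 3, "c": 4}
print(merge_sketch(old, new, overwrite=False)["b"])  # 2
print(merge_sketch(old, new, overwrite=True)["b"])   # 3
```

In both cases the non-conflicting entries "a" and "c" are kept, so nothing is lost from either side.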
- xyzpy.manage.sort_dims(ds)
Make sure all the dimensions on all the variables appear in the same order.
- xyzpy.manage.check_runs(obj, dim='run', var=None, sel=())
Print out information about the range and any missing values for an integer dimension.
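The kind of report this produces can be approximated on a plain list of integers. The helper describe_runs_sketch below is hypothetical and works on a Python sequence rather than on a dataset dimension; it only shows what "range and missing values" means for an integer index.

```python
def describe_runs_sketch(values):
    """Summarise the range of an integer sequence and list any gaps in it."""
    present = set(values)
    lo, hi = min(present), max(present)
    # Every integer between the extremes that never appears is a gap.
    missing = sorted(set(range(lo, hi + 1)) - present)
    return {"min": lo, "max": hi, "missing": missing}


summary = describe_runs_sketch([0, 1, 2, 4, 5, 8])
print(summary)  # {'min': 0, 'max': 8, 'missing': [3, 6, 7]}
```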
- xyzpy.manage.auto_xyz_ds(x, y_z=None)
Automatically turn an array into an xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.
- Parameters:
x (array_like) – The x-coordinates.
y_z (array_like, optional) – The y-data, possibly varying with coordinate z.
- xyzpy.manage.merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False)
Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.