xyzpy.manage
Manage datasets: loading, saving, merging, etc.
Attributes
_DEFAULT_FN_CACHE_PATH
_engine_extensions
Functions
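cache_to_disk | Cache this function to disk, using joblib.
auto_add_extension | Ensure file_name has an extension matching engine.
save_ds | Saves an xarray dataset.
load_ds | Loads an xarray dataset. Basically xarray.open_dataset with some different defaults and convenient behaviour.
save_merge_ds | Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.
trimna | Drop values across dims where all values are NaN.
sort_dims | Reorder variable dimensions to match ds.dims.
post_fix | Append "_{postfix}" to each data variable name.
check_runs | Print out information about the range and any missing values for an integer dimension.
auto_xyz_ds | Automatically turn an array into an xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.
merge_sync_conflict_datasets | Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.
save_df | Save a dataframe to disk.
load_df | Load a dataframe from disk.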
Module Contents
- xyzpy.manage._DEFAULT_FN_CACHE_PATH = '__xyz_cache__'
- xyzpy.manage.cache_to_disk(fn=None, *, cachedir=_DEFAULT_FN_CACHE_PATH, **kwargs)
Cache this function to disk, using joblib.
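The signature takes fn as an optional first argument and everything else as keyword-only, which suggests the decorator can be applied either bare or with options. The sketch below illustrates that calling pattern only. It swaps joblib for pickle and hashlib from the standard library, and cache_to_disk_sketch is a hypothetical name, not xyzpy's implementation.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def cache_to_disk_sketch(fn=None, *, cachedir="__sketch_cache__"):
    """Hypothetical stand-in: cache results on disk, keyed by the arguments."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            os.makedirs(cachedir, exist_ok=True)
            # Hash the function name and pickled arguments into a file name.
            payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(cachedir, hashlib.sha256(payload).hexdigest() + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    # Bare use (@cache_to_disk_sketch) passes the function directly;
    # use with options (@cache_to_disk_sketch(cachedir=...)) passes fn=None.
    return decorator if fn is None else decorator(fn)


calls = []
tmpdir = tempfile.mkdtemp()


@cache_to_disk_sketch(cachedir=tmpdir)
def slow_square(x):
    calls.append(x)  # record each real evaluation
    return x * x


first = slow_square(4)
second = slow_square(4)  # served from the file written by the first call
```

In this sketch the second call never runs the function body, so calls holds a single entry.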
- xyzpy.manage._engine_extensions
- xyzpy.manage.auto_add_extension(file_name, engine)
Ensure file_name has an extension matching engine.
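One plausible reading of this behaviour is sketched below. The mapping values and the rule that an existing extension is left untouched are both assumptions made for illustration. The real mapping lives in xyzpy.manage._engine_extensions and may differ.

```python
import os

# Hypothetical engine-to-extension mapping; the real values are stored in
# xyzpy.manage._engine_extensions and may differ.
ENGINE_EXTENSIONS = {
    "h5netcdf": ".h5",
    "netcdf4": ".nc",
    "joblib": ".pkl",
    "zarr": ".zarr",
}


def add_extension_sketch(file_name, engine):
    """Append the engine's extension when file_name has none (assumed behaviour)."""
    _, ext = os.path.splitext(file_name)
    if ext:
        return file_name  # assumption: an existing extension is left alone
    return file_name + ENGINE_EXTENSIONS[engine]
```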
- xyzpy.manage.save_ds(ds, file_name, engine='h5netcdf', **kwargs)
Saves an xarray dataset.
- Parameters:
ds (xarray.Dataset) – The dataset to save.
file_name (str) – Name of the file to save to.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to save file with.
- Return type:
None
- xyzpy.manage.load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs)
Loads an xarray dataset. Basically xarray.open_dataset with some different defaults and convenient behaviour.
- Parameters:
file_name (str) – Name of file to open.
engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to load file.
load_to_mem (bool, optional) – Once opened, load from disk into memory. Defaults to True if chunks=None.
create_new (bool, optional) – If no file exists, make a blank one.
chunks (int or dict) – Passed to xarray.open_dataset so that data is stored using dask.array.
- Returns:
ds – Loaded Dataset.
- Return type:
xarray.Dataset
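The interplay between load_to_mem and chunks can be sketched as a small helper. The name resolve_load_to_mem is hypothetical, and the False branch (lazy loading when chunks is given) is inferred from the documented default, not stated outright.

```python
def resolve_load_to_mem(load_to_mem=None, chunks=None):
    """Hypothetical helper mirroring the documented default for load_to_mem."""
    if load_to_mem is None:
        # No chunking requested: read everything into memory.
        # Chunking requested: assume the data stays lazy (dask-backed).
        return chunks is None
    return load_to_mem  # an explicit choice always wins
```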
- xyzpy.manage.save_merge_ds(ds, fname, overwrite=None, **kwargs)
Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.
- Parameters:
ds (xarray.Dataset) – The dataset to save.
fname (str) – The file name.
overwrite ({None, False, True}, optional) –
How to merge the dataset with the existing dataset.
None: the datasets will be merged if there are no conflicts
False: data will be taken from the old dataset if conflicting
True: data will be taken from the new dataset if conflicting
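The three overwrite modes can be sketched in memory with plain xarray calls. This is an illustration under assumptions, not xyzpy's code: merge_sketch is a hypothetical name, file handling is omitted, and the mapping of the modes onto xarray.merge and Dataset.combine_first is assumed.

```python
import xarray as xr


def merge_sketch(old, new, overwrite=None):
    """Hypothetical in-memory version of the three overwrite modes."""
    if overwrite is None:
        # Raises xarray.MergeError if the datasets disagree anywhere.
        return xr.merge([old, new], compat="no_conflicts", join="outer")
    if overwrite:
        return new.combine_first(old)  # new values win where both exist
    return old.combine_first(new)  # old values win where both exist


old = xr.Dataset({"v": ("x", [1.0, 2.0])}, coords={"x": [0, 1]})
new = xr.Dataset({"v": ("x", [20.0, 30.0])}, coords={"x": [1, 2]})

kept_old = merge_sketch(old, new, overwrite=False)
kept_new = merge_sketch(old, new, overwrite=True)
```

Here the two datasets disagree at x=1, so the default mode would raise, while the other two modes each pick one side.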
- xyzpy.manage.trimna(obj)
Drop values across dims where all values are NaN.
- Parameters:
obj (xarray.Dataset or xarray.DataArray) – Object to trim.
- Returns:
Trimmed object.
- Return type:
same type as obj
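A minimal sketch of this kind of trimming, assuming it amounts to calling dropna with how="all" along each dimension in turn. The name trimna_sketch is hypothetical.

```python
import numpy as np
import xarray as xr


def trimna_sketch(obj):
    """Drop, along every dimension, the labels whose values are all NaN."""
    trimmed = obj
    for dim in list(obj.dims):
        trimmed = trimmed.dropna(dim, how="all")
    return trimmed


nan = np.nan
da = xr.DataArray(
    [[1.0, nan, 2.0], [nan, nan, nan], [3.0, nan, 4.0]],
    dims=("x", "y"),
    coords={"x": [0, 1, 2], "y": [10, 20, 30]},
)
trimmed = trimna_sketch(da)  # the all-NaN row and column are removed
```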
- xyzpy.manage.sort_dims(ds)
Reorder variable dimensions to match ds.dims. This is an inplace operation.
- Parameters:
ds (xarray.Dataset) – Dataset to reorder in place.
- Return type:
None
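The in-place reordering could look roughly like the sketch below, which transposes each data variable to follow the order of ds.dims. The name sort_dims_sketch is hypothetical and the details are assumptions. Note that the order of ds.dims itself can differ between xarray versions.

```python
import numpy as np
import xarray as xr


def sort_dims_sketch(ds):
    """Transpose each data variable in place to follow the order of ds.dims."""
    target = list(ds.dims)  # capture the order once, before modifying ds
    for name in list(ds.data_vars):
        order = [d for d in target if d in ds[name].dims]
        ds[name] = ds[name].transpose(*order)


ds = xr.Dataset(
    {
        "v": (("a", "b"), np.zeros((2, 3))),
        "w": (("b", "a"), np.ones((3, 2))),
    }
)
sort_dims_sketch(ds)  # returns None; ds itself is modified
```

After the call both variables share one dimension order.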
- xyzpy.manage.post_fix(ds, postfix)
Append "_{postfix}" to each data variable name.
- Parameters:
ds (xarray.Dataset) – Dataset to rename.
postfix (str) – Suffix to append.
- Returns:
Renamed dataset.
- Return type:
xarray.Dataset
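The renaming can be sketched with Dataset.rename. The name post_fix_sketch is hypothetical, and the example variable names are made up for illustration.

```python
import xarray as xr


def post_fix_sketch(ds, postfix):
    """Return a copy of ds with "_{postfix}" appended to every data variable name."""
    return ds.rename({name: f"{name}_{postfix}" for name in ds.data_vars})


ds = xr.Dataset(
    {"energy": ("x", [1.0, 2.0]), "error": ("x", [0.1, 0.2])},
    coords={"x": [0, 1]},
)
renamed = post_fix_sketch(ds, "exact")
```

Coordinates keep their names. Only the data variables are renamed, which can help when two datasets with identically named variables need to be merged side by side.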
- xyzpy.manage.check_runs(obj, dim='run', var=None, sel=())
Print out information about the range and any missing values for an integer dimension.
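The kind of information reported can be sketched in plain Python. The helper summarize_runs and the print format are hypothetical, and the real function's output may look different.

```python
def summarize_runs(values):
    """Return (lowest, highest, missing) for a collection of integer labels."""
    present = set(int(v) for v in values)
    lowest, highest = min(present), max(present)
    missing = sorted(set(range(lowest, highest + 1)) - present)
    return lowest, highest, missing


lowest, highest, missing = summarize_runs([0, 1, 2, 5, 6, 8])
# Hypothetical report format, for illustration only.
print(f"RANGE: {lowest} -> {highest} | MISSING: {missing}")
```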
- xyzpy.manage.auto_xyz_ds(x, y_z=None)
Automatically turn an array into an xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.
- Parameters:
x (array_like) – The x-coordinates.
y_z (array_like, optional) – The y-data, possibly varying with coordinate z.
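The automatic transposition might work along the lines of the sketch below. The function name, the variable name "y", the dimension names "z" and "x", and the convention that the last axis of y_z runs along x are all assumptions made for illustration.

```python
import numpy as np
import xarray as xr


def auto_xyz_sketch(x, y_z):
    """Build a dataset with y_z indexed by x, transposing y_z if needed."""
    x = np.asarray(x)
    y_z = np.atleast_2d(np.asarray(y_z))
    # Assumed convention: the last axis of y_z should run along x.
    if y_z.shape[-1] != x.size and y_z.shape[0] == x.size:
        y_z = y_z.T
    return xr.Dataset(
        {"y": (("z", "x"), y_z)},
        coords={"x": x, "z": np.arange(y_z.shape[0])},
    )


x = np.linspace(0.0, 1.0, 5)
y_z = np.random.default_rng(0).normal(size=(5, 3))  # x along the first axis
ds = auto_xyz_sketch(x, y_z)  # y_z is transposed to shape (3, 5)
```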
- xyzpy.manage.merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False)
Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.
- Parameters:
base_name (str) – Base file name to glob on - should include ‘*’.
engine (str, optional) – Load and save engine used by xarray.
combine_first (bool, optional) – If True, combine datasets sequentially using combine_first, preferring the first dataset in the list, which is assumed to be the original. If False, merge all datasets together using xr.merge, which will raise an error if there are any conflicts.
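The two strategies selected by combine_first can be sketched on an in-memory list of datasets. The globbing, saving and clean-up steps are omitted, and combine_sketch is a hypothetical name.

```python
import functools

import xarray as xr


def combine_sketch(datasets, combine_first=False):
    """Hypothetical in-memory version of the two merge strategies."""
    if combine_first:
        # Fold left to right; earlier datasets take priority on conflicts.
        return functools.reduce(lambda a, b: a.combine_first(b), datasets)
    # Strict merge: raises xarray.MergeError on conflicting values.
    return xr.merge(datasets, compat="no_conflicts", join="outer")


original = xr.Dataset({"v": ("run", [1.0, 2.0])}, coords={"run": [0, 1]})
conflict = xr.Dataset({"v": ("run", [9.0, 3.0])}, coords={"run": [1, 2]})
merged = combine_sketch([original, conflict], combine_first=True)
```

In this example run 1 keeps the value from the original dataset, and run 2 is filled in from the conflict copy.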
- xyzpy.manage.save_df(df, name, engine='pickle', key='df', **kwargs)
Save a dataframe to disk.
- Parameters:
df (pandas.DataFrame) – DataFrame to save.
name (str) – File name to save to.
engine ({'pickle', 'csv', 'hdf'}, optional) – Storage backend.
key (str, optional) – HDF key when engine='hdf'.
**kwargs – Passed through to the pandas writer.
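The engine parameter presumably selects one of the pandas writers. A minimal sketch of such a dispatch follows. The name save_df_sketch and the exact mapping are assumptions, and the hdf branch additionally needs the optional PyTables package.

```python
import os
import tempfile

import pandas as pd


def save_df_sketch(df, name, engine="pickle", key="df", **kwargs):
    """Hypothetical dispatch from engine name to a pandas writer."""
    if engine == "pickle":
        df.to_pickle(name, **kwargs)
    elif engine == "csv":
        df.to_csv(name, **kwargs)
    elif engine == "hdf":
        df.to_hdf(name, key=key, **kwargs)  # needs the optional PyTables package
    else:
        raise ValueError(f"Unknown engine: {engine!r}")


df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
path = os.path.join(tempfile.mkdtemp(), "results.pkl")
save_df_sketch(df, path)  # default engine: pickle
restored = pd.read_pickle(path)
```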