xyzpy.manage
============

.. py:module:: xyzpy.manage

.. autoapi-nested-parse::

   Manage datasets: loading, saving, merging, etc.


Attributes
----------

.. autoapisummary::

   xyzpy.manage._DEFAULT_FN_CACHE_PATH
   xyzpy.manage._engine_extensions


Functions
---------

.. autoapisummary::

   xyzpy.manage.cache_to_disk
   xyzpy.manage.auto_add_extension
   xyzpy.manage.save_ds
   xyzpy.manage.load_ds
   xyzpy.manage.save_merge_ds
   xyzpy.manage.trimna
   xyzpy.manage.sort_dims
   xyzpy.manage.post_fix
   xyzpy.manage.check_runs
   xyzpy.manage.auto_xyz_ds
   xyzpy.manage.merge_sync_conflict_datasets
   xyzpy.manage.save_df
   xyzpy.manage.load_df


Module Contents
---------------

.. py:data:: _DEFAULT_FN_CACHE_PATH
   :value: '__xyz_cache__'


.. py:function:: cache_to_disk(fn=None, *, cachedir=_DEFAULT_FN_CACHE_PATH, **kwargs)

   Cache this function to disk, using joblib.


.. py:data:: _engine_extensions

.. py:function:: auto_add_extension(file_name, engine)

   Ensure ``file_name`` has an extension matching ``engine``.

   :param file_name: File name to normalize.
   :type file_name: str
   :param engine: Engine determining the extension.
   :type engine: {'h5netcdf', 'netcdf4', 'joblib', 'zarr'}

   :returns: File name with an appropriate extension appended.
   :rtype: str


.. py:function:: save_ds(ds, file_name, engine='h5netcdf', **kwargs)

   Save an xarray dataset.

   :param ds: The dataset to save.
   :type ds: xarray.Dataset
   :param file_name: Name of the file to save to.
   :type file_name: str
   :param engine: Engine used to save the file.
   :type engine: {'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional

   :rtype: None


.. py:function:: load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs)

   Load an xarray dataset. Essentially ``xarray.open_dataset`` with some
   different defaults and convenient behaviour.

   :param file_name: Name of the file to open.
   :type file_name: str
   :param engine: Engine used to load the file.
   :type engine: {'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional
   :param load_to_mem: Once opened, load from disk into memory. Defaults to
       ``True`` if ``chunks=None``.
   :type load_to_mem: bool, optional
   :param create_new: If no file exists, make a blank one.
   :type create_new: bool, optional
   :param chunks: Passed to ``xarray.open_dataset`` so that data is stored
       using ``dask.array``.
   :type chunks: int or dict, optional

   :returns: **ds** -- Loaded Dataset.
   :rtype: xarray.Dataset


.. py:function:: save_merge_ds(ds, fname, overwrite=None, **kwargs)

   Save dataset ``ds``, but check for an existing dataset with that name
   first, and if it exists, merge the two before saving.

   :param ds: The dataset to save.
   :type ds: xarray.Dataset
   :param fname: The file name.
   :type fname: str
   :param overwrite: How to merge the dataset with the existing dataset.

       - ``None``: the datasets will be merged if there are no conflicts
       - ``False``: data will be taken from the old dataset if conflicting
       - ``True``: data will be taken from the new dataset if conflicting
   :type overwrite: {None, False, True}, optional


.. py:function:: trimna(obj)

   Drop values across dims where all values are NaN.

   :param obj: Object to trim.
   :type obj: xarray.Dataset or xarray.DataArray

   :returns: Trimmed object.
   :rtype: same type as obj


.. py:function:: sort_dims(ds)

   Reorder variable dimensions to match ``ds.dims``. This is an in-place
   operation.

   :param ds: Dataset to reorder in place.
   :type ds: xarray.Dataset

   :rtype: None


.. py:function:: post_fix(ds, postfix)

   Append ``"_{postfix}"`` to each data variable name.

   :param ds: Dataset to rename.
   :type ds: xarray.Dataset
   :param postfix: Suffix to append.
   :type postfix: str

   :returns: Renamed dataset.
   :rtype: xarray.Dataset


.. py:function:: check_runs(obj, dim='run', var=None, sel=())

   Print out information about the range and any missing values for an
   integer dimension.

   :param obj: Data to check.
   :type obj: xarray object
   :param dim: Dimension to check, defaults to 'run'.
   :type dim: str, optional
   :param var: Subselect this data variable first.
   :type var: str, optional
   :param sel: Subselect these other coordinates first.
   :type sel: mapping, optional


.. py:function:: auto_xyz_ds(x, y_z=None)

   Automatically turn an array into an ``xarray`` dataset. Transpose ``y_z``
   if necessary to automatically match dimension sizes.

   :param x: The x-coordinates.
   :type x: array_like
   :param y_z: The y-data, possibly varying with coordinate z.
   :type y_z: array_like, optional


.. py:function:: merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False)

   Glob files based on ``base_name``, merge them, save this new dataset if
   it contains new info, then clean up the conflicts.

   :param base_name: Base file name to glob on; it should include '*'.
   :type base_name: str
   :param engine: Load and save engine used by xarray.
   :type engine: str, optional
   :param combine_first: If True, combine datasets sequentially using
       ``combine_first``, preferring the first dataset in the list, which is
       assumed to be the original. If False, merge all datasets together
       using ``xr.merge``, which will raise an error if there are any
       conflicts.
   :type combine_first: bool, optional


.. py:function:: save_df(df, name, engine='pickle', key='df', **kwargs)

   Save a dataframe to disk.

   :param df: DataFrame to save.
   :type df: pandas.DataFrame
   :param name: File name to save to.
   :type name: str
   :param engine: Storage backend.
   :type engine: {'pickle', 'csv', 'hdf'}, optional
   :param key: HDF key when ``engine='hdf'``.
   :type key: str, optional
   :param \*\*kwargs: Passed through to the pandas writer.


.. py:function:: load_df(name, engine='pickle', key='df', **kwargs)

   Load a dataframe from disk.

   :param name: File name to read from.
   :type name: str
   :param engine: Storage backend.
   :type engine: {'pickle', 'csv', 'hdf'}, optional
   :param key: HDF key when ``engine='hdf'``.
   :type key: str, optional
   :param \*\*kwargs: Passed through to the pandas reader.

   :returns: Loaded dataframe.
   :rtype: pandas.DataFrame
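
The three ``overwrite`` modes described for ``save_merge_ds`` can be illustrated with a plain-dictionary analogy. This is only a sketch of the documented semantics, not xyzpy's implementation: ``merge_records`` is a hypothetical helper, dictionary keys stand in for dataset entries, and the choice of ``ValueError`` for the conflict case is an assumption. The real function operates on ``xarray.Dataset`` objects stored on disk.

```python
def merge_records(old, new, overwrite=None):
    """Merge two dicts following the documented ``overwrite`` semantics.

    Hypothetical stand-in for illustration only; not part of xyzpy.
    """
    if overwrite is None:
        # Merge only if no shared key holds two different values.
        conflicts = sorted(
            k for k in old.keys() & new.keys() if old[k] != new[k]
        )
        if conflicts:
            # Assumption: the error type here is chosen for the sketch.
            raise ValueError(f"conflicting values for keys: {conflicts}")
        return {**old, **new}
    if overwrite:
        # True: values from the new data win on conflict.
        return {**old, **new}
    # False: values from the old data win on conflict.
    return {**new, **old}


old = {"a": 1, "b": 2}
new = {"b": 3, "c": 4}

prefer_new = merge_records(old, new, overwrite=True)
prefer_old = merge_records(old, new, overwrite=False)
```

With these inputs, ``prefer_new`` takes ``b`` from ``new`` and ``prefer_old`` keeps ``b`` from ``old``, while ``overwrite=None`` raises because the two values of ``b`` disagree.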