xyzpy.manage
============

.. py:module:: xyzpy.manage

.. autoapi-nested-parse::

   Manage datasets --- loading, saving, merging etc.


Attributes
----------

.. autoapisummary::

   xyzpy.manage._DEFAULT_FN_CACHE_PATH
   xyzpy.manage._engine_extensions


Functions
---------

.. autoapisummary::

   xyzpy.manage.cache_to_disk
   xyzpy.manage.auto_add_extension
   xyzpy.manage.save_ds
   xyzpy.manage.load_ds
   xyzpy.manage.save_merge_ds
   xyzpy.manage.trimna
   xyzpy.manage.sort_dims
   xyzpy.manage.post_fix
   xyzpy.manage.check_runs
   xyzpy.manage.auto_xyz_ds
   xyzpy.manage.merge_sync_conflict_datasets
   xyzpy.manage.save_df
   xyzpy.manage.load_df


Module Contents
---------------

.. py:data:: _DEFAULT_FN_CACHE_PATH
   :value: '__xyz_cache__'


.. py:function:: cache_to_disk(fn=None, *, cachedir=_DEFAULT_FN_CACHE_PATH, **kwargs)

   Cache this function to disk, using joblib.


.. py:data:: _engine_extensions

.. py:function:: auto_add_extension(file_name, engine)

   Ensure ``file_name`` has an extension matching ``engine``.

   :param file_name: File name to normalize.
   :type file_name: str
   :param engine: Engine determining the extension.
   :type engine: {'h5netcdf', 'netcdf4', 'joblib', 'zarr'}

   :returns: File name with an appropriate extension appended.
   :rtype: str


.. py:function:: save_ds(ds, file_name, engine='h5netcdf', **kwargs)

   Save an xarray dataset to disk.

   :param ds: The dataset to save.
   :type ds: xarray.Dataset
   :param file_name: Name of the file to save to.
   :type file_name: str
   :param engine: Engine used to save the file.
   :type engine: {'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional

   :rtype: None


.. py:function:: load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs)

   Load an xarray dataset. Essentially ``xarray.open_dataset`` with some
   different defaults and convenient behaviour.

   :param file_name: Name of file to open.
   :type file_name: str
   :param engine: Engine used to load file.
   :type engine: {'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional
   :param load_to_mem: Once opened, load from disk into memory. Defaults to ``True`` if
                       ``chunks=None``.
   :type load_to_mem: bool, optional
   :param create_new: If no file exists, create a blank one.
   :type create_new: bool, optional
   :param chunks: Passed to ``xarray.open_dataset`` so that data is stored using
                  ``dask.array``.
   :type chunks: int or dict, optional

   :returns: **ds** -- Loaded Dataset.
   :rtype: xarray.Dataset


.. py:function:: save_merge_ds(ds, fname, overwrite=None, **kwargs)

   Save dataset ``ds``, but check for an existing dataset with that name
   first, and if it exists, merge the two before saving.

   :param ds: The dataset to save.
   :type ds: xarray.Dataset
   :param fname: The file name.
   :type fname: str
   :param overwrite:
                     How to merge the dataset with the existing dataset.

                         - None: the datasets will be merged if there are no conflicts
                         - False: data will be taken from old dataset if conflicting
                         - True: data will be taken from new dataset if conflicting
   :type overwrite: {None, False, True}, optional
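
   The three ``overwrite`` modes can be illustrated in memory with plain
   xarray calls. This is a minimal sketch, assuming the modes correspond to
   ``xr.merge`` and ``Dataset.combine_first``; that mapping is an assumption
   made for illustration and is not taken from the library source. The
   datasets ``old`` and ``new`` are hypothetical.

   .. code-block:: python

      import xarray as xr

      # Two hypothetical datasets that overlap, and disagree, at x=1.
      old = xr.Dataset({"y": ("x", [1.0, 2.0])}, coords={"x": [0, 1]})
      new = xr.Dataset({"y": ("x", [20.0, 3.0])}, coords={"x": [1, 2]})

      # overwrite=None analogue: a strict merge refuses conflicting values.
      try:
          xr.merge([old, new], compat="no_conflicts", join="outer")
          conflict = False
      except xr.MergeError:
          conflict = True

      # overwrite=False analogue: conflicting entries come from the old data.
      keep_old = old.combine_first(new)

      # overwrite=True analogue: conflicting entries come from the new data.
      keep_new = new.combine_first(old)

   In both ``combine_first`` results the non-conflicting entries at ``x=0``
   and ``x=2`` are retained; only the value at ``x=1`` differs.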


.. py:function:: trimna(obj)

   Drop values across dims where all values are NaN.

   :param obj: Object to trim.
   :type obj: xarray.Dataset or xarray.DataArray

   :returns: Trimmed object.
   :rtype: same type as obj
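
   The trimming behaviour can be approximated with xarray's ``dropna``. This
   is a minimal sketch under that assumption, not the library's
   implementation; ``trimna_sketch`` and the sample array are hypothetical.

   .. code-block:: python

      import numpy as np
      import xarray as xr

      def trimna_sketch(obj):
          """Drop labels along every dimension where all values are NaN."""
          trimmed = obj
          for dim in obj.dims:
              trimmed = trimmed.dropna(dim, how="all")
          return trimmed

      data = np.array([[1.0, np.nan, 2.0],
                       [np.nan, np.nan, np.nan]])
      da = xr.DataArray(
          data, dims=("a", "b"), coords={"a": [0, 1], "b": [10, 20, 30]}
      )

      # Row a=1 and column b=20 contain only NaN, so both are removed.
      trimmed = trimna_sketch(da)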


.. py:function:: sort_dims(ds)

   Reorder variable dimensions to match ``ds.dims``. This is an in-place
   operation.

   :param ds: Dataset to reorder in place.
   :type ds: xarray.Dataset

   :rtype: None


.. py:function:: post_fix(ds, postfix)

   Append ``"_{postfix}"`` to each data variable name.

   :param ds: Dataset to rename.
   :type ds: xarray.Dataset
   :param postfix: Suffix to append.
   :type postfix: str

   :returns: Renamed dataset.
   :rtype: xarray.Dataset
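
   The renaming described above can be sketched with ``Dataset.rename``. The
   helper name ``post_fix_sketch`` and the sample dataset are hypothetical,
   and this is an illustration rather than the library's own code.

   .. code-block:: python

      import xarray as xr

      def post_fix_sketch(ds, postfix):
          """Return a copy of ``ds`` with ``_{postfix}`` added to each data variable."""
          return ds.rename({name: f"{name}_{postfix}" for name in ds.data_vars})

      ds = xr.Dataset({"energy": ("x", [1.0, 2.0])}, coords={"x": [0, 1]})
      renamed = post_fix_sketch(ds, "exact")

   Coordinates such as ``x`` keep their names; only data variables are
   renamed.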


.. py:function:: check_runs(obj, dim='run', var=None, sel=())

   Print out information about the range and any missing values for an
   integer dimension.

   :param obj: Data to check.
   :type obj: xarray object
   :param dim: Dimension to check, defaults to 'run'.
   :type dim: str, optional
   :param var: Subselect this data variable first.
   :type var: str, optional
   :param sel: Subselect these other coordinates first.
   :type sel: mapping, optional
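
   The kind of check involved, finding the range and the gaps of an integer
   coordinate, can be sketched with NumPy alone. This illustrates the idea
   and is not the function's implementation; the ``runs`` array is
   hypothetical.

   .. code-block:: python

      import numpy as np

      # Hypothetical values of an integer 'run' coordinate with gaps.
      runs = np.array([0, 1, 2, 5, 6, 9])

      # Every integer between the smallest and largest run, inclusive.
      expected = np.arange(runs.min(), runs.max() + 1)

      # Integers in the full range that do not appear among the runs.
      missing = np.setdiff1d(expected, runs)

      print(f"range: {runs.min()} to {runs.max()}, missing: {missing.tolist()}")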


.. py:function:: auto_xyz_ds(x, y_z=None)

   Automatically turn an array into an ``xarray`` dataset. Transpose ``y_z``
   if necessary to automatically match dimension sizes.

   :param x: The x-coordinates.
   :type x: array_like
   :param y_z: The y-data, possibly varying with coordinate z.
   :type y_z: array_like, optional


.. py:function:: merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False)

   Glob files based on ``base_name``, merge them, save this new dataset if
   it contains new info, then clean up the conflicts.

   :param base_name: Base file name to glob on; it should include ``'*'``.
   :type base_name: str
   :param engine: Load and save engine used by xarray.
   :type engine: str, optional
   :param combine_first: If True, combine datasets sequentially using ``combine_first``,
                         preferring the first dataset in the list, which is assumed to be
                         the original. If False, merge all datasets together using
                         ``xr.merge``, which will raise an error if there are any conflicts.
   :type combine_first: bool, optional


.. py:function:: save_df(df, name, engine='pickle', key='df', **kwargs)

   Save a dataframe to disk.

   :param df: DataFrame to save.
   :type df: pandas.DataFrame
   :param name: File name to save to.
   :type name: str
   :param engine: Storage backend.
   :type engine: {'pickle', 'csv', 'hdf'}, optional
   :param key: HDF key when ``engine='hdf'``.
   :type key: str, optional
   :param \*\*kwargs: Passed through to the pandas writer.


.. py:function:: load_df(name, engine='pickle', key='df', **kwargs)

   Load a dataframe from disk.

   :param name: File name to read from.
   :type name: str
   :param engine: Storage backend.
   :type engine: {'pickle', 'csv', 'hdf'}, optional
   :param key: HDF key when ``engine='hdf'``.
   :type key: str, optional
   :param \*\*kwargs: Passed through to the pandas reader.

   :returns: Loaded dataframe.
   :rtype: pandas.DataFrame
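
   The ``engine`` argument of ``save_df`` and ``load_df`` selects a storage
   backend. Below is a minimal sketch of such a dispatch, assuming each
   engine maps onto the corresponding pandas reader and writer. The helper
   names ``save_df_sketch`` and ``load_df_sketch`` are hypothetical, and the
   library's actual implementation may differ.

   .. code-block:: python

      import os
      import tempfile

      import pandas as pd

      def save_df_sketch(df, name, engine="pickle", key="df", **kwargs):
          """Write ``df`` to ``name`` with the pandas writer for ``engine``."""
          if engine == "pickle":
              df.to_pickle(name, **kwargs)
          elif engine == "csv":
              df.to_csv(name, **kwargs)
          elif engine == "hdf":
              # Requires the optional PyTables dependency.
              df.to_hdf(name, key=key, **kwargs)
          else:
              raise ValueError(f"Unknown engine: {engine!r}")

      def load_df_sketch(name, engine="pickle", key="df", **kwargs):
          """Read a dataframe from ``name`` with the pandas reader for ``engine``."""
          if engine == "pickle":
              return pd.read_pickle(name, **kwargs)
          if engine == "csv":
              return pd.read_csv(name, **kwargs)
          if engine == "hdf":
              return pd.read_hdf(name, key=key, **kwargs)
          raise ValueError(f"Unknown engine: {engine!r}")

      df = pd.DataFrame({"n": [1, 2, 3], "t": [0.1, 0.2, 0.3]})

      # Round trip through a temporary pickle file.
      with tempfile.TemporaryDirectory() as tmp:
          path = os.path.join(tmp, "results.pkl")
          save_df_sketch(df, path)
          loaded = load_df_sketch(path)

   Pickle preserves dtypes and the index exactly, which makes it a
   reasonable default; CSV is more portable but may need extra reader
   arguments (for example ``index_col``) to round-trip cleanly.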