xyzpy.gen.farming
=================

.. py:module:: xyzpy.gen.farming

.. autoapi-nested-parse::

   Objects for labelling and successively running functions.


Classes
-------

.. autoapisummary::

   xyzpy.gen.farming.Runner
   xyzpy.gen.farming.Harvester
   xyzpy.gen.farming.Sampler


Functions
---------

.. autoapisummary::

   xyzpy.gen.farming.label
   xyzpy.gen.farming.cultivate


Module Contents
---------------

.. py:class:: Runner(fn, var_names, fn_args=None, var_dims=None, var_coords=None, constants=None, resources=None, attrs=None, **default_runner_settings)

   Bases: :py:obj:`object`


   Container class with all the information needed to systematically
   run a function over many parameters and capture the output in a dataset.

   :param fn: Function that produces a single instance of a result.
   :type fn: callable
   :param var_names: The ordered name(s) of the output variable(s) of `fn`. Set this
                     explicitly to None if `fn` outputs already labelled data as a dict,
                     :class:`~xarray.Dataset`, or :class:`~xarray.DataArray`.
   :type var_names: str, sequence of str, or None
   :param fn_args: The ordered name(s) of the input argument(s) of `fn`. This is only
                   needed if the cases or combos supplied are not dict-like.
   :type fn_args: str, or sequence of str, optional
   :param var_dims: Mapping of output variables to their named internal dimensions, can be
                    the names of ``constants``.
   :type var_dims: dict-like, optional
   :param var_coords: Mapping of output variables named internal dimensions to the actual
                      values they take.
   :type var_coords: dict-like, optional
   :param constants: Constant arguments to be supplied to `fn`. These can be used as
                     'var_dims', and will be saved as coords if so, otherwise as attributes.
   :type constants: dict-like, optional
   :param resources: Like `constants` but not saved to the dataset, e.g. if very big.
   :type resources: dict-like, optional
   :param attrs: Any other miscellaneous information to be saved with the dataset.
   :type attrs: dict-like, optional
   :param default_runner_settings: These keyword arguments will be supplied as defaults to any runner.
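
   As a rough illustration of how ``var_names`` relates to the outputs of `fn`,
   the following pure-Python sketch pairs each output with its name. The helper
   ``label_outputs`` is hypothetical and not part of xyzpy, which stores results
   in an :class:`~xarray.Dataset` rather than a plain ``dict``.

   .. code-block:: python

      def label_outputs(fn, var_names):
          """Hypothetical helper: pair the outputs of ``fn`` with ``var_names``."""

          def labelled(*args, **kwargs):
              out = fn(*args, **kwargs)
              if var_names is None:
                  # fn is assumed to return already labelled data, e.g. a dict
                  return out
              if isinstance(var_names, str):
                  # a single name labels a single output
                  return {var_names: out}
              # several names label a tuple of outputs, in order
              return dict(zip(var_names, out))

          return labelled


      def foo(x, y):
          return x + y, x - y


      labelled_foo = label_outputs(foo, ['sum', 'diff'])
      result = labelled_foo(1, 2)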


   .. py:attribute:: fn


   .. py:attribute:: _var_names
      :value: (None,)


   .. py:attribute:: _fn_args


   .. py:attribute:: _var_dims


   .. py:attribute:: _var_coords


   .. py:attribute:: _constants


   .. py:attribute:: _resources


   .. py:attribute:: _attrs


   .. py:attribute:: _last_ds
      :value: None


   .. py:attribute:: default_runner_settings


   .. py:method:: __call__(*args, **kwargs)


   .. py:method:: _get_fn_args()


   .. py:method:: _set_fn_args(fn_args)


   .. py:method:: _del_fn_args()


   .. py:attribute:: fn_args


   .. py:method:: _get_var_names()


   .. py:method:: _set_var_names(var_names)


   .. py:method:: _del_var_names()


   .. py:attribute:: var_names


   .. py:method:: _get_var_dims()


   .. py:method:: _set_var_dims(var_dims, var_names=None)


   .. py:method:: _del_var_dims()


   .. py:attribute:: var_dims


   .. py:method:: _get_var_coords()


   .. py:method:: _set_var_coords(var_coords)


   .. py:method:: _del_var_coords()


   .. py:attribute:: var_coords


   .. py:method:: _get_constants()


   .. py:method:: _set_constants(constants)


   .. py:method:: _del_constants()


   .. py:attribute:: constants


   .. py:method:: _get_resources()


   .. py:method:: _set_resources(resources)


   .. py:method:: _del_resources()


   .. py:attribute:: resources


   .. py:property:: last_ds


   .. py:method:: run_combos(combos, constants=(), **runner_settings)

      Run combos using the function map and save to dataset.

      :param combos: The values of each function argument with which to evaluate all
                     combinations.
      :type combos: dict_like[str, iterable]
      :param constants: Extra constant arguments for this run, repeated arguments will
                        take precedence over stored constants but for this run only.
      :type constants: dict, optional
      :param runner_settings: Keyword arguments supplied to :func:`~xyzpy.combo_runner`.
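
      To make "all combinations" concrete, here is a minimal pure-Python sketch
      of the idea, independent of xyzpy: the function is evaluated on the
      cartesian product of the combo values, and per-run constants override
      stored ones. The names ``stored_constants`` and ``run_constants`` are
      illustrative assumptions, and the real method returns a labelled dataset
      rather than a ``dict``.

      .. code-block:: python

         import itertools


         def foo(x, y, scale=1):
             return scale * (x + y)


         combos = {'x': [1, 2], 'y': [10, 20, 30]}
         stored_constants = {'scale': 1}
         run_constants = {'scale': 2}

         # repeated keys from the per-run constants take precedence
         constants = {**stored_constants, **run_constants}

         # evaluate foo on every combination of the combo values
         names = tuple(combos)
         results = {
             values: foo(**dict(zip(names, values)), **constants)
             for values in itertools.product(*combos.values())
         }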


   .. py:method:: run_cases(cases, constants=(), fn_args=None, **runner_settings)

      Run cases using the function and save to dataset.

      :param cases: A sequence of cases.
      :type cases: sequence of mappings or tuples
      :param constants: Extra constant arguments for this run, repeated arguments will
                        take precedence over stored constants but for this run only.
      :type constants: dict, optional
      :param runner_settings: Supplied to :func:`~xyzpy.case_runner`.
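
      The difference between mapping-style and tuple-style cases can be
      sketched in plain Python as follows. This is only an illustration of why
      ``fn_args`` is needed for tuples; the helper ``as_kwargs`` is hypothetical
      and not part of xyzpy.

      .. code-block:: python

         def foo(x, y):
             return x + y


         fn_args = ('x', 'y')

         # the same three cases, written as tuples and as mappings
         tuple_cases = [(1, 10), (2, 20), (3, 30)]
         dict_cases = [{'x': 1, 'y': 10}, {'x': 2, 'y': 20}, {'x': 3, 'y': 30}]


         def as_kwargs(case):
             # mappings already name their arguments, whereas tuples have to
             # be matched to ``fn_args`` by position
             if isinstance(case, dict):
                 return dict(case)
             return dict(zip(fn_args, case))


         results_from_tuples = [foo(**as_kwargs(c)) for c in tuple_cases]
         results_from_dicts = [foo(**as_kwargs(c)) for c in dict_cases]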


   .. py:method:: Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)

      Return a Crop instance with this runner, from which ``fn``
      will be set, and then combos can be sown, grown, and reaped into the
      ``Runner.last_ds``. See :class:`~xyzpy.Crop`.

      :rtype: Crop


   .. py:method:: __repr__()


.. py:function:: label(var_names, fn_args=None, var_dims=None, var_coords=None, constants=None, resources=None, attrs=None, harvester=False, sampler=False, engine=None, **default_runner_settings)

   Convenient decorator to automatically wrap a function as a
   :class:`~xyzpy.Runner` or :class:`~xyzpy.Harvester`.

   :param var_names: The ordered name(s) of the output variable(s) of `fn`. Set this
                     explicitly to None if `fn` outputs already labelled data as a `dict`,
                     :class:`~xarray.Dataset`, or :class:`~xarray.DataArray`.
   :type var_names: str, sequence of str, or None
   :param fn_args: The ordered name(s) of the input argument(s) of `fn`. This is only
                   needed if the cases or combos supplied are not dict-like.
   :type fn_args: str, or sequence of str, optional
   :param var_dims: Mapping of output variables to their named internal dimensions, can be
                    the names of ``constants``.
   :type var_dims: dict-like, optional
   :param var_coords: Mapping of output variables named internal dimensions to the actual
                      values they take.
   :type var_coords: dict-like, optional
   :param constants: Constant arguments to be supplied to `fn`. These can be used as
                     'var_dims', and will be saved as coords if so, otherwise as attributes.
   :type constants: dict-like, optional
   :param resources: Like `constants` but not saved to the dataset, e.g. if very big.
   :type resources: dict-like, optional
   :param attrs: Any other miscellaneous information to be saved with the dataset.
   :type attrs: dict-like, optional
   :param harvester: If ``True``, wrap the runner as a :class:`~xyzpy.Harvester`, if a
                     string, create the harvester with that as the ``data_name``.
   :type harvester: bool or str, optional
   :param default_runner_settings: These keyword arguments will be supplied as defaults to any runner.

   .. rubric:: Examples

   Declare a function as a runner directly::

       >>> import xyzpy as xyz

       >>> @xyz.label(var_names=['sum', 'diff'])
       ... def foo(x, y):
       ...     return x + y, x - y
       ...

       >>> foo
       <xyzpy.Runner>
           fn: <function foo at 0x7f1fd8e5b1e0>
           fn_args: ('x', 'y')
           var_names: ('sum', 'diff')
           var_dims: {'sum': (), 'diff': ()}

       >>> foo(1, 2)  # can still call it normally
       (3, -1)


.. py:class:: Harvester(runner: Runner, data_name=None, chunks=None, engine='h5netcdf', full_ds=None)

   Bases: :py:obj:`object`


   Container class for collecting and aggregating data to disk.

   :param runner: Performs the runs and describes the results.
   :type runner: Runner
   :param data_name: Base file path to save data to.
   :type data_name: str, optional
   :param chunks: If not None, passed to xarray so that the full dataset is loaded and
                  merged into with on-disk dask arrays.
   :type chunks: int or dict, optional
   :param engine: Engine to use to save and load datasets.
   :type engine: str, optional
   :param full_ds: Initialize the Harvester with this dataset as the initial full
                   dataset.
   :type full_ds: xarray.Dataset, optional

   .. attribute:: full_ds

      Dataset containing all data harvested so far, by default synced to
      disk.

      :type: xarray.Dataset

   .. attribute:: last_ds

      Dataset containing just the data from the last harvesting run.

      :type: xarray.Dataset


   .. py:attribute:: runner


   .. py:attribute:: data_name
      :value: None


   .. py:attribute:: engine
      :value: 'h5netcdf'


   .. py:attribute:: chunks
      :value: None


   .. py:attribute:: _full_ds
      :value: None


   .. py:property:: fn


   .. py:method:: __call__(*args, **kwargs)


   .. py:property:: last_ds

      Dataset containing the last runs' data.


   .. py:method:: load_full_ds(chunks=None, engine=None)

      Load the disk dataset into ``full_ds``.

      :param chunks: If not None, passed to xarray so that the full dataset is loaded
                     and merged into with on-disk dask arrays.
      :type chunks: int or dict, optional
      :param engine: Engine to use to save and load datasets.
      :type engine: str, optional


   .. py:property:: full_ds

      Dataset containing all saved runs.


   .. py:method:: save_full_ds(new_full_ds=None, engine=None)

      Save `full_ds` onto disk. The old file is moved and kept as a backup
      in case of errors when writing the new dataset to disk.

      :param new_full_ds: Save this dataset as the new full dataset, else use the current
                          full dataset.
      :type new_full_ds: xarray.Dataset, optional
      :param engine: Engine to use to save and load datasets.
      :type engine: str, optional
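
      The "move the old file aside, then write" pattern described above can be
      sketched with the standard library alone. Everything here is an
      assumption for illustration: the helper ``save_with_backup``, the
      ``.BAK`` suffix, the JSON format, and the choice to delete the backup
      once the write succeeds are not taken from xyzpy.

      .. code-block:: python

         import json
         import os
         import tempfile


         def save_with_backup(path, data):
             """Hypothetical helper: keep the old file until the new write succeeds."""
             backup = path + '.BAK'
             had_old = os.path.exists(path)
             if had_old:
                 os.replace(path, backup)  # move the old file aside
             try:
                 with open(path, 'w') as f:
                     json.dump(data, f)
             except Exception:
                 if had_old:
                     os.replace(backup, path)  # restore the old file on failure
                 raise
             if had_old:
                 os.remove(backup)


         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, 'full.json')
             save_with_backup(path, {'a': 1})
             save_with_backup(path, {'a': 1, 'b': 2})
             with open(path) as f:
                 saved = json.load(f)
             leftover = sorted(os.listdir(tmp))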


   .. py:method:: delete_ds(backup=False)

      Delete the on-disk dataset, optionally backing it up first.


   .. py:method:: add_ds(new_ds, sync=True, overwrite=None, chunks=None, engine=None)

      Merge a new dataset into the in-memory full dataset.

      :param new_ds: Data to be merged into the full dataset.
      :type new_ds: xr.Dataset or xr.DataArray
      :param sync: If True (default), load and save the disk dataset before
                   and after merging in the new data.
      :type sync: bool, optional
      :param overwrite: How to combine data from the new run into the current full_ds:

                        - ``None`` (default): attempt the merge and only raise if
                          data conflicts.
                        - ``True``: overwrite conflicting current data with
                          that from the new dataset.
                        - ``False``: drop any conflicting data from the new dataset.
      :type overwrite: {None, False, True}, optional
      :param chunks: If not None, passed to xarray so that the full dataset is loaded
                     and merged into with on-disk dask arrays.
      :type chunks: int or dict, optional
      :param engine: Engine to use to save and load datasets.
      :type engine: str, optional
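
      The three ``overwrite`` modes can be illustrated with plain dictionaries
      keyed by coordinate tuples. This is a simplified sketch, not xyzpy's
      merging logic (which operates on xarray objects); the helper
      ``merge_points`` is hypothetical, and a "conflict" is assumed here to
      mean two different values at the same coordinates.

      .. code-block:: python

         def merge_points(full, new, overwrite=None):
             """Hypothetical helper mimicking the ``overwrite`` modes on dicts."""
             conflicts = {k for k in new if k in full and full[k] != new[k]}
             if overwrite is None:
                 # attempt the merge, raising only if data conflicts
                 if conflicts:
                     raise ValueError(f"conflicting data at {sorted(conflicts)}")
                 return {**full, **new}
             if overwrite:
                 # new values replace conflicting current values
                 return {**full, **new}
             # overwrite=False: current values win, conflicting new ones are dropped
             return {**new, **full}


         full = {(1, 10): 11, (1, 20): 21}
         new = {(1, 20): 99, (2, 10): 12}

         overwritten = merge_points(full, new, overwrite=True)
         kept = merge_points(full, new, overwrite=False)

         try:
             merge_points(full, new)
             raised = False
         except ValueError:
             raised = True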


   .. py:method:: expand_dims(name, value, engine=None)

      Add a new coordinate dimension with ``name`` and ``value``. The
      change is immediately synced with the on-disk dataset. Useful if you
      want to expand the parameter space along a previously constant
      argument.


   .. py:method:: drop_sel(labels=None, *, errors='raise', engine=None, **labels_kwargs)

      Drop specific values of coordinates from this harvester and its
      dataset. See
      http://xarray.pydata.org/en/latest/generated/xarray.Dataset.drop_sel.html.
      The change is immediately synced with the on-disk dataset.
      Useful for tidying unneeded data points.


   .. py:method:: _maybe_expand_combos(combos)

      Expand combos with ellipses into full coordinate values from the
      current full dataset.


   .. py:method:: harvest_combos(combos, *, cases=None, missing_only=False, sync=True, overwrite=None, chunks=None, engine=None, **runner_settings)

      Run combos, automatically merging into an on-disk dataset.

      :param combos: The combos to run. The only difference here is that you can supply
                     an ellipsis ``...``, meaning that all values for that coordinate will
                     be loaded from the current full dataset.
      :type combos: dict_like[str, iterable]
      :param missing_only: If True, only run combos that are not already present in the
                           on-disk dataset.
      :type missing_only: bool, optional
      :param sync: If True (default), load and save the disk dataset before
                   and after merging in the new data.
      :type sync: bool, optional
      :param overwrite: How to combine data from the new run into the current full_ds:

                        - ``None`` (default): attempt the merge and only raise if
                          data conflicts.
                        - ``True``: overwrite any conflicting current data with that from
                          the new dataset.
                        - ``False``: drop any conflicting data from the new dataset.
      :type overwrite: {None, False, True}, optional
      :param chunks: If not None, passed to xarray so that the full dataset is
                     loaded and merged into with on-disk dask arrays.
      :type chunks: int or dict, optional
      :param engine: Engine to use to save and load datasets.
      :type engine: str, optional
      :param runner_settings: Supplied to :func:`~xyzpy.combo_runner`.
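
      The effect of ``...`` and ``missing_only`` can be sketched in plain
      Python. The variables ``existing_coords`` and ``existing_points`` stand
      in for what would be read from the current full dataset; they are
      assumptions for illustration, not xyzpy attributes.

      .. code-block:: python

         import itertools

         # coordinates and points assumed to be stored already
         existing_coords = {'x': [1, 2], 'y': [10, 20]}
         existing_points = {(1, 10), (1, 20), (2, 10), (2, 20)}

         combos = {'x': [1, 2, 3], 'y': ...}

         # replace each ``...`` with the values already known for that coordinate
         expanded = {
             name: existing_coords[name] if values is ... else list(values)
             for name, values in combos.items()
         }

         # with missing_only, only points not yet stored would be run
         all_points = list(itertools.product(*expanded.values()))
         missing = [p for p in all_points if p not in existing_points]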


   .. py:method:: harvest_cases(cases, *, sync=True, overwrite=None, chunks=None, engine=None, **runner_settings)

      Run cases, automatically merging into an on-disk dataset.

      :param cases: The cases to run.
      :type cases: list of dict or tuple
      :param sync: If True (default), load and save the disk dataset before
                   and after merging in the new data.
      :type sync: bool, optional
      :param overwrite: What to do regarding clashes with old data:

                        - ``None`` (default): attempt the merge and only raise if
                          data conflicts.
                        - ``True``: overwrite conflicting current data with
                          that from the new dataset.
                        - ``False``: drop any conflicting data from the new dataset.
      :type overwrite: {None, False, True}, optional
      :param chunks: If not None, passed to xarray so that the full dataset is
                     loaded and merged into with on-disk dask arrays.
      :type chunks: int or dict, optional
      :param engine: Engine to use to save and load datasets.
      :type engine: str, optional
      :param runner_settings: Supplied to :func:`~xyzpy.case_runner`.


   .. py:method:: Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)

      Return a Crop instance with this Harvester, from which `fn`
      will be set, and then combos can be sown, grown, and reaped into the
      ``Harvester.full_ds``. See :class:`~xyzpy.Crop`.

      :rtype: Crop


   .. py:method:: __repr__()


   .. py:method:: cultivate(combos=None, cases=None, constants=None, name=None, parent_dir=None, batchsize=None, num_batches=None, missing_only=True, shuffle=True, subprocess='auto', num_workers=None, num_threads=None, gpus=None, affinities=None, log=False, raise_errors=True, verbosity=1, on_existing='ask', on_error='ask', clean_up=None, **grow_kwargs)

      Convenience method to run a full cycle of parsing combos into
      missing cases only, then persistently growing those cases, and finally
      merging the results into the full dataset.

      :param combos: The combos to run. The only difference here is that you can supply
                     an ellipsis ``...``, meaning that all values for that coordinate will
                     be loaded from the current full dataset.
      :type combos: dict_like[str, iterable]
      :param cases: A sequence of (partial) individual settings to run. For each case,
                    all settings given by combos will be generated.
      :type cases: sequence of mappings or tuples, optional
      :param constants: Extra constant arguments for this run.
      :type constants: dict, optional
      :param name: Name for the crop to be used for on-disk storage of batches,
                   results and logs. You can use different names to grow results for
                   the same dataset concurrently.
      :type name: str, optional
      :param parent_dir: Parent directory in which to create the crop folder
                         (``.xyz-{name}/``). Defaults to the current working directory.
      :type parent_dir: str, optional
      :param batchsize: If given, the target number of cases to sow in each batch. This is
                        computed from ``num_batches`` if not given and 1 if neither given.
      :type batchsize: int, optional
      :param num_batches: If given, the target number of batches to sow. This is computed
                          from ``batchsize`` if not given and 1 if neither given.
      :type num_batches: int, optional
      :param missing_only: If True (default), only run cases that are not already present in
                           the on-disk dataset. If `False`, the new results will overwrite
                           any existing results.
      :type missing_only: bool, optional
      :param shuffle: If True (default), shuffle the order of cases before sowing and
                      growing. This can be a useful basic form of load balancing.
      :type shuffle: bool, optional
      :param subprocess: Whether to grow each batch in a fresh subprocess. This adds about
                         1 second of overhead per batch, but allows the number of threads,
                         cpu affinity and gpu assignment to be controlled. If "auto"
                         (default) subprocesses are used when ``num_threads``, ``gpus`` or
                         ``affinities`` are specified. See :meth:`xyzpy.Crop.grow` for
                         details.
      :type subprocess: "auto" or bool, optional
      :param num_workers: Maximum number of batches to run concurrently. In subprocess mode
                          this caps simultaneous subprocesses (defaults to 1 if not given).
                          In in-process mode this is the joblib loky pool size (``None`` =
                          serial). Forwarded to :meth:`xyzpy.Crop.grow`.
      :type num_workers: int, optional
      :param num_threads: Number of threads each worker is allowed to use, applied via the
                          standard env vars (``OMP_NUM_THREADS``, ``MKL_NUM_THREADS``, etc.)
                          in each subprocess. Implies ``subprocess=True`` when
                          ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`.
      :type num_threads: int, optional
      :param gpus: GPU device IDs to assign to subprocesses via
                   ``CUDA_VISIBLE_DEVICES``; the pool also caps concurrency. Implies
                   ``subprocess=True`` when ``subprocess="auto"``. Forwarded to
                   :meth:`xyzpy.Crop.grow`.
      :type gpus: int, str, or sequence of int, optional
      :param affinities: CPU core IDs to pin subprocesses to via ``taskset``; the pool also
                         caps concurrency. Implies ``subprocess=True`` when
                         ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`.
      :type affinities: int, str, or sequence of int, optional
      :param log: Whether to save subprocess stdout and stderr to files in the crop
                  directory under ``logs/batch-{batch_id}.log``. Subprocess-mode
                  only. Forwarded to :meth:`xyzpy.Crop.grow`.
      :type log: bool, optional
      :param raise_errors: If True (default), raise any errors that occur during growing,
                           otherwise just log them and continue with the next batch.
      :type raise_errors: bool, optional
      :param verbosity: The level of logging to print during the sow/grow/reap process.
                        0: no output, 1: progress bars, 2: progress bars with current
                        setting postfixed.
      :type verbosity: int, optional
      :param on_existing: What to do if a crop with the same name already exists on
                          disk. Default is ``'ask'`` (interactive prompt).
      :type on_existing: {'ask', 'reap', 'delete', 'skip', 'raise'}, optional
      :param on_error: What to do if an error occurs during grow/reap. Default is
                       ``'ask'`` (interactive prompt).
      :type on_error: {'ask', 'reap', 'delete', 'skip', 'raise'}, optional
      :param clean_up: Whether to delete the on-disk batch, result and log files after
                       successfully reaping.
      :type clean_up: bool or None, optional
      :param grow_kwargs: Further keyword arguments forwarded to :meth:`xyzpy.Crop.grow`
                          (e.g. ``executor``, ``min_wait``, ...).
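
      How ``batchsize``, ``num_batches`` and ``shuffle`` interact can be
      sketched as below. The helper ``make_batches``, its ``seed`` argument,
      the ceiling rounding, and the fallback of one case per batch when
      neither size is given are all assumptions made for this illustration;
      the exact splitting used by :class:`~xyzpy.Crop` may differ.

      .. code-block:: python

         import math
         import random


         def make_batches(cases, batchsize=None, num_batches=None,
                          shuffle=True, seed=None):
             """Hypothetical helper: split ``cases`` into batches."""
             cases = list(cases)
             if shuffle:
                 # spreads expensive cases across batches on average
                 random.Random(seed).shuffle(cases)
             if batchsize is None:
                 if num_batches is None:
                     batchsize = 1
                 else:
                     batchsize = max(1, math.ceil(len(cases) / num_batches))
             return [
                 cases[i:i + batchsize]
                 for i in range(0, len(cases), batchsize)
             ]


         cases = [(x, y) for x in range(5) for y in range(2)]  # 10 cases
         by_size = make_batches(cases, batchsize=4, seed=0)
         by_count = make_batches(cases, num_batches=5, seed=0)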

      .. seealso:: :py:obj:`Harvester.harvest_combos`, :py:obj:`xyzpy.Crop.grow`


.. py:function:: cultivate(fn, *, var_names=None, data_name=None, runner_opts=None, harvester_opts=None, combos=None, cases=None, constants=None, name=None, parent_dir=None, batchsize=None, num_batches=None, missing_only=True, shuffle=True, subprocess='auto', num_workers=None, num_threads=None, gpus=None, affinities=None, log=False, raise_errors=True, verbosity=1, on_existing='ask', on_error='ask', clean_up=None, **grow_kwargs)

   Convenience function to run a full cycle of annotating a function,
   parsing combos into missing cases only, then persistently growing those
   cases, and finally merging the results into the full dataset.

   :param fn: The function to run over combos and cases. This will be wrapped in
              a :class:`~xyzpy.Runner` and :class:`~xyzpy.Harvester` to perform the
              cultivation process. If `var_names` is None, it should return a `dict`,
              :class:`~xarray.Dataset` or :class:`~xarray.DataArray`.
   :type fn: callable
   :param var_names: The ordered name(s) of the ouput variable(s) of `fn`. Set this
                     explicitly to None if `fn` outputs already labelled data as a `dict`,
                     :class:`~xarray.Dataset`, or :class:`~xarray.DataArray`.
   :type var_names: str, sequence of str, or None
   :param data_name: If given, the on-disk file to sync results with. If not set there will
                     be no persistent results, since the harvester created in this
                     functional interface is ephemeral.
   :type data_name: str, optional
   :param runner_opts: Keyword arguments to be supplied to :class:`~xyzpy.Runner`.
   :type runner_opts: dict, optional
   :param harvester_opts: Keyword arguments to be supplied to :class:`~xyzpy.Harvester`.
   :type harvester_opts: dict, optional
   :param combos: The combos to run. The only difference here is that you can supply
                  an ellipsis ``...``, meaning that all values for that coordinate will
                  be loaded from the current full dataset.
   :type combos: dict_like[str, iterable]
   :param cases: A sequence of (partial) individual settings to run. For each case,
                 all settings given by combos will be generated.
   :type cases: sequence of mappings or tuples, optional
   :param constants: Extra constant arguments for this run.
   :type constants: dict, optional
   :param name: Name for the crop to be used for on-disk storage of batches,
                results and logs. You can use different names to grow results for
                the same dataset concurrently.
   :type name: str, optional
   :param parent_dir: Parent directory in which to create the crop folder
                      (``.xyz-{name}/``). Defaults to the current working directory.
   :type parent_dir: str, optional
   :param batchsize: If given, the target number of cases to sow in each batch. This is
                     computed from ``num_batches`` if not given and 1 if neither given.
   :type batchsize: int, optional
   :param num_batches: If given, the target number of batches to sow. This is computed
                       from ``batchsize`` if not given and 1 if neither given.
   :type num_batches: int, optional
   :param missing_only: If True (default), only run cases that are not already present in
                        the on-disk dataset.
   :type missing_only: bool, optional
   :param shuffle: If True (default), shuffle the order of cases before sowing and
                   growing. This can be a useful basic form of load balancing.
   :type shuffle: bool, optional
   :param subprocess: Whether to grow each batch in a fresh subprocess. This adds about
                      1 second of overhead per batch, but allows the number of threads,
                      cpu affinity and gpu assignment to be controlled. If "auto"
                      (default) subprocesses are used when ``num_threads``, ``gpus`` or
                      ``affinities`` are specified. See :meth:`xyzpy.Crop.grow` for details.
   :type subprocess: "auto" or bool, optional
   :param num_workers: Maximum number of batches to run concurrently. In subprocess mode
                       this caps simultaneous subprocesses (defaults to 1 if not given).
                       In in-process mode this is the joblib loky pool size (``None`` =
                       serial). Forwarded to :meth:`xyzpy.Crop.grow`.
   :type num_workers: int, optional
   :param num_threads: Number of threads each worker is allowed to use, applied via the
                       standard env vars (``OMP_NUM_THREADS``, ``MKL_NUM_THREADS``, etc.)
                       in each subprocess. Implies ``subprocess=True`` when
                       ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`.
   :type num_threads: int, optional
   :param gpus: GPU device IDs to assign to subprocesses via
                ``CUDA_VISIBLE_DEVICES``; the pool also caps concurrency. Implies
                ``subprocess=True`` when ``subprocess="auto"``. Forwarded to
                :meth:`xyzpy.Crop.grow`.
   :type gpus: int, str, or sequence of int, optional
   :param affinities: CPU core IDs to pin subprocesses to via ``taskset``; the pool also
                      caps concurrency. Implies ``subprocess=True`` when
                      ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`.
   :type affinities: int, str, or sequence of int, optional
   :param log: Whether to save subprocess stdout and stderr to files in the crop
               directory under ``logs/batch-{batch_id}.log``. Subprocess-mode only.
               Forwarded to :meth:`xyzpy.Crop.grow`.
   :type log: bool, optional
   :param raise_errors: If True (default), raise any errors that occur during growing,
                        otherwise just log them and continue with the next batch.
   :type raise_errors: bool, optional
   :param verbosity: The level of logging to print during the sow/grow/reap process.
                     0: no output, 1: progress bars, 2: progress bars with current
                     setting postfixed.
   :type verbosity: int, optional
   :param on_existing: What to do if a crop with the same name already exists on
                       disk. Default is ``'ask'`` (interactive prompt).
   :type on_existing: {'ask', 'reap', 'delete', 'skip', 'raise'}, optional
   :param on_error: What to do if an error occurs during grow/reap. Default is
                    ``'ask'`` (interactive prompt).
   :type on_error: {'ask', 'reap', 'delete', 'skip', 'raise'}, optional
   :param clean_up: Whether to delete the on-disk batch, result and log files after
                    successfully reaping.
   :type clean_up: bool or None, optional
   :param grow_kwargs: Further keyword arguments forwarded to :meth:`xyzpy.Crop.grow`
                       (e.g. ``executor``, ``min_wait``, ...).
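
   The reason thread and GPU limits call for a fresh subprocess is that they
   are set through environment variables, which numerical libraries typically
   read at start-up. The standard-library sketch below shows the general
   mechanism only; the helper ``run_in_subprocess`` is hypothetical and is not
   how xyzpy launches its batches.

   .. code-block:: python

      import os
      import subprocess
      import sys


      def run_in_subprocess(code, num_threads=None, gpu=None):
          """Hypothetical helper: run ``code`` in a fresh interpreter."""
          env = dict(os.environ)
          if num_threads is not None:
              for var in ('OMP_NUM_THREADS', 'MKL_NUM_THREADS'):
                  env[var] = str(num_threads)
          if gpu is not None:
              env['CUDA_VISIBLE_DEVICES'] = str(gpu)
          proc = subprocess.run(
              [sys.executable, '-c', code],
              env=env, capture_output=True, text=True, check=True,
          )
          return proc.stdout.strip()


      code = (
          "import os; "
          "print(os.environ['OMP_NUM_THREADS'], "
          "os.environ['CUDA_VISIBLE_DEVICES'])"
      )
      out = run_in_subprocess(code, num_threads=2, gpu=0)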

   .. seealso:: :py:obj:`Harvester.cultivate`, :py:obj:`xyzpy.Crop.grow`


.. py:class:: Sampler(runner, data_name=None, default_combos=None, full_df=None, engine='pickle')

   Like a Harvester, but randomly samples combos and writes the table of
   results to a ``pandas.DataFrame``.

   :param runner: Runner describing a labelled function to run.
   :type runner: xyzpy.Runner
   :param data_name: If given, the on-disk file to sync results with.
   :type data_name: str, optional
   :param default_combos: The default combos to sample from (which can be overridden).
   :type default_combos: dict_like[str, iterable], optional
   :param full_df: If given, use this dataframe as the initial 'full' data.
   :type full_df: pandas.DataFrame, optional
   :param engine: How to save and load the on-disk dataframe. See
                  :func:`~xyzpy.manage.load_df` and :func:`~xyzpy.manage.save_df`.
   :type engine: {'pickle', 'csv', 'json', 'hdf', ...}, optional

   .. attribute:: full_df

      Dataframe describing all data harvested so far.

      :type: pandas.DataFrame

   .. attribute:: last_df

      Dataframe describing the data harvested on the previous run.

      :type: pandas.DataFrame


   .. py:attribute:: runner


   .. py:attribute:: data_name
      :value: None


   .. py:attribute:: default_combos


   .. py:attribute:: _full_df
      :value: None


   .. py:attribute:: _last_df
      :value: None


   .. py:attribute:: engine
      :value: 'pickle'


   .. py:property:: fn


   .. py:method:: load_full_df(engine=None)

      Load the on-disk full dataframe into memory.


   .. py:property:: full_df

      The dataframe describing all data harvested so far.


   .. py:property:: last_df

      The dataframe describing the last set of data harvested.


   .. py:method:: save_full_df(new_full_df=None, engine=None)

      Save `full_df` onto disk.

      :param new_full_df: Save this dataframe as the new full dataframe, else use the
                          current ``full_df``.
      :type new_full_df: pandas.DataFrame, optional
      :param engine: Which engine to save the dataframe with, if None use the default.
      :type engine: str, optional


   .. py:method:: delete_df(backup=False)

      Delete the on-disk dataframe, optionally backing it up first.


   .. py:method:: add_df(new_df, sync=True, engine=None)

      Merge a new dataframe into the in-memory full dataframe.

      :param new_df: Data to be appended to the full dataframe.
      :type new_df: pandas.DataFrame or dict
      :param sync: If True (default), load and save the disk dataframe before
                   and after merging in the new data.
      :type sync: bool, optional
      :param engine: Which engine to save the dataframe with.
      :type engine: str, optional


   .. py:method:: gen_cases_fnargs(n, combos=None)


   .. py:method:: sample_combos(n, combos=None, engine=None, **case_runner_settings)

      Sample the target function many times, randomly choosing parameter
      combinations from ``combos`` (or ``Sampler.default_combos``).

      :param n: How many samples to run.
      :type n: int
      :param combos: A mapping of function arguments to potential choices. Any keys in
                     here will override ``default_combos``. You can also supply a
                     callable to manually return a random choice e.g. from a probability
                     distribution.
      :type combos: dict_like[str, iterable], optional
      :param engine: Which method to use to sync with the on-disk dataframe.
      :type engine: str, optional
      :param case_runner_settings: Supplied to :func:`~xyzpy.case_runner` and so onto
                                   :func:`~xyzpy.combo_runner`. This includes ``parallel=True`` etc.
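
      The sampling rule described above (pick from an iterable, or call a
      user-supplied callable) can be sketched in plain Python. The helper
      ``sample_cases`` and its ``seed`` argument are hypothetical, and it is
      assumed here that callables are invoked without arguments.

      .. code-block:: python

         import random


         def sample_cases(n, combos, seed=None):
             """Hypothetical helper: draw ``n`` random cases from ``combos``."""
             rng = random.Random(seed)
             cases = []
             for _ in range(n):
                 case = {}
                 for name, choices in combos.items():
                     if callable(choices):
                         # user-supplied random draw
                         case[name] = choices()
                     else:
                         case[name] = rng.choice(list(choices))
                 cases.append(case)
             return cases


         default_combos = {'x': [1, 2, 3], 'y': [10, 20]}

         # keys given for this call override the defaults; here 'y' is drawn
         # from a normal distribution instead of a fixed list
         combos = {**default_combos, 'y': lambda: random.gauss(0.0, 1.0)}
         cases = sample_cases(5, combos, seed=42)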


   .. py:method:: Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)

      Return a Crop instance with this Sampler, from which `fn`
      will be set, and then samples can be sown, grown, and reaped into the
      ``Sampler.full_df``. See :class:`~xyzpy.Crop`.

      :rtype: Crop


   .. py:method:: __repr__()