xyzpy.gen.farming#
Objects for labelling and successively running functions.
Functions

label – Convenient decorator to automatically wrap a function as a Runner or Harvester.

Classes

Harvester – Container class for collecting and aggregating data to disk.

Runner – Container class with all the information needed to systematically run a function over many parameters and capture the output in a dataset.

Sampler – Like a Harvester, but randomly samples combos and writes the table of results to a pandas.DataFrame.
- class xyzpy.gen.farming.Harvester(runner, data_name=None, chunks=None, engine='h5netcdf', full_ds=None)[source]#
Container class for collecting and aggregating data to disk.
- Parameters
runner (Runner) – Performs the runs and describes the results.
data_name (str, optional) – Base file path to save data to.
chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays.
engine (str, optional) – Engine to use to save and load datasets.
full_ds (xarray.Dataset) – Initialize the Harvester with this dataset as the initial full dataset.
- Members
full_ds (xarray.Dataset) – Dataset containing all data harvested so far, by default synced to disk.
last_ds (xarray.Dataset) – Dataset containing just the data from the last harvesting run.
- Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)[source]#
Return a Crop instance with this Harvester, from which fn will be set, and then combos can be sown, grown, and reaped into Harvester.full_ds. See Crop.
- Return type
Crop
- add_ds(new_ds, sync=True, overwrite=None, chunks=None, engine=None)[source]#
Merge a new dataset into the in-memory full dataset.
- Parameters
new_ds (xr.Dataset or xr.DataArray) – Data to be merged into the full dataset.
sync (bool, optional) – If True (default), load and save the disk dataset before and after merging in the new data.
overwrite ({None, False, True}, optional) – How to combine data from the new run into the current full_ds:
None (default): attempt the merge and only raise if data conflicts.
True: overwrite conflicting current data with that from the new dataset.
False: drop any conflicting data from the new dataset.
chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays.
engine (str, optional) – Engine to use to save and load datasets.
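The three overwrite modes can be pictured with a plain-dict sketch. This is an illustration only: the helper name and the dict representation are invented for this example, and the real add_ds works on xarray objects, not dicts.

```python
def merge_with_policy(full, new, overwrite=None):
    """Conceptual sketch of the overwrite modes using plain dicts.

    full, new : dicts mapping a coordinate point -> stored value.
    overwrite : None raises on conflict, True prefers the new value,
                False keeps the old value.
    """
    merged = dict(full)
    for key, value in new.items():
        if key in merged and merged[key] != value:
            if overwrite is None:
                raise ValueError(f"conflicting data for {key!r}")
            if overwrite is False:
                continue  # drop the conflicting new value
        merged[key] = value  # no conflict, or overwrite=True
    return merged

full = {('x', 1): 10, ('x', 2): 20}
new = {('x', 2): 99, ('x', 3): 30}

merge_with_policy(full, new, overwrite=True)
# {('x', 1): 10, ('x', 2): 99, ('x', 3): 30}
merge_with_policy(full, new, overwrite=False)
# {('x', 1): 10, ('x', 2): 20, ('x', 3): 30}
```

Non-conflicting data (here the new point `('x', 3)`) is merged in under every mode; the policy only decides what happens where old and new values disagree.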
- drop_sel(labels=None, *, errors='raise', engine=None, **labels_kwargs)[source]#
Drop specific values of coordinates from this harvester and its dataset. See http://xarray.pydata.org/en/latest/generated/xarray.Dataset.drop_sel.html. The change is immediately synced with the on-disk dataset. Useful for tidying unneeded data points.
- expand_dims(name, value, engine=None)[source]#
Add a new coordinate dimension with name and value. The change is immediately synced with the on-disk dataset. Useful if you want to expand the parameter space along a previously constant argument.
- property full_ds#
Dataset containing all saved runs.
- harvest_cases(cases, *, sync=True, overwrite=None, chunks=None, engine=None, **runner_settings)[source]#
Run cases, automatically merging into an on-disk dataset.
- Parameters
cases (list of dict or tuple) – The cases to run.
sync (bool, optional) – If True (default), load and save the disk dataset before and after merging in the new data.
overwrite ({None, False, True}, optional) – What to do regarding clashes with old data:
None (default): attempt the merge and only raise if data conflicts.
True: overwrite conflicting current data with that from the new dataset.
False: drop any conflicting data from the new dataset.
chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays.
engine (str, optional) – Engine to use to save and load datasets.
runner_settings – Supplied to case_runner().
- harvest_combos(combos, *, sync=True, overwrite=None, chunks=None, engine=None, **runner_settings)[source]#
Run combos, automatically merging into an on-disk dataset.
- Parameters
combos (dict_like[str, iterable]) – The combos to run. The only difference here is that you can supply an Ellipsis, ..., meaning that all values for that coordinate will be loaded from the current full dataset.
sync (bool, optional) – If True (default), load and save the disk dataset before and after merging in the new data.
overwrite ({None, False, True}, optional) – What to do regarding clashes with old data:
None (default): attempt the merge and only raise if data conflicts.
True: overwrite conflicting current data with that from the new dataset.
False: drop any conflicting data from the new dataset.
chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays.
engine (str, optional) – Engine to use to save and load datasets.
runner_settings – Supplied to combo_runner().
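Conceptually, a combos mapping is expanded into the cartesian product of its values, with one function call per point in that product. A minimal stdlib sketch (the helper name is illustrative, not xyzpy's internals):

```python
from itertools import product

def expand_combos(combos):
    """Yield one {argument: value} case per point in the cartesian product."""
    names = list(combos)
    for values in product(*(combos[name] for name in names)):
        yield dict(zip(names, values))

combos = {'x': [1, 2, 3], 'y': [10, 20]}
cases = list(expand_combos(combos))
len(cases)  # 3 * 2 = 6 combinations
cases[0]    # {'x': 1, 'y': 10}
```

This is why each combo key becomes a named dimension of the resulting dataset: every output value sits at one coordinate in the product grid.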
- property last_ds#
Dataset containing the last run's data.
- save_full_ds(new_full_ds=None, engine=None)[source]#
Save full_ds onto disk.
- Parameters
new_full_ds (xarray.Dataset, optional) – Save this dataset as the new full dataset, else use the current full dataset.
engine (str, optional) – Engine to use to save and load datasets.
- class xyzpy.gen.farming.Runner(fn, var_names, fn_args=None, var_dims=None, var_coords=None, constants=None, resources=None, attrs=None, **default_runner_settings)[source]#
Container class with all the information needed to systematically run a function over many parameters and capture the output in a dataset.
- Parameters
fn (callable) – Function that produces a single instance of a result.
var_names (str, sequence of str, or None) – The ordered name(s) of the output variable(s) of fn. Set this explicitly to None if fn outputs already labelled data as a Dataset or DataArray.
fn_args (str, or sequence of str, optional) – The ordered name(s) of the input argument(s) of fn. This is only needed if the cases or combos supplied are not dict-like.
var_dims (dict-like, optional) – Mapping of output variables to their named internal dimensions; these can be the names of constants.
var_coords (dict-like, optional) – Mapping of output variables' named internal dimensions to the actual values they take.
constants (dict-like, optional) – Constant arguments to be supplied to fn. These can be used as var_dims, and will be saved as coords if so, otherwise as attributes.
resources (dict-like, optional) – Like constants but not saved to the dataset, e.g. if very big.
attrs (dict-like, optional) – Any other miscellaneous information to be saved with the dataset.
default_runner_settings – These keyword arguments will be supplied as defaults to any runner.
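The role of var_names can be sketched in plain Python: the tuple a function returns is zipped with the declared names to give labelled results. This is a conceptual stand-in (invented helper, plain dict output) for the real xarray-backed machinery:

```python
def run_labelled(fn, var_names, **kwargs):
    """Call fn and zip its tuple output with the declared variable names."""
    results = fn(**kwargs)
    if not isinstance(results, tuple):
        results = (results,)  # single-output functions are wrapped
    if len(results) != len(var_names):
        raise ValueError("fn returned a different number of outputs "
                         "than declared in var_names")
    return dict(zip(var_names, results))

def foo(x, y):
    return x + y, x - y

run_labelled(foo, ['sum', 'diff'], x=3, y=2)  # {'sum': 5, 'diff': 1}
```

Declaring the names up front is what lets the runner assemble many such calls into a single dataset with one variable per name.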
- Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)[source]#
Return a Crop instance with this Runner, from which fn will be set, and then combos can be sown, grown, and reaped into Runner.last_ds. See Crop.
- Return type
Crop
- property constants#
Mapping of constant arguments supplied to the Runner’s function.
- property fn_args#
List of the names of the arguments that the Runner’s function takes.
- property resources#
Mapping of constant arguments supplied to the Runner’s function that are not saved with the dataset.
- run_cases(cases, constants=(), fn_args=None, **runner_settings)[source]#
Run cases using the function and save to dataset.
- Parameters
cases (sequence of mappings or tuples) – A sequence of cases.
constants (dict, optional) – Extra constant arguments for this run; repeated arguments take precedence over stored constants, but for this run only.
runner_settings – Supplied to case_runner().
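Unlike combos, cases are run exactly as given, one function call per entry, with tuples matched positionally against the stored argument names. A hedged stdlib sketch of that dispatch (illustrative helper, not the library's implementation):

```python
def run_case_list(fn, fn_args, cases):
    """Run each case once; tuples are zipped with fn_args, dicts used as-is."""
    results = []
    for case in cases:
        kwargs = case if isinstance(case, dict) else dict(zip(fn_args, case))
        results.append(fn(**kwargs))
    return results

def foo(x, y):
    return x + y

run_case_list(foo, ('x', 'y'), [(1, 2), {'x': 10, 'y': 20}])  # [3, 30]
```

This is also why fn_args only matters for non-dict cases: a dict case already carries its own argument names.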
- run_combos(combos, constants=(), **runner_settings)[source]#
Run combos using the function map and save to dataset.
- Parameters
combos (dict_like[str, iterable]) – The values of each function argument with which to evaluate all combinations.
constants (dict, optional) – Extra constant arguments for this run; repeated arguments take precedence over stored constants, but for this run only.
runner_settings – Keyword arguments supplied to combo_runner().
- property var_coords#
Mapping of each variable named dimension to its coordinate values.
- property var_dims#
Mapping of each output variable to its named dimensions.
- property var_names#
List of the names of the variables that the Runner’s function produces.
- class xyzpy.gen.farming.Sampler(runner, data_name=None, default_combos=None, full_df=None, engine='pickle')[source]#
Like a Harvester, but randomly samples combos and writes the table of results to a pandas.DataFrame.
- Parameters
runner (xyzpy.Runner) – Runner describing a labelled function to run.
data_name (str, optional) – If given, the on-disk file to sync results with.
default_combos (dict_like[str, iterable], optional) – The default combos to sample from (which can be overridden).
full_df (pandas.DataFrame, optional) – If given, use this dataframe as the initial ‘full’ data.
engine ({'pickle', 'csv', 'json', 'hdf', ...}, optional) – How to save and load the on-disk dataframe. See load_df() and save_df().
- full_df#
Dataframe describing all data harvested so far.
- Type
pandas.DataFrame
- last_df#
Dataframe describing the data harvested on the previous run.
- Type
pandas.DataFrame
- Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)[source]#
Return a Crop instance with this Sampler, from which fn will be set, and then samples can be sown, grown, and reaped into Sampler.full_df. See Crop.
- Return type
Crop
- add_df(new_df, sync=True, engine=None)[source]#
Merge a new dataframe into the in-memory full dataframe.
- property full_df#
The dataframe describing all data harvested so far.
- property last_df#
The dataframe describing the last set of data harvested.
- sample_combos(n, combos=None, engine=None, **case_runner_settings)[source]#
Sample the target function many times, randomly choosing parameter combinations from combos (or Sampler.default_combos).
- Parameters
n (int) – How many samples to run.
combos (dict_like[str, iterable], optional) – A mapping of function arguments to potential choices. Any keys in here will override default_combos. You can also supply a callable to manually return a random choice, e.g. from a probability distribution.
engine (str, optional) – Which method to use to sync with the on-disk dataframe.
case_runner_settings – Supplied to case_runner() and so on to combo_runner(). This includes parallel=True etc.
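The sampling step can be pictured with the stdlib random module: for each of the n samples, one value is drawn for every argument, and a callable entry is simply called to produce its draw. All names here are illustrative, not the actual implementation:

```python
import random

def draw_samples(fn, combos, n, seed=None):
    """Draw n random cases from combos and run fn on each.

    Iterable entries are sampled uniformly; callable entries are invoked,
    so they can implement any custom distribution.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        kwargs = {
            name: choice() if callable(choice) else rng.choice(choice)
            for name, choice in combos.items()
        }
        rows.append({**kwargs, 'result': fn(**kwargs)})
    return rows

combos = {
    'x': [1, 2, 3],                          # uniform choice from a list
    'y': lambda: random.uniform(0.0, 1.0),   # callable: custom distribution
}
rows = draw_samples(lambda x, y: x * y, combos, n=5, seed=0)
```

Each row of the resulting table records the sampled arguments alongside the function output, mirroring how the results accumulate in full_df.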
- xyzpy.gen.farming.label(var_names, fn_args=None, var_dims=None, var_coords=None, constants=None, resources=None, attrs=None, harvester=False, sampler=False, **default_runner_settings)[source]#
Convenient decorator to automatically wrap a function as a Runner or Harvester.
- Parameters
var_names (str, sequence of str, or None) – The ordered name(s) of the output variable(s) of fn. Set this explicitly to None if fn outputs already labelled data as a Dataset or DataArray.
fn_args (str, or sequence of str, optional) – The ordered name(s) of the input argument(s) of fn. This is only needed if the cases or combos supplied are not dict-like.
var_dims (dict-like, optional) – Mapping of output variables to their named internal dimensions; these can be the names of constants.
var_coords (dict-like, optional) – Mapping of output variables' named internal dimensions to the actual values they take.
constants (dict-like, optional) – Constant arguments to be supplied to fn. These can be used as var_dims, and will be saved as coords if so, otherwise as attributes.
resources (dict-like, optional) – Like constants but not saved to the dataset, e.g. if very big.
attrs (dict-like, optional) – Any other miscellaneous information to be saved with the dataset.
harvester (bool or str, optional) – If True, wrap the runner as a Harvester; if a string, create the harvester with that as the data_name.
default_runner_settings – These keyword arguments will be supplied as defaults to any runner.
Examples
Declare a function as a runner directly:
>>> import xyzpy as xyz
>>> @xyz.label(var_names=['sum', 'diff'])
... def foo(x, y):
...     return x + y, x - y
...
>>> foo
<xyzpy.Runner>
    fn: <function foo at 0x7f1fd8e5b1e0>
    fn_args: ('x', 'y')
    var_names: ('sum', 'diff')
    var_dims: {'sum': (), 'diff': ()}
>>> foo(1, 2)  # can still call it normally
(3, -1)