xyzpy#

Classes

XYZPY(xarray_obj)

class xyzpy.Benchmarker(kernels, setup=None, names=None, benchmark_opts=None, data_name=None)[source]#

Compare the performance of various kernels. Internally this makes use of benchmark(), Harvester() and xyzpy's plotting functionality.

Parameters
  • kernels (sequence of callable) – The functions to compare performance with.

  • setup (callable, optional) – If given, set up each benchmark run by supplying the size argument n to this function first, then feeding its output to each of the functions.

  • names (sequence of str, optional) – Alternate names to give the functions, else they will be inferred.

  • benchmark_opts (dict, optional) – Supplied to benchmark().

  • data_name (str, optional) – If given, the file name the internal harvester will use to store results persistently.

harvester#

The harvester that runs and accumulates all the data.

Type

xyz.Harvester

ds#

Shortcut to the harvester’s full dataset.

Type

xarray.Dataset

ilineplot(**plot_opts)[source]#

Interactively plot the benchmarking results.

lineplot(**plot_opts)[source]#

Plot the benchmarking results.

run(ns, kernels=None, **harvest_opts)[source]#

Run the benchmarks. Each run accumulates rather than overwriting the results.

Parameters
  • ns (sequence of int or int) – The sizes to run the benchmarks with.

  • kernels (sequence of str, optional) – If given, only run the kernels with these names.

  • harvest_opts – Supplied to harvest_combos().
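Examples

A minimal usage sketch, comparing two illustrative kernels (the particular functions, names and sizes here are assumptions, not part of the API):

>>> import numpy as np
>>> import xyzpy as xyz
>>> b = xyz.Benchmarker(
...     kernels=[sorted, np.sort],
...     setup=lambda n: np.random.rand(n),
...     names=['builtin', 'numpy'],
... )
>>> b.run([2**k for k in range(4, 14)])
>>> b.lineplot()  # compare the timings as a function of size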

class xyzpy.Crop(*, fn=None, name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None, shuffle=False, farmer=None, autoload=True)[source]#

Encapsulates all the details describing a single ‘crop’, that is, its location, name, and batch size/number. Also allows tracking of the crop’s progress and, experimentally, automatic submission of workers to the grid engine to complete un-grown cases. Can also be instantiated directly from a Runner, Harvester or Crop instance.

Parameters
  • fn (callable, optional) – Target function - Crop name will be inferred from this if not given explicitly. If given, Sower will also default to saving a version of fn to disk for cropping.grow to use.

  • name (str, optional) – Custom name for this set of runs - must be given if fn is not.

  • parent_dir (str, optional) – If given, alternative directory to put the “.xyz-{name}/” folder in with all the cases and results.

  • save_fn (bool, optional) – Whether to save the function to disk for cropping.grow to use. Will default to True if fn is given.

  • batchsize (int, optional) – How many cases to group into a single batch per worker. By default, batchsize=1. Cannot be specified if num_batches is.

  • num_batches (int, optional) – How many total batches to aim for, cannot be specified if batchsize is.

  • farmer ({xyzpy.Runner, xyzpy.Harvester, xyzpy.Sampler}, optional) – A Runner, Harvester or Sampler instance, from which the fn can be inferred and which can also allow the Crop to reap itself straight to a dataset or dataframe.

  • autoload (bool, optional) – If True, check for the existence of a Crop written to disk with the same location, and if found, load it.
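Examples

A sketch of the full sow/grow/reap cycle, run entirely in this process for illustration (in practice batches are usually grown by separate workers, e.g. on a cluster; the function and combos here are assumptions):

>>> import xyzpy as xyz
>>> def fn(a, b):
...     return a + b
...
>>> crop = xyz.Crop(fn=fn, batchsize=2)
>>> crop.sow_combos({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> crop.grow_missing()  # grow all batches in this process
>>> results = crop.reap()  # raw nested tuple, since no farmer is set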

property all_nan_result#

Get a stand-in result for cases which are missing still.

calc_progress()[source]#

Calculate how much progress has been made in growing the batches.

check_bad(delete_bad=True)[source]#

Check that the result dumps are not bad, i.e. that the length of each dump matches its batch. Optionally delete bad dumps so that they can be re-grown.

Parameters

delete_bad (bool) – Delete bad results as they are encountered.

Returns

bad_ids – The bad batch numbers.

Return type

tuple

choose_batch_settings(*, combos=None, cases=None)[source]#

Work out how to divide all cases into batches, i.e. ensure that batchsize * num_batches >= num_cases.

delete_all()[source]#

Delete the crop directory and all its contents.

ensure_dirs_exists()[source]#

Make sure the directory structure for this crop exists.

property fn#

Function to save with the Crop for automatic loading and running. The default crop name will be inferred from this if not given explicitly as well.

gen_cluster_script(scheduler, batch_ids=None, *, hours=None, minutes=None, seconds=None, gigabytes=2, num_procs=1, num_threads=None, num_nodes=1, launcher='python', setup='#', shell_setup='', mpi=False, temp_gigabytes=1, output_directory=None, extra_resources=None, debugging=False)#

Generate a cluster script to grow a Crop.

Parameters
  • crop (Crop) – The crop to grow.

  • scheduler ({'sge', 'pbs', 'slurm'}) – Whether to use an SGE, PBS or slurm submission script template.

  • batch_ids (int or tuple[int]) – Which batch numbers to grow, defaults to all missing batches.

  • hours (int) – How many hours to request, default=0.

  • minutes (int, optional) – How many minutes to request, default=20.

  • seconds (int, optional) – How many seconds to request, default=0.

  • gigabytes (int, optional) – How much memory to request, default: 2.

  • num_procs (int, optional) – How many processes to request (threaded cores or MPI), default: 1.

  • launcher (str, optional) – How to launch the script, default: 'python'. But could for example be 'mpiexec python' for a MPI program.

  • setup (str, optional) – Python script to run before growing, for things that shouldn't be put in the crop function itself, e.g. one-time imports with side-effects like: "import tensorflow as tf; tf.enable_eager_execution()".

  • shell_setup (str, optional) – Commands to be run by the shell before the python script is executed. E.g. conda activate my_env.

  • mpi (bool, optional) – Request MPI processes not threaded processes.

  • temp_gigabytes (int, optional) – How much temporary on-disk memory.

  • output_directory (str, optional) – What directory to write output to. Defaults to “$HOME/Scratch/output”.

  • extra_resources (str, optional) – Extra “#$ -l” resources, e.g. ‘gpu=1’

  • debugging (bool, optional) – Set the python log level to debugging.

Return type

str

gen_qsub_script(batch_ids=None, *, scheduler='sge', **kwargs)#

Generate a qsub script to grow a Crop. Deprecated in favour of gen_cluster_script and will be removed in the future.

Parameters
  • crop (Crop) – The crop to grow.

  • batch_ids (int or tuple[int]) – Which batch numbers to grow, defaults to all missing batches.

  • scheduler ({'sge', 'pbs'}, optional) – Whether to use an SGE or PBS submission script template.

  • kwargs – See gen_cluster_script for all other parameters.

grow(batch_ids, **combo_runner_opts)[source]#

Grow specific batch numbers using this process.

grow_cluster(scheduler, batch_ids=None, *, hours=None, minutes=None, seconds=None, gigabytes=2, num_procs=1, num_threads=None, num_nodes=1, launcher='python', setup='#', shell_setup='', mpi=False, temp_gigabytes=1, output_directory=None, extra_resources=None, debugging=False)#

Automagically submit SGE, PBS, or slurm jobs to grow all missing results.

Parameters
  • crop (Crop) – The crop to grow.

  • scheduler ({'sge', 'pbs', 'slurm'}) – Whether to use an SGE, PBS or slurm submission script template.

  • batch_ids (int or tuple[int]) – Which batch numbers to grow, defaults to all missing batches.

  • hours (int) – How many hours to request, default=0.

  • minutes (int, optional) – How many minutes to request, default=20.

  • seconds (int, optional) – How many seconds to request, default=0.

  • gigabytes (int, optional) – How much memory to request, default: 2.

  • num_procs (int, optional) – How many processes to request (threaded cores or MPI), default: 1.

  • launcher (str, optional) – How to launch the script, default: 'python'. But could for example be 'mpiexec python' for a MPI program.

  • setup (str, optional) – Python script to run before growing, for things that shouldn't be put in the crop function itself, e.g. one-time imports with side-effects like: "import tensorflow as tf; tf.enable_eager_execution()".

  • shell_setup (str, optional) – Commands to be run by the shell before the python script is executed. E.g. conda activate my_env.

  • mpi (bool, optional) – Request MPI processes not threaded processes.

  • temp_gigabytes (int, optional) – How much temporary on-disk memory.

  • output_directory (str, optional) – What directory to write output to. Defaults to “$HOME/Scratch/output”.

  • extra_resources (str, optional) – Extra “#$ -l” resources, e.g. ‘gpu=1’

  • debugging (bool, optional) – Set the python log level to debugging.

grow_missing(**combo_runner_opts)[source]#

Grow any missing results using this process.

is_prepared()[source]#

Check whether this crop has been written to disk.

is_ready_to_reap()[source]#

Have all batches been grown?

load_function()[source]#

Load the saved function from disk, and try to re-insert it back into Harvester or Runner if present.

load_info()[source]#

Load the full settings from disk.

missing_results()[source]#

Return tuple of batches which haven’t been grown yet.

property num_sown_batches#

Total number of batches to be run/grown.

prepare(combos=None, cases=None, fn_args=None)[source]#

Write information about this crop and the supplied combos to disk. Typically done at the start of a sow, not when the Crop is instantiated.

qsub_grow(batch_ids=None, *, scheduler='sge', **kwargs)#

Automagically submit SGE or PBS jobs to grow all missing results. Deprecated in favour of grow_cluster and will be removed in the future.

Parameters
  • crop (Crop) – The crop to grow.

  • batch_ids (int or tuple[int]) – Which batch numbers to grow, defaults to all missing batches.

  • scheduler ({'sge', 'pbs'}, optional) – Whether to use an SGE or PBS submission script template.

  • kwargs – See grow_cluster for all other parameters.

reap(wait=False, sync=True, overwrite=None, clean_up=None, allow_incomplete=False)[source]#

Reap sown and grown combos from disk. Return a dataset if a runner or harvester is set, otherwise, the raw nested tuple.

Parameters
  • wait (bool, optional) – Whether to wait for results to appear. If False (default) all results need to be in place before the reap.

  • sync (bool, optional) – Immediately sync the new dataset with the on-disk full dataset or dataframe if a harvester or sampler is used.

  • overwrite (bool, optional) – How to compare data when syncing to the on-disk dataset. If None (default), merge as long as there are no conflicts; if True, overwrite with the new data; if False, discard any new conflicting data.

  • clean_up (bool, optional) – Whether to delete all the batch files once the results have been gathered. If left as None this will be automatically set to not allow_incomplete.

  • allow_incomplete (bool, optional) – Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan.

Return type

nested tuple or xarray.Dataset

reap_combos(wait=False, clean_up=None, allow_incomplete=False)[source]#

Reap already sown and grown results from this crop.

Parameters
  • wait (bool, optional) – Whether to wait for results to appear. If False (default) all results need to be in place before the reap.

  • clean_up (bool, optional) – Whether to delete all the batch files once the results have been gathered. If left as None this will be automatically set to not allow_incomplete.

  • allow_incomplete (bool, optional) – Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan.

Returns

results – ‘N-dimensional’ tuple containing the results.

Return type

nested tuple

reap_combos_to_ds(var_names=None, var_dims=None, var_coords=None, constants=None, attrs=None, parse=True, wait=False, clean_up=None, allow_incomplete=False, to_df=False)[source]#

Reap a function over sowed combinations and output to a Dataset.

Parameters
  • var_names (str, sequence of strings, or None) – Variable name(s) of the output(s) of fn, set to None if fn outputs data already labelled in a Dataset or DataArray.

  • var_dims (sequence of either strings or string sequences, optional) – ‘Internal’ names of dimensions for each variable, the values for each dimension should be contained as a mapping in either var_coords (not needed by fn) or constants (needed by fn).

  • var_coords (mapping, optional) – Mapping of extra coords the output variables may depend on.

  • constants (mapping, optional) – Arguments to fn which are not iterated over, these will be recorded either as attributes or coordinates if they are named in var_dims.

  • resources (mapping, optional) – Like constants but they will not be recorded.

  • attrs (mapping, optional) – Any extra attributes to store.

  • wait (bool, optional) – Whether to wait for results to appear. If False (default) all results need to be in place before the reap.

  • clean_up (bool, optional) – Whether to delete all the batch files once the results have been gathered. If left as None this will be automatically set to not allow_incomplete.

  • allow_incomplete (bool, optional) – Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan.

  • to_df (bool, optional) – Whether to reap to a xarray.Dataset or a pandas.DataFrame.

Returns

Multidimensional labelled dataset containing all the results.

Return type

xarray.Dataset or pandas.DataFrame

reap_harvest(harvester, wait=False, sync=True, overwrite=None, clean_up=None, allow_incomplete=False)[source]#

Reap a Crop over sowed combos and merge with the dataset defined by a Harvester.

reap_runner(runner, wait=False, clean_up=None, allow_incomplete=False, to_df=False)[source]#

Reap a Crop over sowed combos and save to a dataset defined by a Runner.

reap_samples(sampler, wait=False, sync=True, clean_up=None, allow_incomplete=False)[source]#

Reap a Crop over sowed combos and merge with the dataframe defined by a Sampler.

save_function_to_disk()[source]#

Save the base function to disk using cloudpickle.

save_info(combos=None, cases=None, fn_args=None)[source]#

Save information about the sowed cases.

sow_cases(fn_args, cases, combos=None, constants=None, verbosity=1, batchsize=None, num_batches=None)[source]#

Sow cases to disk to be later grown, potentially in batches.

Parameters
  • fn_args (iterable[str] or str) – The names and order of the function arguments, can be None if each case is supplied as a dict.

  • cases (iterable of tuples or mappings, optional) – Sequence of individual cases to sow for all or some function arguments.

  • combos (dict_like[str, iterable]) – Combinations to sow for some or all function arguments.

  • constants (mapping, optional) – Provide additional constant function values to use when sowing.

  • verbosity (int, optional) – How much information to show when sowing.

  • batchsize (int, optional) – If specified, set a new batchsize for the crop.

  • num_batches (int, optional) – If specified, set a new num_batches for the crop.

sow_combos(combos, cases=None, constants=None, shuffle=False, verbosity=1, batchsize=None, num_batches=None)[source]#

Sow combos to disk to be later grown, potentially in batches. Note that if you have already sown this Crop, then as long as the number of batches hasn’t changed (e.g. you have just tweaked the function or a constant argument), you can safely resow: only the batch files will be overwritten and any existing results will remain.

Parameters
  • combos (dict_like[str, iterable]) – The combinations to sow for all or some function arguments.

  • cases (iterable of tuples or mappings, optional) – Optionally provide a sequence of individual cases to sow for some or all function arguments.

  • constants (mapping, optional) – Provide additional constant function values to use when sowing.

  • shuffle (bool or int, optional) – If given, sow the combos in a random order (using random.seed and random.shuffle), which can be helpful for distributing resources when not all cases are computationally equal.

  • verbosity (int, optional) – How much information to show when sowing.

  • batchsize (int, optional) – If specified, set a new batchsize for the crop.

  • num_batches (int, optional) – If specified, set a new num_batches for the crop.

sow_samples(n, combos=None, constants=None, verbosity=1)[source]#

Sow n samples to disk.

class xyzpy.Harvester(runner, data_name=None, chunks=None, engine='h5netcdf', full_ds=None)[source]#

Container class for collecting and aggregating data to disk.

Parameters
  • runner (Runner) – Performs the runs and describes the results.

  • data_name (str, optional) – Base file path to save data to.

  • chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged using on-disk dask arrays.

  • engine (str, optional) – Engine to use to save and load datasets.

  • full_ds (xarray.Dataset) – Initialize the Harvester with this dataset as the initial full dataset.

Members
  • full_ds (xarray.Dataset) – Dataset containing all data harvested so far, by default synced to disk.

  • last_ds (xarray.Dataset) – Dataset containing just the data from the last harvesting run.
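Examples

A minimal sketch of harvesting runs to an on-disk dataset (the function, variable names and file name are illustrative):

>>> import xyzpy as xyz
>>> def fn(a, b):
...     return a + b, a - b
...
>>> r = xyz.Runner(fn, var_names=['sum', 'diff'])
>>> h = xyz.Harvester(r, data_name='my_data.h5')
>>> h.harvest_combos({'a': [1, 2], 'b': [3, 4]})
>>> h.full_ds  # all results accumulated so far, synced to 'my_data.h5'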

Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)[source]#

Return a Crop instance with this Harvester, from which fn will be set, and then combos can be sown, grown, and reaped into the Harvester.full_ds. See Crop.

Return type

Crop

add_ds(new_ds, sync=True, overwrite=None, chunks=None, engine=None)[source]#

Merge a new dataset into the in-memory full dataset.

Parameters
  • new_ds (xr.Dataset or xr.DataArray) – Data to be merged into the full dataset.

  • sync (bool, optional) – If True (default), load and save the disk dataset before and after merging in the new data.

  • overwrite ({None, False, True}, optional) –

    How to combine data from the new run into the current full_ds:

    • None (default): attempt the merge and only raise if data conflicts.

    • True: overwrite conflicting current data with that from the new dataset.

    • False: drop any conflicting data from the new dataset.

  • chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged using on-disk dask arrays.

  • engine (str, optional) – Engine to use to save and load datasets.

delete_ds(backup=False)[source]#

Delete the on-disk dataset, optionally backing it up first.

drop_sel(labels=None, *, errors='raise', engine=None, **labels_kwargs)[source]#

Drop specific values of coordinates from this harvester and its dataset. See http://xarray.pydata.org/en/latest/generated/xarray.Dataset.drop_sel.html. The change is immediately synced with the on-disk dataset. Useful for tidying unneeded data points.

expand_dims(name, value, engine=None)[source]#

Add a new coordinate dimension with name and value. The change is immediately synced with the on-disk dataset. Useful if you want to expand the parameter space along a previously constant argument.

property full_ds#

Dataset containing all saved runs.

harvest_cases(cases, *, sync=True, overwrite=None, chunks=None, engine=None, **runner_settings)[source]#

Run cases, automatically merging into an on-disk dataset.

Parameters
  • cases (list of dict or tuple) – The cases to run.

  • sync (bool, optional) – If True (default), load and save the disk dataset before and after merging in the new data.

  • overwrite ({None, False, True}, optional) –

    What to do regarding clashes with old data:

    • None (default): attempt the merge and only raise if data conflicts.

    • True: overwrite conflicting current data with that from the new dataset.

    • False: drop any conflicting data from the new dataset.

  • chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged using on-disk dask arrays.

  • engine (str, optional) – Engine to use to save and load datasets.

  • runner_settings – Supplied to case_runner().

harvest_combos(combos, *, sync=True, overwrite=None, chunks=None, engine=None, **runner_settings)[source]#

Run combos, automatically merging into an on-disk dataset.

Parameters
  • combos (dict_like[str, iterable]) – The combos to run. The only difference here is that you can supply an ellipsis ..., meaning all values for that coordinate will be loaded from the current full dataset.

  • sync (bool, optional) – If True (default), load and save the disk dataset before and after merging in the new data.

  • overwrite ({None, False, True}, optional) –

    • None (default): attempt the merge and only raise if data conflicts.

    • True: overwrite conflicting current data with that from the new dataset.

    • False: drop any conflicting data from the new dataset.

  • chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged using on-disk dask arrays.

  • engine (str, optional) – Engine to use to save and load datasets.

  • runner_settings – Supplied to combo_runner().

property last_ds#

Dataset containing the last run's data.

load_full_ds(chunks=None, engine=None)[source]#

Load the disk dataset into full_ds.

Parameters
  • chunks (int or dict, optional) – If not None, passed to xarray so that the full dataset is loaded and merged using on-disk dask arrays.

  • engine (str, optional) – Engine to use to save and load datasets.

save_full_ds(new_full_ds=None, engine=None)[source]#

Save full_ds onto disk.

Parameters
  • new_full_ds (xarray.Dataset, optional) – Save this dataset as the new full dataset, else use the current full dataset.

  • engine (str, optional) – Engine to use to save and load datasets.

class xyzpy.LinePlot(ds, x, y, z=None, *, y_err=None, x_err=None, **kwargs)[source]#

plot_lines()[source]#

class xyzpy.Runner(fn, var_names, fn_args=None, var_dims=None, var_coords=None, constants=None, resources=None, attrs=None, **default_runner_settings)[source]#

Container class with all the information needed to systematically run a function over many parameters and capture the output in a dataset.

Parameters
  • fn (callable) – Function that produces a single instance of a result.

  • var_names (str, sequence of str, or None) – The ordered name(s) of the output variable(s) of fn. Set this explicitly to None if fn outputs already labelled data as a Dataset or DataArray.

  • fn_args (str, or sequence of str, optional) – The ordered name(s) of the input argument(s) of fn. This is only needed if the cases or combos supplied are not dict-like.

  • var_dims (dict-like, optional) – Mapping of output variables to their named internal dimensions, can be the names of constants.

  • var_coords (dict-like, optional) – Mapping of output variables named internal dimensions to the actual values they take.

  • constants (dict-like, optional) – Constant arguments to be supplied to fn. These can be used as ‘var_dims’, and will be saved as coords if so, otherwise as attributes.

  • resources (dict-like, optional) – Like constants but not saved to the dataset, e.g. if very big.

  • attrs (dict-like, optional) – Any other miscellaneous information to be saved with the dataset.

  • default_runner_settings – These keyword arguments will be supplied as defaults to any runner.
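Examples

A minimal sketch of labelling a function and running combos (the function and names are illustrative):

>>> import xyzpy as xyz
>>> def fn(a, b):
...     return a + b, a - b
...
>>> r = xyz.Runner(fn, var_names=['sum', 'diff'])
>>> ds = r.run_combos({'a': [1, 2], 'b': [3, 4]})  # -> xarray.Dataset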

Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)[source]#

Return a Crop instance with this runner, from which fn will be set, and then combos can be sown, grown, and reaped into the Runner.last_ds. See Crop.

Return type

Crop

property constants#

Mapping of constant arguments supplied to the Runner’s function.

property fn_args#

List of the names of the arguments that the Runner’s function takes.

property resources#

Mapping of constant arguments supplied to the Runner’s function that are not saved with the dataset.

run_cases(cases, constants=(), fn_args=None, **runner_settings)[source]#

Run cases using the function and save to dataset.

Parameters
  • cases (sequence of mappings or tuples) – A sequence of cases.

  • constants (dict, optional) – Extra constant arguments for this run; these take precedence over stored constants, but for this run only.

  • runner_settings – Supplied to case_runner().

run_combos(combos, constants=(), **runner_settings)[source]#

Run combos using the function map and save to dataset.

Parameters
  • combos (dict_like[str, iterable]) – The values of each function argument with which to evaluate all combinations.

  • constants (dict, optional) – Extra constant arguments for this run; these take precedence over stored constants, but for this run only.

  • runner_settings – Keyword arguments supplied to combo_runner().

property var_coords#

Mapping of each variable named dimension to its coordinate values.

property var_dims#

Mapping of each output variable to its named dimensions.

property var_names#

List of the names of the variables that the Runner’s function produces.

class xyzpy.RunningCovariance[source]#

Running covariance class.

property covar#

The covariance.

property sample_covar#

The covariance with “Bessel’s correction”.
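Examples

A sketch assuming an update(x, y) method analogous to RunningStatistics.update (check the method signatures before relying on this):

>>> import xyzpy as xyz
>>> rc = xyz.RunningCovariance()
>>> for x, y in [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]:
...     rc.update(x, y)
...
>>> rc.covar  # the running covariance of the pairs so far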

class xyzpy.RunningStatistics[source]#

Running mean & standard deviation using Welford’s algorithm. This is a very efficient way of keeping track of the error on the mean for example.

mean#

Current mean.

Type

float

count#

Current count.

Type

int

std#

Current standard deviation.

Type

float

var#

Current variance.

Type

float

err#

Current error on the mean.

Type

float

rel_err#

The current relative error.

Type

float

Examples

>>> rs = RunningStatistics()
>>> rs.update(1.1)
>>> rs.update(1.4)
>>> rs.update(1.2)
>>> rs.update_from_it([1.5, 1.3, 1.6])
>>> rs.mean
1.3499999046325684
>>> rs.std  # standard deviation
0.17078252585383266
>>> rs.err  # error on the mean
0.06972167422092768

converged(rtol, atol)[source]#

Check if the stats have converged with respect to relative and absolute tolerance rtol and atol.

update(x)[source]#

Add a single value x to the statistics.

update_from_it(xs)[source]#

Add all values from iterable xs to the statistics.

class xyzpy.Sampler(runner, data_name=None, default_combos=None, full_df=None, engine='pickle')[source]#

Like a Harvester, but randomly samples combos and writes the table of results to a pandas.DataFrame.

Parameters
  • runner (xyzpy.Runner) – Runner describing a labelled function to run.

  • data_name (str, optional) – If given, the on-disk file to sync results with.

  • default_combos (dict_like[str, iterable], optional) – The default combos to sample from (which can be overridden).

  • full_df (pandas.DataFrame, optional) – If given, use this dataframe as the initial ‘full’ data.

  • engine ({'pickle', 'csv', 'json', 'hdf', ...}, optional) – How to save and load the on-disk dataframe. See load_df() and save_df().

full_df#

Dataframe describing all data harvested so far.

Type

pandas.DataFrame

last_df#

Dataframe describing the data harvested on the previous run.

Type

pandas.DataFrame
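Examples

A minimal sketch of random sampling (the function, names and file name are illustrative):

>>> import xyzpy as xyz
>>> def fn(a, b):
...     return a * b
...
>>> r = xyz.Runner(fn, var_names='product')
>>> s = xyz.Sampler(r, data_name='samples',
...                 default_combos={'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> s.sample_combos(100)  # 100 random (a, b) choices
>>> s.full_df  # pandas.DataFrame of all samples so far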

Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None)[source]#

Return a Crop instance with this Sampler, from which fn will be set, and then samples can be sown, grown, and reaped into the Sampler.full_df. See Crop.

Return type

Crop

add_df(new_df, sync=True, engine=None)[source]#

Merge a new dataframe into the in-memory full dataframe.

Parameters
  • new_df (pandas.DataFrame or dict) – Data to be appended to the full dataframe.

  • sync (bool, optional) – If True (default), load and save the disk dataframe before and after merging in the new data.

  • engine (str, optional) – Which engine to save the dataframe with.

delete_df(backup=False)[source]#

Delete the on-disk dataframe, optionally backing it up first.

property full_df#

The dataframe describing all data harvested so far.

gen_cases_fnargs(n, combos=None)[source]#

property last_df#

The dataframe describing the last set of data harvested.

load_full_df(engine=None)[source]#

Load the on-disk full dataframe into memory.

sample_combos(n, combos=None, engine=None, **case_runner_settings)[source]#

Sample the target function many times, randomly choosing parameter combinations from combos (or the sampler's default_combos).

Parameters
  • n (int) – How many samples to run.

  • combos (dict_like[str, iterable], optional) – A mapping of function arguments to potential choices. Any keys in here will override default_combos. You can also supply a callable to manually return a random choice e.g. from a probability distribution.

  • engine (str, optional) – Which method to use to sync with the on-disk dataframe.

  • case_runner_settings – Supplied to case_runner() and so onto combo_runner(). This includes parallel=True etc.

save_full_df(new_full_df=None, engine=None)[source]#

Save full_df onto disk.

Parameters
  • new_full_df (pandas.DataFrame, optional) – Save this dataframe as the new full dataframe, else use the current full_df.

  • engine (str, optional) – Which engine to save the dataframe with, if None use the default.

class xyzpy.Timer[source]#

A very simple context manager class for timing blocks.

Examples

>>> from xyzpy import Timer
>>> with Timer() as timer:
...     print('Doing some work!')
...
Doing some work!
>>> timer.t
0.00010752677917480469

xyzpy.auto_heatmap(x, **heatmap_opts)[source]#

Auto version of heatmap() that accepts array arguments by converting them to a Dataset first.

xyzpy.auto_histogram(x, **histogram_opts)[source]#

Auto version of histogram() that accepts array arguments by converting them to a Dataset first.

xyzpy.auto_iheatmap(x, **iheatmap_opts)[source]#

Auto version of iheatmap() that accepts array arguments by converting them to a Dataset first.

xyzpy.auto_ilineplot(x, y_z, **lineplot_opts)[source]#

Auto version of ilineplot() that accepts array arguments by converting them to a Dataset first.

xyzpy.auto_iscatter(x, y_z, **iscatter_opts)[source]#

Auto version of iscatter() that accepts array arguments by converting them to a Dataset first.

xyzpy.auto_lineplot(x, y_z, **lineplot_opts)[source]#

Auto version of lineplot() that accepts array arguments by converting them to a Dataset first.

xyzpy.auto_scatter(x, y_z, **scatter_opts)[source]#

Auto version of scatter() that accepts array arguments by converting them to a Dataset first.

xyzpy.auto_xyz_ds(x, y_z=None)[source]#

Automatically turn an array into a xarray dataset. Transpose y_z if necessary to automatically match dimension sizes.

Parameters
  • x (array_like) – The x-coordinates.

  • y_z (array_like, optional) – The y-data, possibly varying with coordinate z.
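For example, a sketch converting plain arrays (the values are illustrative):

>>> import numpy as np
>>> import xyzpy as xyz
>>> x = np.linspace(0, 1, 11)
>>> y_z = np.array([x**2, x**3])  # two lines of y-data, one per z
>>> ds = xyz.auto_xyz_ds(x, y_z)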

xyzpy.benchmark(fn, setup=None, n=None, min_t=0.1, repeats=3, get='min', starmap=False)[source]#

Benchmark the time it takes to run fn.

Parameters
  • fn (callable) – The function to time.

  • setup (callable, optional) – If supplied the function that sets up the argument for fn.

  • n (int, optional) – If supplied, the integer size argument to supply to setup or fn.

  • min_t (float, optional) – Aim to repeat function enough times to take up this many seconds.

  • repeats (int, optional) – Repeat the whole procedure (with setup) this many times in order to take the minimum run time.

  • get ({'min', 'mean'}, optional) – Return the minimum or mean time for each run.

  • starmap (bool, optional) – Unpack the arguments from setup, if given.

Returns

t – The minimum or mean time taken to run fn, in seconds, depending on get.

Return type

float

Examples

Just a parameter-less function:

>>> import xyzpy as xyz
>>> import numpy as np
>>> xyz.benchmark(lambda: np.linalg.eig(np.random.randn(100, 100)))
0.004726233000837965

The same but with a setup and size parameter n specified:

>>> setup = lambda n: np.random.randn(n, n)
>>> fn = lambda X: np.linalg.eig(X)
>>> xyz.benchmark(fn, setup, 100)
0.0042192734545096755

xyzpy.cache_to_disk(fn=None, *, cachedir='__xyz_cache__', **kwargs)[source]#

Cache this function to disk, using joblib.
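A minimal sketch of decorating a function (the cache directory name is illustrative):

>>> import xyzpy as xyz
>>> @xyz.cache_to_disk(cachedir='my_cache')
... def slow_fn(a, b):
...     return a ** b  # stand-in for an expensive computation
...
>>> slow_fn(2, 10)  # first call computes and caches, repeat calls load from disk
1024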

xyzpy.case_runner(fn, fn_args, cases, combos=None, constants=None, split=False, shuffle=False, parse=True, parallel=False, executor=None, num_workers=None, verbosity=1)[source]#

Simple case runner that outputs the raw tuple of results.

Parameters
  • fn (callable) – Function with which to evaluate the cases.

  • fn_args (tuple) – Names of case arguments that fn takes, can be None if each case is a dict.

  • cases (iterable[tuple] or iterable[dict]) – List of specific configurations that fn_args should take. If fn_args is None, each case should be a dict.

  • combos (dict_like[str, iterable], optional) – Optional specification of sub-combinations.

  • constants (dict, optional) – Constant function arguments.

  • split (bool, optional) – See combo_runner().

  • shuffle (bool or int, optional) – If given, compute the results in a random order (using random.seed and random.shuffle), which can be helpful for distributing resources when not all cases are computationally equal.

  • parallel (bool, optional) – Process combos in parallel, default number of workers picked.

  • executor (executor-like pool, optional) – Submit all combos to this pool executor. Must have submit or apply_async methods and API matching either concurrent.futures or an ipyparallel view. Pools from multiprocessing.pool are also supported.

  • num_workers (int, optional) – Explicitly choose how many workers to use, None for automatic.

  • verbosity ({0, 1, 2}, optional) –

    How much information to display:

    • 0: nothing,

    • 1: just progress,

    • 2: all information.

Returns

results

Return type

list of fn output for each case
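Examples

A minimal usage sketch (the function and cases are illustrative):

>>> import xyzpy as xyz
>>> def fn(a, b):
...     return a + b
...
>>> results = xyz.case_runner(fn, ('a', 'b'), [(1, 2), (3, 4)])  # one output per case: 3 and 7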

xyzpy.case_runner_to_df(fn, fn_args, cases, var_names, var_dims=None, var_coords=None, combos=None, constants=None, resources=None, attrs=None, shuffle=False, *, to_df=True, parse=True, parallel=False, num_workers=None, executor=None, verbosity=1)#

Takes a list of cases to run fn over, possibly in parallel, and outputs a pandas.DataFrame.

Parameters
  • fn (callable) – Function to evaluate.

  • fn_args (str or iterable[str]) – Names and order of arguments to fn, can be None if cases are supplied as dicts.

  • cases (iterable[tuple] or iterable[dict]) – List of configurations used to generate results.

  • var_names (str or iterable of str) – Variable name(s) of the output(s) of fn.

  • var_dims (sequence of either strings or string sequences, optional) – ‘Internal’ names of dimensions for each variable, the values for each dimension should be contained as a mapping in either var_coords (not needed by fn) or constants (needed by fn).

  • var_coords (mapping, optional) – Mapping of extra coords the output variables may depend on.

  • combos (dict_like[str, iterable], optional) – If specified, run all combinations of some arguments in these mappings.

  • constants (mapping, optional) – Arguments to fn which are not iterated over, these will be recorded either as attributes or coordinates if they are named in var_dims.

  • resources (mapping, optional) – Like constants but they will not be recorded.

  • attrs (mapping, optional) – Any extra attributes to store.

  • shuffle (bool or int, optional) – If given, compute the results in a random order (using random.seed and random.shuffle), which can be helpful for distributing resources when not all cases are computationally equal.

  • parse (bool, optional) – Whether to perform parsing of the inputs arguments.

  • parallel (bool, optional) – Process combos in parallel, default number of workers picked.

  • executor (executor-like pool, optional) – Submit all combos to this pool executor. Must have submit or apply_async methods and API matching either concurrent.futures or an ipyparallel view. Pools from multiprocessing.pool are also supported.

  • num_workers (int, optional) – Explicitly choose how many workers to use, None for automatic.

  • verbosity ({0, 1, 2}, optional) –

    How much information to display:

    • 0: nothing,

    • 1: just progress,

    • 2: all information.

Returns

df – DataFrame with minimal covering coordinates and all cases evaluated, as labelled rows.

Return type

pandas.DataFrame

xyzpy.case_runner_to_ds(fn, fn_args, cases, var_names, var_dims=None, var_coords=None, combos=None, constants=None, resources=None, attrs=None, shuffle=False, to_df=False, parse=True, parallel=False, num_workers=None, executor=None, verbosity=1)[source]#

Takes a list of cases to run fn over, possibly in parallel, and outputs a xarray.Dataset.

Parameters
  • fn (callable) – Function to evaluate.

  • fn_args (str or iterable[str]) – Names and order of arguments to fn, can be None if cases are supplied as dicts.

  • cases (iterable[tuple] or iterable[dict]) – List of configurations used to generate results.

  • var_names (str or iterable of str) – Variable name(s) of the output(s) of fn.

  • var_dims (sequence of either strings or string sequences, optional) – ‘Internal’ names of dimensions for each variable, the values for each dimension should be contained as a mapping in either var_coords (not needed by fn) or constants (needed by fn).

  • var_coords (mapping, optional) – Mapping of extra coords the output variables may depend on.

  • combos (dict_like[str, iterable], optional) – If specified, run all combinations of some arguments in these mappings.

  • constants (mapping, optional) – Arguments to fn which are not iterated over, these will be recorded either as attributes or coordinates if they are named in var_dims.

  • resources (mapping, optional) – Like constants but they will not be recorded.

  • attrs (mapping, optional) – Any extra attributes to store.

  • shuffle (bool or int, optional) – If given, compute the results in a random order (using random.seed and random.shuffle), which can be helpful for distributing resources when not all cases are computationally equal.

  • parse (bool, optional) – Whether to perform parsing of the inputs arguments.

  • parallel (bool, optional) – Process combos in parallel, default number of workers picked.

  • executor (executor-like pool, optional) – Submit all combos to this pool executor. Must have submit or apply_async methods and API matching either concurrent.futures or an ipyparallel view. Pools from multiprocessing.pool are also supported.

  • num_workers (int, optional) – Explicitly choose how many workers to use, None for automatic.

  • verbosity ({0, 1, 2}, optional) –

    How much information to display:

    • 0: nothing,

    • 1: just progress,

    • 2: all information.

Returns

ds – Dataset with minimal covering coordinates and all cases evaluated.

Return type

xarray.Dataset

xyzpy.check_runs(obj, dim='run', var=None, sel=())[source]#

Print out information about the range and any missing values for an integer dimension.

Parameters
  • obj (xarray object) – Data to check.

  • dim (str (optional)) – Dimension to check, defaults to ‘run’.

  • var (str (optional)) – Subselect this data variable first.

  • sel (mapping (optional)) – Subselect these other coordinates first.

xyzpy.cimple(hue, sat1=0.4, sat2=1.0, val1=0.95, val2=0.25, hue_shift=0.0, name='cimple', auto_adjust_sat=0.2)[source]#

Creates a color map for a single hue.

xyzpy.cimple_bright(hue, sat1=0.8, sat2=0.9, val1=0.97, val2=0.3, hue_shift=0.0, name='cimple_bright')[source]#

Creates a color map for a single hue, with bright defaults.

xyzpy.combo_runner(fn, combos=None, *, cases=None, constants=None, split=False, flat=False, shuffle=False, parallel=False, executor=None, num_workers=None, verbosity=1)[source]#

Take a function fn and compute it over all combinations of named variable values, optionally showing progress and in parallel.

Parameters
  • fn (callable) – Function to analyse.

  • combos (dict_like[str, iterable]) – All combinations of each argument to values mapping will be computed. Each argument range thus gets a dimension in the output array(s).

  • cases (sequence of mappings, optional) – Optional list of specific configurations. If both combos and cases are given, then the function is computed for all sub-combinations in combos for each case in cases, arguments can thus only appear in one or the other. Note that missing combinations of arguments will be represented by nan if creating a nested array.

  • constants (dict, optional) – Constant function arguments. Unlike combos and cases, these won’t produce dimensions in the output result when flat=False.

  • split (bool, optional) – Whether to split (unzip) the outputs of fn into multiple output arrays or not.

  • flat (bool, optional) – Whether to return a flat list of results or to return a nested tuple suitable to be supplied to numpy.array.

  • shuffle (bool or int, optional) – If given, compute the results in a random order (using random.seed and random.shuffle), which can be helpful for distributing resources when not all cases are computationally equal.

  • parallel (bool, optional) – Process combos in parallel, default number of workers picked.

  • executor (executor-like pool, optional) – Submit all combos to this pool executor. Must have submit or apply_async methods and API matching either concurrent.futures or an ipyparallel view. Pools from multiprocessing.pool are also supported.

  • num_workers (int, optional) – Explicitly choose how many workers to use, None for automatic.

  • verbosity ({0, 1, 2}, optional) –

    How much information to display:

    • 0: nothing,

    • 1: just progress,

    • 2: all information.

Returns

data – Nested tuple containing the result of fn for every combination if flat=False, else a flat list of results.

Return type

nested tuple

Examples

>>> def fn(a, b, c, d):
...     return str(a) + str(b) + str(c) + str(d)

Run all possible combos:

>>> xyz.combo_runner(
...     fn,
...     combos={
...         'a': [1, 2],
...         'b': [3, 4],
...         'c': [5, 6],
...         'd': [7, 8],
...     },
... )
100%|##########| 16/16 [00:00<00:00, 84733.41it/s]

(((('1357', '1358'), ('1367', '1368')),
  (('1457', '1458'), ('1467', '1468'))),
 ((('2357', '2358'), ('2367', '2368')),
  (('2457', '2458'), ('2467', '2468'))))

Run only a selection of cases:

>>> xyz.combo_runner(
...     fn,
...     cases=[
...         {'a': 1, 'b': 3, 'c': 5, 'd': 7},
...         {'a': 2, 'b': 4, 'c': 6, 'd': 8},
...     ],
... )
100%|##########| 2/2 [00:00<00:00, 31418.01it/s]
(((('1357', nan), (nan, nan)),
  ((nan, nan), (nan, nan))),
 (((nan, nan), (nan, nan)),
  ((nan, nan), (nan, '2468'))))

Run only certain cases of some args, but all combinations of others:

>>> xyz.combo_runner(
...     fn,
...     cases=[
...         {'a': 1, 'b': 3},
...         {'a': 2, 'b': 4},
...     ],
...     combos={
...         'c': [3, 4],
...         'd': [4, 5],
...     },
... )
100%|##########| 8/8 [00:00<00:00, 92691.80it/s]
(((('1334', '1335'), ('1344', '1345')),
  ((nan, nan), (nan, nan))),
 (((nan, nan), (nan, nan)),
  (('2434', '2435'), ('2444', '2445'))))

xyzpy.combo_runner_to_df(fn, combos, var_names, *, var_dims=None, var_coords=None, cases=None, constants=None, resources=None, attrs=None, shuffle=False, parse=True, to_df=True, parallel=False, num_workers=None, executor=None, verbosity=1)#

Evaluate a function over all cases and combinations and output to a pandas.DataFrame.

Parameters
  • fn (callable) – Function to evaluate.

  • combos (dict_like[str, iterable]) – Mapping of each individual function argument to sequence of values.

  • var_names (str, sequence of strings, or None) – Variable name(s) of the output(s) of fn, set to None if fn outputs data already labelled in a Dataset or DataArray.

  • var_dims (sequence of either strings or string sequences, optional) – ‘Internal’ names of dimensions for each variable, the values for each dimension should be contained as a mapping in either var_coords (not needed by fn) or constants (needed by fn).

  • var_coords (mapping, optional) – Mapping of extra coords the output variables may depend on.

  • cases (sequence of dicts, optional) – Individual cases to run for some or all function arguments.

  • constants (mapping, optional) – Arguments to fn which are not iterated over, these will be recorded either as attributes or coordinates if they are named in var_dims.

  • resources (mapping, optional) – Like constants but they will not be recorded.

  • attrs (mapping, optional) – Any extra attributes to store.

  • parallel (bool, optional) – Process combos in parallel, default number of workers picked.

  • executor (executor-like pool, optional) – Submit all combos to this pool executor. Must have submit or apply_async methods and API matching either concurrent.futures or an ipyparallel view. Pools from multiprocessing.pool are also supported.

  • num_workers (int, optional) – Explicitly choose how many workers to use, None for automatic.

  • verbosity ({0, 1, 2}, optional) –

    How much information to display:

    • 0: nothing,

    • 1: just progress,

    • 2: all information.

Returns

ds – Multidimensional labelled dataset containing all the results if to_df=False, else (the default here) a pandas dataframe with results as labelled rows.

Return type

xarray.Dataset or pandas.DataFrame

xyzpy.combo_runner_to_ds(fn, combos, var_names, *, var_dims=None, var_coords=None, cases=None, constants=None, resources=None, attrs=None, shuffle=False, parse=True, to_df=False, parallel=False, num_workers=None, executor=None, verbosity=1)[source]#

Evaluate a function over all cases and combinations and output to a xarray.Dataset.

Parameters
  • fn (callable) – Function to evaluate.

  • combos (dict_like[str, iterable]) – Mapping of each individual function argument to sequence of values.

  • var_names (str, sequence of strings, or None) – Variable name(s) of the output(s) of fn, set to None if fn outputs data already labelled in a Dataset or DataArray.

  • var_dims (sequence of either strings or string sequences, optional) – ‘Internal’ names of dimensions for each variable, the values for each dimension should be contained as a mapping in either var_coords (not needed by fn) or constants (needed by fn).

  • var_coords (mapping, optional) – Mapping of extra coords the output variables may depend on.

  • cases (sequence of dicts, optional) – Individual cases to run for some or all function arguments.

  • constants (mapping, optional) – Arguments to fn which are not iterated over, these will be recorded either as attributes or coordinates if they are named in var_dims.

  • resources (mapping, optional) – Like constants but they will not be recorded.

  • attrs (mapping, optional) – Any extra attributes to store.

  • parallel (bool, optional) – Process combos in parallel, default number of workers picked.

  • executor (executor-like pool, optional) – Submit all combos to this pool executor. Must have submit or apply_async methods and API matching either concurrent.futures or an ipyparallel view. Pools from multiprocessing.pool are also supported.

  • num_workers (int, optional) – Explicitly choose how many workers to use, None for automatic.

  • verbosity ({0, 1, 2}, optional) –

    How much information to display:

    • 0: nothing,

    • 1: just progress,

    • 2: all information.

Returns

ds – Multidimensional labelled dataset containing all the results if to_df=False (the default), else a pandas dataframe with results as labelled rows.

Return type

xarray.Dataset or pandas.DataFrame
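Examples

A minimal sketch producing a labelled dataset (the function and names are illustrative):

>>> import xyzpy as xyz
>>> ds = xyz.combo_runner_to_ds(
...     lambda a, b: a * b,
...     combos={'a': [1, 2, 3], 'b': [4, 5]},
...     var_names='product',
... )
>>> ds['product'].sel(a=2, b=5).item()
10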

xyzpy.convert_colors(cols, outformat, informat='MATPLOTLIB')[source]#

Convert lists of colors between formats.

xyzpy.estimate_from_repeats(fn, *fn_args, rtol=0.02, tol_scale=1.0, get='stats', verbosity=0, min_samples=5, max_samples=1000000, **fn_kwargs)[source]#

Estimate a quantity from repeated evaluations of fn, drawing samples until the error on the mean converges.

Parameters
  • fn (callable) – The function that estimates a single value.

  • fn_args (optional) – Supplied to fn.

  • rtol (float, optional) – Relative tolerance for error on mean.

  • tol_scale (float, optional) – The expected ‘scale’ of the estimate, this modifies the aboslute tolerance near zero to rtol * tol_scale, default: 1.0.

  • get ({'stats', 'samples', 'mean'}, optional) – Just get the RunningStatistics object, or the actual samples too, or just the actual mean estimate.

  • verbosity ({0, 1, 2}, optional) –

    How much information to show:

    • 0: nothing

    • 1: progress bar just with iteration rate,

    • 2: progress bar with running stats displayed.

  • min_samples (int, optional) – Take at least this many samples before checking for convergence.

  • max_samples (int, optional) – Take at maximum this many samples.

  • fn_kwargs (optional) – Supplied to fn.

Returns

  • rs (RunningStatistics) – Statistics about the random estimation.

  • samples (list[float]) – If get=='samples', the actual samples.

Examples

Estimate the sum of n random numbers:

>>> import numpy as np
>>> import xyzpy as xyz
>>> def fn(n):
...     return np.random.rand(n).sum()
...
>>> stats = xyz.estimate_from_repeats(fn, n=10, verbosity=3)
59: 5.13(12): : 58it [00:00, 3610.84it/s]
RunningStatistics(mean=5.13(12), count=59)

xyzpy.find_missing_cases(ds, ignore_dims=None, method='isnull', show_progbar=False)[source]#

Find all cases in a dataset with missing data.

Parameters
  • ds (xarray.Dataset) – Dataset in which to find missing data.

  • ignore_dims (set, optional) – Internal variable dimensions (i.e. to ignore).

  • show_progbar (bool, optional) – Show the current progress.

Returns

Function arguments and missing cases.

Return type

missing_fn_args, missing_cases
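For example, a sketch of re-running only the missing data (assumes an existing Runner r and partially filled dataset ds):

>>> fn_args, missing_cases = xyz.find_missing_cases(ds)
>>> r.run_cases(missing_cases, fn_args=fn_args)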

xyzpy.format_number_with_error(x, err)[source]#

Given x with error err, format a string showing the relevant digits of x with two significant digits of the error bracketed, and overall exponent if necessary.

Parameters
  • x (float) – The value to print.

  • err (float) – The error on x.

Return type

str

Examples

>>> format_number_with_error(0.1542412, 0.0626653)
'0.154(63)'
>>> format_number_with_error(-128124123097, 6424)
'-1.281241231(64)e+11'

xyzpy.getsizeof(obj)[source]#

Compute the real size of a python object. Taken from https://stackoverflow.com/a/30316760/5640201.

xyzpy.grow(batch_number, crop=None, fn=None, check_mpi=True, verbosity=2, debugging=False)[source]#

Automatically process a batch of cases into results. Should be run in an “.xyz-{fn_name}” folder.

Parameters
  • batch_number (int) – Which batch to ‘grow’ into a set of results.

  • crop (xyzpy.Crop) – Description of where and how to store the cases and results.

  • fn (callable, optional) – If specified, the function used to generate the results, otherwise the function will be loaded from disk.

  • check_mpi (bool, optional) – Whether to check if the process is rank 0 and only save results if so - allows mpi functions to be simply used. Defaults to true, this should only be turned off if e.g. a pool of workers is being used to run different grow instances.

  • verbosity ({0, 1, 2}, optional) – How much information to show.

  • debugging (bool, optional) – Set logging level to DEBUG.

xyzpy.heatmap(ds, x, y, z, **kwargs)[source]#

From ds plot variable z as a function of x and y using a 2D heatmap.

Parameters
  • ds (xarray.Dataset) – Dataset to plot from.

  • x (str) – Dimension to plot along the x-axis.

  • y (str) – Dimension to plot along the y-axis.

  • z (str, optional) – Variable to plot as colormap.

  • row (str, optional) – Dimension to vary over as a function of rows.

  • col (str, optional) – Dimension to vary over as a function of columns.

  • plot_opts – See xyzpy.plot.core.PLOTTER_DEFAULTS.

xyzpy.histogram(ds, x, z=None, **plot_opts)[source]#

Dataset histogram.

Parameters
  • ds (xarray.Dataset) – The dataset to plot.

  • x (str, sequence of str) – The variable(s) to plot the probability density of. If sequence, plot a histogram of each instead of using a z coordinate.

  • z (str, optional) – If given, range over this coordinate and plot a histogram for each.

  • row (str, optional) – Dimension to vary over as a function of rows.

  • col (str, optional) – Dimension to vary over as a function of columns.

  • plot_opts – See xyzpy.plot.core.PLOTTER_DEFAULTS.

xyzpy.iheatmap(ds, x, y, z, **kwargs)[source]#

From ds plot variable z as a function of x and y using a 2D heatmap. Interactive.

Parameters
  • ds (xarray.Dataset) – Dataset to plot from.

  • x (str) – Dimension to plot along the x-axis.

  • y (str) – Dimension to plot along the y-axis.

  • z (str, optional) – Variable to plot as colormap.

  • row (str, optional) – Dimension to vary over as a function of rows.

  • col (str, optional) – Dimension to vary over as a function of columns.

  • plot_opts – See xyzpy.plot.core.PLOTTER_DEFAULTS.

xyzpy.ilineplot(ds, x, y, z=None, y_err=None, x_err=None, **kwargs)[source]#

From ds plot lines of y as a function of x, optionally for varying z. Interactive.

Parameters
  • ds (xarray.Dataset) – Dataset to plot from.

  • x (str) – Dimension to plot along the x-axis.

  • y (str or tuple[str]) – Variable(s) to plot along the y-axis. If a tuple, plot each of the variables instead of using z.

  • z (str, optional) – Dimension to plot into the page.

  • y_err (str, optional) – Variable to plot as y-error.

  • x_err (str, optional) – Variable to plot as x-error.

  • row (str, optional) – Dimension to vary over as a function of rows.

  • col (str, optional) – Dimension to vary over as a function of columns.

  • plot_opts – See xyzpy.plot.core.PLOTTER_DEFAULTS.

xyzpy.is_case_missing(ds, setting, method='isnull')[source]#

Does the dataset or dataarray ds not contain any non-null data for location setting?

Note that this only returns True if all data across all variables is completely missing at the location.

Parameters
  • ds (xarray.Dataset or xarray.DataArray) – The data to check.

  • setting (dict) – Mapping of dimensions to coordinate values specifying the location to check.

  • method (str, optional) – How to check for missing data, default: 'isnull'.

Returns

missing

Return type

bool

xyzpy.iscatter(ds, x, y, z=None, y_err=None, x_err=None, **kwargs)[source]#

From ds plot a scatter of y against x, optionally for varying z. Interactive.

Parameters
  • ds (xarray.Dataset) – Dataset to plot from.

  • x (str) – Quantity to plot along the x-axis.

  • y (str or tuple[str]) – Quantity or quantities to plot along the y-axis. If a tuple, plot each of them instead of using z.

  • z (str, optional) – Dimension to plot into the page.

  • y_err (str, optional) – Variable to plot as y-error.

  • x_err (str, optional) – Variable to plot as x-error.

  • row (str, optional) – Dimension to vary over as a function of rows.

  • col (str, optional) – Dimension to vary over as a function of columns.

  • plot_opts – See xyzpy.plot.core.PLOTTER_DEFAULTS.

xyzpy.label(var_names, fn_args=None, var_dims=None, var_coords=None, constants=None, resources=None, attrs=None, harvester=False, sampler=False, **default_runner_settings)[source]#

Convenient decorator to automatically wrap a function as a Runner, Harvester or Sampler.

Parameters
  • var_names (str, sequence of str, or None) – The ordered name(s) of the output variable(s) of fn. Set this explicitly to None if fn outputs already labelled data as a Dataset or DataArray.

  • fn_args (str, or sequence of str, optional) – The ordered name(s) of the input argument(s) of fn. This is only needed if the cases or combos supplied are not dict-like.

  • var_dims (dict-like, optional) – Mapping of output variables to their named internal dimensions, can be the names of constants.

  • var_coords (dict-like, optional) – Mapping of output variables named internal dimensions to the actual values they take.

  • constants (dict-like, optional) – Constant arguments to be supplied to fn. These can be used as ‘var_dims’, and will be saved as coords if so, otherwise as attributes.

  • resources (dict-like, optional) – Like constants but not saved to the dataset, e.g. if very big.

  • attrs (dict-like, optional) – Any other miscellaneous information to be saved with the dataset.

  • harvester (bool or str, optional) – If True, wrap the runner as a Harvester, if a string, create the harvester with that as the data_name.

  • sampler (bool or str, optional) – If True, wrap the runner as a Sampler, if a string, create the sampler with that as the data_name.

  • default_runner_settings – These keyword arguments will be supplied as defaults to any runner.

Examples

Declare a function as a runner directly:

>>> import xyzpy as xyz

>>> @xyz.label(var_names=['sum', 'diff'])
... def foo(x, y):
...     return x + y, x - y
...

>>> foo
<xyzpy.Runner>
    fn: <function foo at 0x7f1fd8e5b1e0>
    fn_args: ('x', 'y')
    var_names: ('sum', 'diff')
    var_dims: {'sum': (), 'diff': ()}

>>> foo(1, 2)  # can still call it normally
(3, -1)

xyzpy.lineplot(ds, x, y, z=None, y_err=None, x_err=None, **plot_opts)[source]#

From ds plot lines of y as a function of x, optionally for varying z.

Parameters
  • ds (xarray.Dataset) – Dataset to plot from.

  • x (str) – Dimension to plot along the x-axis.

  • y (str or tuple[str]) – Variable(s) to plot along the y-axis. If a tuple, plot each of the variables instead of using z.

  • z (str, optional) – Dimension to plot into the page.

  • y_err (str, optional) – Variable to plot as y-error.

  • x_err (str, optional) – Variable to plot as x-error.

  • row (str, optional) – Dimension to vary over as a function of rows.

  • col (str, optional) – Dimension to vary over as a function of columns.

  • plot_opts – See xyzpy.plot.core.PLOTTER_DEFAULTS.
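
For example, a minimal sketch assuming ds is an xarray.Dataset with dimensions 'n' and 'method' and a data variable 'time' (names are hypothetical):

>>> import xyzpy as xyz
>>> xyz.lineplot(ds, 'n', 'time', z='method')  # one line per value of 'method'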

xyzpy.load_crops(directory='.')[source]#

Automatically load all the crops found in the current directory.

Parameters

directory (str, optional) – Which directory to load the crops from; defaults to ‘.’, the current directory.

Returns

Mapping of the crop name to the Crop.

Return type

dict[str, Crop]
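
For example (the crop names shown are hypothetical):

>>> import xyzpy as xyz
>>> crops = xyz.load_crops()
>>> sorted(crops)
['my_crop_a', 'my_crop_b']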

xyzpy.load_df(name, engine='pickle', key='df', **kwargs)[source]#

Load a dataframe from disk.

xyzpy.load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs)[source]#

Loads an xarray dataset. Basically xarray.open_dataset with some different defaults and convenient behaviour.

Parameters
  • file_name (str) – Name of file to open.

  • engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to load file.

  • load_to_mem (bool, optional) – Once opened, whether to load the data from disk into memory. Defaults to True if chunks=None.

  • create_new (bool, optional) – If no file exists, create a blank one.

  • chunks (int or dict) – Passed to xarray.open_dataset so that data is stored using dask.array.

Returns

ds – Loaded Dataset.

Return type

xarray.Dataset

xyzpy.merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False)[source]#

Glob files based on base_name, merge them, save this new dataset if it contains new info, then clean up the conflicts.

Parameters
  • base_name (str) – Base file name to glob on - should include ‘*’.

  • engine (str, optional) – Load and save engine used by xarray.
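
For example, with a hypothetical file pattern (note the ‘*’):

>>> import xyzpy as xyz
>>> xyz.merge_sync_conflict_datasets('results*.h5')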

xyzpy.parse_into_cases(combos=None, cases=None, ds=None, method='isnull')[source]#

Convert possibly given combos and possibly given cases into a single list of cases only, optionally filtering out those for which any data is already present at the corresponding location in the Dataset or DataArray ds.

Parameters
  • combos (dict_like[str, iterable], optional) – Parameter combinations.

  • cases (iterable[dict], optional) – Parameter configurations.

  • ds (xarray.Dataset or xarray.DataArray, optional) – Dataset or DataArray in which to check for existing data.

  • method (str, optional) – How to check for existing data, by default ‘isnull’ (assumed to behave as in is_case_missing()).

Returns

new_cases – The combined and possibly filtered list of cases.

Return type

iterable[dict]
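
For example, a minimal sketch (the exact output type and ordering are illustrative):

>>> import xyzpy as xyz
>>> xyz.parse_into_cases(combos={'a': (1, 2)}, cases=[{'a': 3}])
[{'a': 1}, {'a': 2}, {'a': 3}]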

xyzpy.progbar(it=None, nb=False, **kwargs)[source]#

Turn any iterable into a progress bar, with a notebook option.

Parameters
  • it (iterable) – Iterable to wrap with the progress bar.

  • nb (bool) – Whether to display the notebook version of the progress bar.

  • **kwargs (dict-like) – Additional options to supply to tqdm.
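
For example:

>>> import xyzpy as xyz
>>> for i in xyz.progbar(range(100)):
...     pass  # a tqdm-style progress bar is displayed while iterating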

xyzpy.save_df(df, name, engine='pickle', key='df', **kwargs)[source]#

Save a dataframe to disk.

xyzpy.save_ds(ds, file_name, engine='h5netcdf', **kwargs)[source]#

Saves an xarray dataset.

Parameters
  • ds (xarray.Dataset) – The dataset to save.

  • file_name (str) – Name of the file to save to.

  • engine ({'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional) – Engine used to save the file with.

Return type

None
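
A minimal round trip with load_ds() above, assuming ds is an existing xarray.Dataset (the file name is hypothetical):

>>> import xyzpy as xyz
>>> xyz.save_ds(ds, 'results.h5')
>>> ds2 = xyz.load_ds('results.h5')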

xyzpy.save_merge_ds(ds, fname, overwrite=None, **kwargs)[source]#

Save dataset ds, but check for an existing dataset with that name first, and if it exists, merge the two before saving.

Parameters
  • ds (xarray.Dataset) – The dataset to save.

  • fname (str) – The file name.

  • overwrite ({None, False, True}, optional) –

    How to merge the dataset with the existing dataset.

    • None: the datasets will be merged if there are no conflicts

    • False: data will be taken from old dataset if conflicting

    • True: data will be taken from new dataset if conflicting
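
For example, accumulating results across runs (the dataset and file name are hypothetical):

>>> import xyzpy as xyz
>>> xyz.save_merge_ds(new_ds, 'results.h5')                  # merge, assuming no conflicts
>>> xyz.save_merge_ds(new_ds, 'results.h5', overwrite=True)  # prefer the new data on conflict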

xyzpy.scatter(ds, x, y, z=None, y_err=None, x_err=None, **plot_opts)[source]#

From ds plot a scatter of y against x, optionally for varying z.

Parameters
  • ds (xarray.Dataset) – Dataset to plot from.

  • x (str) – Quantity to plot along the x-axis.

  • y (str or tuple[str]) – Quantity(s) to plot along the y-axis. If a tuple, each variable is plotted as its own series, taking the place of z.

  • z (str, optional) – Dimension to plot into the page.

  • y_err (str, optional) – Variable to plot as y-error.

  • x_err (str, optional) – Variable to plot as x-error.

  • row (str, optional) – Dimension to vary over as a function of rows.

  • col (str, optional) – Dimension to vary over as a function of columns.

  • plot_opts – See xyzpy.plot.core.PLOTTER_DEFAULTS.

xyzpy.sort_dims(ds)[source]#

Make sure all the dimensions on all the variables appear in the same order.

xyzpy.trimna(obj)[source]#

Drop values across all dimensions for which all values are NaN.
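
For example, a minimal sketch with a tiny hypothetical dataset:

>>> import numpy as np
>>> import xarray as xr
>>> import xyzpy as xyz
>>> ds = xr.Dataset({'y': ('x', [1.0, np.nan])}, coords={'x': [0, 1]})
>>> ds_trimmed = xyz.trimna(ds)  # the all-NaN slice at x=1 is dropped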

xyzpy.unzip(its, zip_level=1)[source]#

Split a nested iterable at a specified level, i.e. in numpy language transpose the specified ‘axis’ to be the first.

Parameters
  • its (iterable (of iterables (of iterables ...))) – ‘n-dimensional’ iterable to split

  • zip_level (int) – level at which to split the iterable, default of 1 replicates zip(*its) behaviour.

Example

>>> x = [[(1, True), (2, False), (3, True)],
...      [(7, True), (8, False), (9, True)]]
>>> nums, bools = unzip(x, 2)
>>> nums
((1, 2, 3), (7, 8, 9))
>>> bools
((True, False, True), (True, False, True))

xyzpy.visualize_tensor(array, max_projections=None, angles=None, scales=None, projection_overlap_spacing=1.05, skew_factor=0.05, spacing_factor=1.0, magscale='linear', size_map=True, size_pow=0.5, size_scale=1.0, alpha_map=True, alpha_pow=0.5, alpha=0.8, marker='o', linewidths=0, show_lattice=True, lattice_opts=None, compass=False, compass_loc='auto', compass_size=0.1, compass_bounds=None, compass_labels=None, compass_opts=None, max_mag=None, legend=False, legend_loc='auto', legend_size=0.15, legend_bounds=None, legend_resolution=3, interleave_projections=False, reverse_projections=False, facecolor=None, rasterize=4096, rasterize_dpi=300, figsize=(5, 5), ax=None)[source]#

Visualize all entries of a tensor, with indices mapped into the plane and values mapped into a color wheel.

Parameters
  • array (ndarray) – The tensor to visualize.

  • skew_factor (float, optional) – When there are more than two dimensions, a factor to scale the rotations by to avoid overlapping data points.

  • size_map (bool, optional) – Whether to map the tensor value magnitudes to marker size.

  • size_scale (float, optional) – An overall factor to scale the marker size by.

  • alpha_map (bool, optional) – Whether to map the tensor value magnitudes to marker alpha.

  • alpha_pow (float, optional) – The power to raise the magnitude to when mapping to alpha.

  • alpha (float, optional) – The overall alpha to use for all markers if not alpha_map.

  • show_lattice (bool, optional) – Show a small grey dot for every ‘lattice’ point regardless of value.

  • lattice_opts (dict, optional) – Options to pass to matplotlib.Axis.scatter for the lattice points.

  • linewidths (float, optional) – The linewidth to use for the markers.

  • marker (str, optional) – The marker style to use for the data points.

  • figsize (tuple, optional) – The size of the figure to create, if ax is not provided.

  • ax (matplotlib.Axis, optional) – The axis to draw to. If not provided, a new figure will be created.

Returns

  • fig (matplotlib.Figure) – The figure containing the plot, or None if ax was provided.

  • ax (matplotlib.Axis) – The axis containing the plot.
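
For example, visualizing a random complex tensor (a minimal sketch):

>>> import numpy as np
>>> import xyzpy as xyz
>>> array = np.random.normal(size=(4, 4, 4, 4)) + 1j * np.random.normal(size=(4, 4, 4, 4))
>>> fig, ax = xyz.visualize_tensor(array, compass=True)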