xyzpy ===== .. py:module:: xyzpy Submodules ---------- .. toctree:: :maxdepth: 1 /autoapi/xyzpy/_version/index /autoapi/xyzpy/gen/index /autoapi/xyzpy/manage/index /autoapi/xyzpy/plot/index /autoapi/xyzpy/utils/index Attributes ---------- .. autoapisummary:: xyzpy.case_runner_to_df xyzpy.combo_runner_to_df Classes ------- .. autoapisummary:: xyzpy.Crop xyzpy.Harvester xyzpy.Runner xyzpy.Sampler xyzpy.RayExecutor xyzpy.RayGPUExecutor xyzpy.AutoHeatMap xyzpy.AutoHistogram xyzpy.AutoLinePlot xyzpy.AutoScatter xyzpy.HeatMap xyzpy.Histogram xyzpy.LinePlot xyzpy.Scatter xyzpy.Benchmarker xyzpy.MemoryMonitor xyzpy.RunningCovariance xyzpy.RunningCovarianceMatrix xyzpy.RunningStatistics xyzpy.Timer Functions --------- .. autoapisummary:: xyzpy.case_runner xyzpy.case_runner_to_ds xyzpy.find_missing_cases xyzpy.is_case_missing xyzpy.parse_into_cases xyzpy.combo_runner xyzpy.combo_runner_to_ds xyzpy.clean_slurm_outputs xyzpy.grow xyzpy.load_crops xyzpy.manage_slurm_outputs xyzpy.cultivate xyzpy.label xyzpy.auto_xyz_ds xyzpy.cache_to_disk xyzpy.check_runs xyzpy.load_df xyzpy.load_ds xyzpy.merge_sync_conflict_datasets xyzpy.save_df xyzpy.save_ds xyzpy.save_merge_ds xyzpy.sort_dims xyzpy.trimna xyzpy.cimluv xyzpy.cimple xyzpy.cimple_bright xyzpy.cmoke xyzpy.convert_colors xyzpy.get_neutral_style xyzpy.infiniplot xyzpy.neutral_style xyzpy.auto_iheatmap xyzpy.auto_ilineplot xyzpy.auto_iscatter xyzpy.iheatmap xyzpy.ilineplot xyzpy.iscatter xyzpy.auto_heatmap xyzpy.auto_histogram xyzpy.auto_lineplot xyzpy.auto_scatter xyzpy.heatmap xyzpy.histogram xyzpy.lineplot xyzpy.scatter xyzpy.visualize_matrix xyzpy.visualize_tensor xyzpy.benchmark xyzpy.estimate_from_repeats xyzpy.format_number_with_error xyzpy.get_peak_memory_usage xyzpy.getsizeof xyzpy.progbar xyzpy.report_memory xyzpy.report_memory_gpu xyzpy.unzip Package Contents ---------------- .. 
py:function:: case_runner(fn, fn_args, cases, combos=None, constants=None, split=False, shuffle=False, parse=True, parallel=False, executor=None, num_workers=None, verbosity=1) Simple case runner that outputs the raw tuple of results. :param fn: Function with which to evaluate cases. :type fn: callable :param fn_args: Names of case arguments that fn takes, can be ``None`` if each case is a ``dict``. :type fn_args: tuple :param cases: List of specific configurations that ``fn_args`` should take. If ``fn_args`` is ``None``, each case should be a ``dict``. :type cases: iterable[tuple] or iterable[dict] :param combos: Optional specification of sub-combinations. :type combos: dict_like[str, iterable], optional :param constants: Constant function arguments. :type constants: dict, optional :param split: See :func:`~xyzpy.combo_runner`. :type split: bool, optional :param shuffle: If given, compute the results in a random order (using ``random.seed`` and ``random.shuffle``), which can be helpful for distributing resources when not all cases are computationally equal. :type shuffle: bool or int, optional :param parallel: Process combos in parallel, default number of workers picked. :type parallel: bool, optional :param executor: Submit all combos to this pool executor. Must have ``submit`` or ``apply_async`` methods and API matching either ``concurrent.futures`` or an ``ipyparallel`` view. Pools from ``multiprocessing.pool`` are also supported. :type executor: executor-like pool, optional :param num_workers: Explicitly choose how many workers to use, None for automatic. :type num_workers: int, optional :param verbosity: How much information to display: - 0: nothing, - 1: just progress, - 2: all information. :type verbosity: {0, 1, 2}, optional :returns: **results** :rtype: list of fn output for each case .. py:data:: case_runner_to_df .. 
py:function:: case_runner_to_ds(fn, fn_args, cases, var_names, var_dims=None, var_coords=None, combos=None, constants=None, resources=None, attrs=None, shuffle=False, to_df=False, parse=True, parallel=False, num_workers=None, executor=None, verbosity=1) Takes a list of ``cases`` to run ``fn`` over, possibly in parallel, and outputs a :class:`xarray.Dataset`. :param fn: Function to evaluate. :type fn: callable :param fn_args: Names and order of arguments to ``fn``, can be ``None`` if ``cases`` are supplied as dicts. :type fn_args: str or iterable[str] :param cases: List of configurations used to generate results. :type cases: iterable[tuple] or iterable[dict] :param var_names: Variable name(s) of the output(s) of ``fn``. :type var_names: str or iterable of str :param var_dims: 'Internal' names of dimensions for each variable, the values for each dimension should be contained as a mapping in either `var_coords` (not needed by `fn`) or `constants` (needed by `fn`). :type var_dims: sequence of either strings or string sequences, optional :param var_coords: Mapping of extra coords the output variables may depend on. :type var_coords: mapping, optional :param combos: If specified, run all combinations of some arguments in these mappings. :type combos: dict_like[str, iterable], optional :param constants: Arguments to `fn` which are not iterated over, these will be recorded either as attributes or coordinates if they are named in `var_dims`. :type constants: mapping, optional :param resources: Like `constants` but they will not be recorded. :type resources: mapping, optional :param attrs: Any extra attributes to store. :type attrs: mapping, optional :param shuffle: If given, compute the results in a random order (using ``random.seed`` and ``random.shuffle``), which can be helpful for distributing resources when not all cases are computationally equal. :type shuffle: bool or int, optional :param parse: Whether to perform parsing of the input arguments. 
:type parse: bool, optional :param parallel: Process combos in parallel, default number of workers picked. :type parallel: bool, optional :param executor: Submit all combos to this pool executor. Must have ``submit`` or ``apply_async`` methods and API matching either ``concurrent.futures`` or an ``ipyparallel`` view. Pools from ``multiprocessing.pool`` are also supported. :type executor: executor-like pool, optional :param num_workers: Explicitly choose how many workers to use, None for automatic. :type num_workers: int, optional :param verbosity: How much information to display: - 0: nothing, - 1: just progress, - 2: all information. :type verbosity: {0, 1, 2}, optional :returns: **ds** -- Dataset with minimal covering coordinates and all cases evaluated. :rtype: xarray.Dataset .. py:function:: find_missing_cases(ds, ignore_dims=None, method='isnull') Find all cases in a dataset or DataArray with missing data. :param ds: Dataset or DataArray in which to find missing data. :type ds: xarray.Dataset or xarray.DataArray :param ignore_dims: Internal variable dimensions (i.e. to ignore). By default (None) this is set to any dimensions that don't appear on all variables. :type ignore_dims: set, optional :returns: **cases_missing** -- List of cases with missing data, where each case is a dict mapping from dimension name to coordinate value. :rtype: iterable[dict] .. py:function:: is_case_missing(ds, setting, method='isnull') Does the dataset or dataarray ``ds`` not contain any non-null data for the single location ``setting``? Note that this only returns true if *all* data across *all* variables is completely missing at the location. :param ds: Dataset or dataarray to look in. :type ds: xarray.Dataset or xarray.DataArray :param setting: Coordinate location to check. :type setting: dict[str, hashable] :returns: **missing** :rtype: bool .. 
py:function:: parse_into_cases(combos=None, cases=None, ds=None, method='isnull') Convert maybe ``combos`` and maybe ``cases`` to a single list of ``cases`` only, also optionally filtering based on whether any data at each location is already present in Dataset or DataArray ``ds``. Note that this only checks whether *all* data across *all* variables is completely missing at the location. To check against a single variable only simply supply a DataArray instead of a Dataset, e.g. ``ds=ds["var_name"]``. :param combos: Parameter combinations. :type combos: dict_like[str, iterable], optional :param cases: Parameter configurations. :type cases: iterable[dict], optional :param ds: Dataset or DataArray in which to check for existing data. :type ds: xarray.Dataset or xarray.DataArray, optional :param method: How to determine whether data is missing when ``ds`` is supplied. "isnull" checks for null/nan values, while "isfinite" checks for all non-finite values (i.e. inf or nan). :type method: {"isnull", "isfinite"}, optional :returns: **new_cases** -- The combined and possibly filtered list of cases. :rtype: iterable[dict] .. py:function:: combo_runner(fn, combos=None, *, cases=None, constants=None, split=False, flat=False, shuffle=False, parallel=False, executor=None, num_workers=None, verbosity=1, desc=None) Take a function ``fn`` and compute it over all combinations of named variables values, optionally showing progress and in parallel. :param fn: Function to analyse. :type fn: callable :param combos: All combinations of each argument to values mapping will be computed. Each argument range thus gets a dimension in the output array(s). :type combos: dict_like[str, iterable] :param cases: Optional list of specific configurations. If both ``combos`` and ``cases`` are given, then the function is computed for all sub-combinations in ``combos`` for each case in ``cases``, arguments can thus only appear in one or the other. 
Note that missing combinations of arguments will be represented by ``nan`` if creating a nested array. :type cases: sequence of mappings, optional :param constants: Constant function arguments. Unlike ``combos`` and ``cases``, these won't produce dimensions in the output result when ``flat=False``. :type constants: dict, optional :param split: Whether to split (unzip) the outputs of ``fn`` into multiple output arrays or not. :type split: bool, optional :param flat: Whether to return a flat list of results or to return a nested tuple suitable to be supplied to ``numpy.array``. :type flat: bool, optional :param shuffle: If given, compute the results in a random order (using ``random.seed`` and ``random.shuffle``), which can be helpful for distributing resources when not all cases are computationally equal. :type shuffle: bool or int, optional :param parallel: Process combos in parallel, default number of workers picked. :type parallel: bool, optional :param executor: Submit all combos to this pool executor. Must have ``submit`` or ``apply_async`` methods and API matching either ``concurrent.futures`` or an ``ipyparallel`` view. Pools from ``multiprocessing.pool`` are also supported. :type executor: executor-like pool, optional :param num_workers: Explicitly choose how many workers to use, None for automatic. :type num_workers: int, optional :param verbosity: How much information to display: - 0: nothing, - 1: just progress, - 2: postfix the current settings to the progress bar. :type verbosity: {0, 1, 2}, optional :param desc: Description to show in the progress bar, if ``verbosity > 0``. :type desc: str, optional :returns: **data** -- Nested tuple containing all combinations of running ``fn`` if ``flat == False`` else a flat list of results. :rtype: nested tuple .. rubric:: Examples >>> def fn(a, b, c, d): ... return str(a) + str(b) + str(c) + str(d) Run all possible combos:: >>> xyz.combo_runner( ... fn, ... combos={ ... 'a': [1, 2], ... 'b': [3, 4], ... 
'c': [5, 6], ... 'd': [7, 8], ... }, ... ) 100%|##########| 16/16 [00:00<00:00, 84733.41it/s] (((('1357', '1358'), ('1367', '1368')), (('1457', '1458'), ('1467', '1468'))), ((('2357', '2358'), ('2367', '2368')), (('2457', '2458'), ('2467', '2468')))) Run only a selection of cases:: >>> xyz.combo_runner( ... fn, ... cases=[ ... {'a': 1, 'b': 3, 'c': 5, 'd': 7}, ... {'a': 2, 'b': 4, 'c': 6, 'd': 8}, ... ], ... ) 100%|##########| 2/2 [00:00<00:00, 31418.01it/s] (((('1357', nan), (nan, nan)), ((nan, nan), (nan, nan))), (((nan, nan), (nan, nan)), ((nan, nan), (nan, '2468')))) Run only certain cases of some args, but all combinations of others:: >>> xyz.combo_runner( ... fn, ... cases=[ ... {'a': 1, 'b': 3}, ... {'a': 2, 'b': 4}, ... ], ... combos={ ... 'c': [3, 4], ... 'd': [4, 5], ... }, ... ) 100%|##########| 8/8 [00:00<00:00, 92691.80it/s] (((('1334', '1335'), ('1344', '1345')), ((nan, nan), (nan, nan))), (((nan, nan), (nan, nan)), (('2434', '2435'), ('2444', '2445')))) .. py:data:: combo_runner_to_df .. py:function:: combo_runner_to_ds(fn, combos, var_names, *, var_dims=None, var_coords=None, cases=None, constants=None, resources=None, attrs=None, shuffle=False, parse=True, to_df=False, parallel=False, num_workers=None, executor=None, verbosity=1, desc=None) Evaluate a function over all cases and combinations and output to a :class:`xarray.Dataset`. :param fn: Function to evaluate. :type fn: callable :param combos: Mapping of each individual function argument to sequence of values. :type combos: dict_like[str, iterable] :param var_names: Variable name(s) of the output(s) of `fn`, set to None if fn outputs data already labelled in a Dataset or DataArray. :type var_names: str, sequence of strings, or None :param var_dims: 'Internal' names of dimensions for each variable, the values for each dimension should be contained as a mapping in either `var_coords` (not needed by `fn`) or `constants` (needed by `fn`). 
:type var_dims: sequence of either strings or string sequences, optional :param var_coords: Mapping of extra coords the output variables may depend on. :type var_coords: mapping, optional :param cases: Individual cases to run for some or all function arguments. :type cases: sequence of dicts, optional :param constants: Arguments to `fn` which are not iterated over, these will be recorded either as attributes or coordinates if they are named in `var_dims`. :type constants: mapping, optional :param resources: Like `constants` but they will not be recorded. :type resources: mapping, optional :param attrs: Any extra attributes to store. :type attrs: mapping, optional :param parallel: Process combos in parallel, default number of workers picked. :type parallel: bool, optional :param executor: Submit all combos to this pool executor. Must have ``submit`` or ``apply_async`` methods and API matching either ``concurrent.futures`` or an ``ipyparallel`` view. Pools from ``multiprocessing.pool`` are also supported. :type executor: executor-like pool, optional :param num_workers: Explicitly choose how many workers to use, None for automatic. :type num_workers: int, optional :param verbosity: How much information to display: - 0: nothing, - 1: just progress, - 2: postfix the current settings to the progress bar. :type verbosity: {0, 1, 2}, optional :param desc: Description to show in the progress bar, if ``verbosity > 0``. :type desc: str, optional :returns: **ds** -- Multidimensional labelled dataset containing all the results if ``to_df=False`` (the default), else a pandas dataframe with results as labelled rows. :rtype: xarray.Dataset or pandas.DataFrame .. py:class:: Crop(*, fn=None, name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None, shuffle=False, farmer=None, autoload=True) Bases: :py:obj:`object` Encapsulates all the details describing a single 'crop', that is, its location, name, and batch size/number. 
Also allows tracking of the crop's progress and, experimentally, automatic submission of workers to a grid engine to complete un-grown cases. Can also be instantiated directly from a :class:`~xyzpy.Runner` or :class:`~xyzpy.Harvester` or :class:`~xyzpy.Sampler` instance. :param fn: Target function - Crop `name` will be inferred from this if not given explicitly. If given, `Sower` will also default to saving a version of `fn` to disk for `cropping.grow` to use. :type fn: callable, optional :param name: Custom name for this set of runs - must be given if `fn` is not. :type name: str, optional :param parent_dir: If given, alternative directory to put the ".xyz-{name}/" folder in with all the cases and results. :type parent_dir: str, optional :param save_fn: Whether to save the function to disk for `cropping.grow` to use. Will default to True if `fn` is given. :type save_fn: bool, optional :param batchsize: How many cases to group into a single batch per worker. By default, batchsize=1. Cannot be specified if `num_batches` is. :type batchsize: int, optional :param num_batches: How many total batches to aim for, cannot be specified if `batchsize` is. :type num_batches: int, optional :param farmer: A Runner, Harvester or Sampler instance, from which the `fn` can be inferred and which can also allow the Crop to reap itself straight to a dataset or dataframe. :type farmer: {xyzpy.Runner, xyzpy.Harvester, xyzpy.Sampler}, optional :param autoload: If True, check for the existence of a Crop written to disk with the same location, and if found, load it. :type autoload: bool, optional .. seealso:: :py:obj:`Runner.Crop`, :py:obj:`Harvester.Crop`, :py:obj:`Sampler.Crop` .. py:attribute:: name :value: None .. py:attribute:: parent_dir :value: None .. py:attribute:: save_fn :value: None .. py:attribute:: batchsize :value: None .. py:attribute:: num_batches :value: None .. py:attribute:: shuffle :value: False .. py:attribute:: _batch_remainder :value: None .. 
py:attribute:: _all_nan_result :value: None .. py:attribute:: _num_sown_batches :value: -1 .. py:attribute:: _num_results :value: -1 .. py:property:: runner .. py:method:: choose_batch_settings(*, combos=None, cases=None) Work out how to divide all cases into batches, i.e. ensure that ``batchsize * num_batches >= num_cases``. .. py:method:: ensure_dirs_exists() Make sure the directory structure for this crop exists. .. py:method:: save_info(combos=None, cases=None, fn_args=None) Save information about the sowed cases. .. py:method:: load_info() Load the full settings from disk. .. py:method:: load_batch(batch_number) Load a specific batch from disk. .. py:method:: load_result(batch_number) Load a specific result from disk. .. py:method:: save_result(batch_number, result) Save a specific result to disk. .. py:method:: _sync_info_from_disk(only_missing=True) Load information about the saved cases. .. py:method:: save_function_to_disk() Save the base function to disk using cloudpickle. .. py:method:: load_function() Load the saved function from disk, and try to re-insert it back into Harvester or Runner if present. .. py:method:: prepare(combos=None, cases=None, fn_args=None) Write information about this crop and the supplied combos to disk. Typically done at start of sow, not when Crop instantiated. .. py:method:: is_prepared() Check whether this crop has been written to disk. .. py:method:: calc_progress() Calculate how much progress has been made in growing the batches. .. py:method:: is_ready_to_reap() Have all batches been grown? .. py:method:: completed_results() -> tuple[int, Ellipsis] Return tuple of batches which have been grown already. .. py:method:: missing_results() -> tuple[int, Ellipsis] Return tuple of batches which haven't been grown yet. .. py:method:: delete_all() Delete the crop directory and all its contents, and reset any loaded information on this Crop object. .. 
py:method:: handle_existing(action='ask', msg=None, e=None, overwrite=False) Handle an already prepared crop. :param action: What to do with the existing crop. If ``'ask'`` (default), interactively prompt the user. Otherwise, execute the specified action directly. :type action: {'ask', 'reap', 'delete', 'skip', 'raise'} :param msg: Message to display when prompting. :type msg: str, optional :param e: Exception to re-raise if action is ``'raise'``. :type e: Exception, optional :param overwrite: Whether to overwrite existing data when reaping. :type overwrite: bool, optional .. py:property:: all_nan_result Get a stand-in result for cases which are missing still. .. py:method:: __str__() .. py:method:: __repr__() .. py:method:: parse_constants(constants=None) .. py:method:: sow_combos(combos, cases=None, constants=None, shuffle=False, verbosity=1, desc='Sow', batchsize=None, num_batches=None) Sow combos to disk to be later grown, potentially in batches. Note if you have already sown this `Crop`, as long as the number of batches hasn't changed (e.g. you have just tweaked the function or a constant argument), you can safely resow and only the batches will be overwritten, i.e. the results will remain. :param combos: The combinations to sow for all or some function arguments. :type combos: dict_like[str, iterable] :param cases: Optionally provide a sequence of individual cases to sow for some or all function arguments. :type cases: iterable of mappings, optional :param constants: Provide additional constant function values to use when sowing. :type constants: mapping, optional :param shuffle: If given, sow the combos in a random order (using ``random.seed`` and ``random.shuffle``), which can be helpful for distributing resources when not all cases are computationally equal. :type shuffle: bool or int, optional :param verbosity: How much information to show when sowing. 0: no output, 1: progress bar, 2: progress bar with each setting being sown. 
:type verbosity: int, optional :param desc: Description to show in the progress bar when sowing. :type desc: str, optional :param batchsize: If specified, set a new batchsize for the crop. :type batchsize: int, optional :param num_batches: If specified, set a new num_batches for the crop. :type num_batches: int, optional .. py:method:: sow_cases(fn_args, cases, combos=None, constants=None, verbosity=1, batchsize=None, num_batches=None) Sow cases to disk to be later grown, potentially in batches. :param fn_args: The names and order of the function arguments, can be ``None`` if each case is supplied as a ``dict``. :type fn_args: iterable[str] or str :param cases: Sequence of individual cases to sow for all or some function arguments. :type cases: iterable of mappings, optional :param combos: Combinations to sow for some or all function arguments. :type combos: dict_like[str, iterable] :param constants: Provide additional constant function values to use when sowing. :type constants: mapping, optional :param verbosity: How much information to show when sowing. 0: no output, 1: progress bar, 2: progress bar with each setting being sown. :type verbosity: int, optional :param batchsize: If specified, set a new batchsize for the crop. :type batchsize: int, optional :param num_batches: If specified, set a new num_batches for the crop. :type num_batches: int, optional .. py:method:: sow_samples(n, combos=None, constants=None, verbosity=1) Sow ``n`` samples to disk. .. py:method:: grow_subprocess(batch_ids=None, num_workers=None, num_threads=None, gpus=None, affinities=None, raise_errors=False, log=False, min_wait=1e-06, max_wait=0.1, verbosity=1, verbosity_grow=0, desc='Grow') Grow particular or missing batches using a single fresh subprocess per batch. This has a higher overhead for starting each process, but is more robust memory wise, and allows controlling the number of threads used, CPU affinity and GPU assignment. 
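The idea of one fresh process per batch, each with its own thread cap and GPU, can be illustrated with the standard library alone. The following is only a minimal sketch under stated assumptions and is not xyzpy's implementation: the helper name ``run_batch_in_subprocess`` and the stand-in child program are hypothetical, and real batches would load and run the sown cases instead of printing environment variables.

```python
import os
import subprocess
import sys


def run_batch_in_subprocess(batch_id, num_threads=1, gpu=None):
    """Hypothetical helper: run one batch in a fresh Python interpreter."""
    env = dict(os.environ)
    # Thread caps are passed through the environment so that they are in
    # place before any numerical library is imported in the child process.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[var] = str(num_threads)
    if gpu is not None:
        # Restrict the child process to a single GPU device ID.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    # Stand-in for the real work: the child just reports its settings.
    code = (
        "import os; "
        "print(os.environ['OMP_NUM_THREADS'], "
        "os.environ.get('CUDA_VISIBLE_DEVICES', 'none'))"
    )
    proc = subprocess.run(
        [sys.executable, "-c", code, str(batch_id)],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.strip()


# Three batches, two threads each, alternating between GPU IDs 0 and 1.
results = {b: run_batch_in_subprocess(b, num_threads=2, gpu=b % 2) for b in range(3)}
```

Because each child starts from a clean interpreter, memory used by one batch is released when that process exits, which is the robustness trade-off against start-up overhead described above.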
:param batch_ids: Which batch numbers to grow, defaults to all missing. :type batch_ids: int or sequence of int, optional :param num_workers: The maximum number of concurrent subprocesses (default 1). :type num_workers: int, optional :param num_threads: The number of threads per subprocess (default 1). :type num_threads: int, optional :param gpus: GPU device IDs to assign to subprocesses via ``CUDA_VISIBLE_DEVICES``. Each subprocess gets a single GPU from this pool; the pool also limits concurrency to the number of GPUs provided. You can oversubscribe GPUs by repeating device IDs, e.g. ``0,0,1,1`` to allow 2 subprocesses to share each GPU. :type gpus: int, str, or sequence of int, optional :param affinities: CPU core IDs to pin subprocesses to via ``taskset``. Also limits concurrency to the number of affinities. :type affinities: int, str, or sequence of int, optional :param raise_errors: Whether to raise errors encountered during growing. :type raise_errors: bool, optional :param log: Whether to save subprocess stdout and stderr to log files in the crop directory under ``logs/batch-{batch_id}.log``. Default is False, which discards stdout and only prints stderr on error. :type log: bool, optional :param min_wait: Minimum polling interval in seconds. :type min_wait: float, optional :param max_wait: Maximum polling interval in seconds. :type max_wait: float, optional :param verbosity: How much information to show when growing. 0: no output, 1: progress bar, 2: progress bar with each setting being grown. :type verbosity: int, optional :param verbosity_grow: Verbosity within each batch grow. :type verbosity_grow: int, optional :param desc: Description to show in the progress bar when growing. :type desc: str, optional .. 
py:method:: grow(batch_ids=None, subprocess='auto', num_workers=None, num_threads=None, gpus=None, affinities=None, raise_errors=False, debugging=False, verbosity=1, verbosity_grow=0, log=False, desc='Grow', **combo_runner_opts) Grow specific batch numbers using this process. :param batch_ids: Which batch numbers to grow, by default all missing results. :type batch_ids: int or sequence of ints, optional :param subprocess: Whether to grow each batch in a fresh subprocess. This adds about 1 second of overhead per batch, but allows the number of threads, cpu affinity and gpu assignment to be controlled. If "auto" (default) then subprocesses will be used if ``num_threads``, ``gpus`` or ``affinities`` are specified. See :meth:`Crop.grow_subprocess` for details. :type subprocess: "auto" or bool, optional :param num_workers: Maximum number of batches to run concurrently. In subprocess mode this is the cap on simultaneous subprocesses (defaults to 1 if not given). In in-process mode this is the size of the joblib loky process pool used by ``combo_runner_core`` (``None`` = serial). :type num_workers: int, optional :param num_threads: Number of threads each worker is allowed to use, applied via the standard env vars (``OMP_NUM_THREADS``, ``MKL_NUM_THREADS``, ``OPENBLAS_NUM_THREADS``, ...). Only meaningful in subprocess mode (the env vars must be set before numerical libraries are imported); setting it implies ``subprocess=True`` when ``subprocess="auto"``. Passing this with ``subprocess=False`` raises ``ValueError``. :type num_threads: int, optional :param gpus: GPU device IDs to assign to subprocesses via ``CUDA_VISIBLE_DEVICES``. Each subprocess gets a single GPU from this pool; the pool also caps concurrency to its size. Repeat IDs to oversubscribe (e.g. ``"0,0,1,1"`` shares each GPU between two workers). Subprocess-mode only — implies ``subprocess=True`` when ``subprocess="auto"``; raises ``ValueError`` with ``subprocess=False``. 
:type gpus: int, str, or sequence of int, optional :param affinities: CPU core IDs to pin subprocesses to via ``taskset``. Each subprocess gets one affinity from the pool, which also caps concurrency. Subprocess-mode only — implies ``subprocess=True`` when ``subprocess="auto"``; raises ``ValueError`` with ``subprocess=False``. :type affinities: int, str, or sequence of int, optional :param raise_errors: Whether to raise errors if they occur during growing. :type raise_errors: bool, optional :param debugging: Whether to set the logging level to debug. :type debugging: bool, optional :param verbosity: How much information to show when growing. 0: no output, 1: progress bar, 2: progress bar with each setting being grown. :type verbosity: int, optional :param verbosity_grow: How much information to show when growing each batch. :type verbosity_grow: int, optional :param log: Whether to save subprocess output to log files. Only used when ``subprocess=True``. :type log: bool, optional :param desc: Description to show in the progress bar when growing. :type desc: str, optional :param \*\*combo_runner_opts: Additional options forwarded to either :meth:`Crop.grow_subprocess` (``min_wait``, ``max_wait``, ...) when ``subprocess`` is True, or to ``combo_runner_core`` (``executor``, ``parallel``, ...) when ``subprocess`` is False. .. py:method:: grow_missing(**combo_runner_opts) Grow any missing results using this process. .. py:method:: reap_combos(wait=False, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap') Reap already sown and grown results from this crop. :param wait: Whether to wait for results to appear. If false (default) all results need to be in place before the reap. :type wait: bool, optional :param clean_up: Whether to delete all the batch files once the results have been gathered. If left as ``None`` this will be automatically set to ``not allow_incomplete``. 
:type clean_up: bool, optional :param allow_incomplete: Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan. :type allow_incomplete: bool, optional :param verbosity: How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped. :type verbosity: int, optional :param desc: Description to show in the progress bar when reaping. :type desc: str, optional :returns: **results** -- 'N-dimensional' tuple containing the results. :rtype: nested tuple .. py:method:: reap_combos_to_ds(var_names=None, var_dims=None, var_coords=None, constants=None, attrs=None, parse=True, wait=False, clean_up=None, allow_incomplete=False, to_df=False, verbosity=1, desc='Reap') Reap a function over sowed combinations and output to a Dataset. :param var_names: Variable name(s) of the output(s) of `fn`, set to None if fn outputs data already labeled in a Dataset or DataArray. :type var_names: str, sequence of strings, or None :param var_dims: 'Internal' names of dimensions for each variable, the values for each dimension should be contained as a mapping in either `var_coords` (not needed by `fn`) or `constants` (needed by `fn`). :type var_dims: sequence of either strings or string sequences, optional :param var_coords: Mapping of extra coords the output variables may depend on. :type var_coords: mapping, optional :param constants: Arguments to `fn` which are not iterated over, these will be recorded either as attributes or coordinates if they are named in `var_dims`. :type constants: mapping, optional :param resources: Like `constants` but they will not be recorded. :type resources: mapping, optional :param attrs: Any extra attributes to store. :type attrs: mapping, optional :param wait: Whether to wait for results to appear. If false (default) all results need to be in place before the reap. 
:type wait: bool, optional :param clean_up: Whether to delete all the batch files once the results have been gathered. If left as ``None`` this will be automatically set to ``not allow_incomplete``. :type clean_up: bool, optional :param allow_incomplete: Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan. :type allow_incomplete: bool, optional :param to_df: Whether to reap to a ``xarray.Dataset`` or a ``pandas.DataFrame``. :type to_df: bool, optional :param verbosity: How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped. :type verbosity: int, optional :param desc: Description to show in the progress bar when reaping. :type desc: str, optional :returns: Multidimensional labeled dataset containing all the results. :rtype: xarray.Dataset or pandas.DataFrame .. py:method:: reap_runner(runner, wait=False, clean_up=None, allow_incomplete=False, to_df=False, verbosity=1, desc='Reap', **kwargs) Reap a Crop over sowed combos and save to a dataset defined by a :class:`~xyzpy.Runner`. .. py:method:: reap_harvest(harvester, wait=False, sync=True, overwrite=None, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap', **kwargs) Reap a Crop over sowed combos and merge with the dataset defined by a :class:`~xyzpy.Harvester`. .. py:method:: reap_samples(sampler, wait=False, sync=True, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap', **kwargs) Reap a Crop over sowed combos and merge with the dataframe defined by a :class:`~xyzpy.Sampler`. .. py:method:: reap(wait=False, sync=True, overwrite=None, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap') Reap sown and grown combos from disk. Return a dataset if a runner or harvester is set, otherwise, the raw nested tuple. :param wait: Whether to wait for results to appear. If false (default) all results need to be in place before the reap. 
:type wait: bool, optional :param sync: Immediately sync the new dataset with the on-disk full dataset or dataframe if a harvester or sampler is used. :type sync: bool, optional :param overwrite: How to compare data when syncing to on-disk dataset. ``None`` (default): merge as long as there are no conflicts. ``True``: overwrite with the new data. ``False``: discard any new conflicting data. :type overwrite: bool, optional :param clean_up: Whether to delete all the batch files once the results have been gathered. If left as ``None`` this will be automatically set to ``not allow_incomplete``. :type clean_up: bool, optional :param allow_incomplete: Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan. :type allow_incomplete: bool, optional :param verbosity: How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped. :type verbosity: int, optional :param desc: Description to show in the progress bar when reaping. :type desc: str, optional :rtype: nested tuple or xarray.Dataset .. py:method:: check_bad(delete_bad=True) Check that the result dumps are not bad; sometimes their length does not match the batch. Optionally delete these so that they can be re-grown. :param delete_bad: Delete bad results as they are encountered. :type delete_bad: bool :returns: **bad_ids** -- The bad batch numbers. :rtype: tuple .. py:method:: _get_fn() .. py:method:: _set_fn(fn) .. py:method:: _del_fn() .. py:attribute:: fn .. py:property:: num_sown_batches Total number of batches to be run/grown. .. py:property:: num_results .. py:function:: clean_slurm_outputs(job, directory='.', cancel_if_finished=True) .. py:function:: grow(batch_number, crop=None, fn=None, num_workers=None, check_mpi=True, verbosity=2, debugging=False, raise_errors=True) Automatically process a batch of cases into results. Should be run in an ".xyz-{fn_name}" folder, or `crop` should be specified.
:param batch_number: Which batch to 'grow' into a set of results. :type batch_number: int :param crop: Description of where and how to store the cases and results. :type crop: xyzpy.Crop :param fn: If specified, the function used to generate the results, otherwise the function will be loaded from disk. :type fn: callable, optional :param num_workers: If specified, grow using a pool of this many workers. This uses ``joblib.externals.loky`` to spawn processes. :type num_workers: int, optional :param check_mpi: Whether to check if the process is rank 0 and only save results if so - allows mpi functions to be simply used. Defaults to true, this should only be turned off if e.g. a pool of workers is being used to run different ``grow`` instances. :type check_mpi: bool, optional :param verbosity: How much information to show. :type verbosity: {0, 1, 2}, optional :param debugging: Set logging level to DEBUG. :type debugging: bool, optional :param raise_errors: Whether to raise errors that occur during the computation. If growing many batches in parallel, it can be useful to set this to False so a single error doesn't crash the whole process. :type raise_errors: bool, optional .. py:function:: load_crops(directory='.') Automatically load all the crops found in the current directory. :param directory: Which directory to load the crops from, defaults to '.' - the current. :type directory: str, optional :returns: Mapping of the crop name to the Crop. :rtype: dict[str, Crop] .. py:function:: manage_slurm_outputs(crop, job, wait_time=60) .. py:class:: Harvester(runner: Runner, data_name=None, chunks=None, engine='h5netcdf', full_ds=None) Bases: :py:obj:`object` Container class for collecting and aggregating data to disk. :param runner: Performs the runs and describes the results. :type runner: Runner :param data_name: Base file path to save data to. 
:type data_name: str, optional :param chunks: If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays. :type chunks: int or dict, optional :param engine: Engine to use to save and load datasets. :type engine: str, optional :param full_ds: Initialize the Harvester with this dataset as the initial full dataset. :type full_ds: xarray.Dataset, optional .. attribute:: full_ds Dataset containing all data harvested so far, by default synced to disk. :type: xarray.Dataset .. attribute:: last_ds Dataset containing just the data from the last harvesting run. :type: xarray.Dataset .. py:attribute:: runner .. py:attribute:: data_name :value: None .. py:attribute:: engine :value: 'h5netcdf' .. py:attribute:: chunks :value: None .. py:attribute:: _full_ds :value: None .. py:property:: fn .. py:method:: __call__(*args, **kwargs) .. py:property:: last_ds Dataset containing the last run's data. .. py:method:: load_full_ds(chunks=None, engine=None) Load the disk dataset into ``full_ds``. :param chunks: If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays. :type chunks: int or dict, optional :param engine: Engine to use to save and load datasets. :type engine: str, optional .. py:property:: full_ds Dataset containing all saved runs. .. py:method:: save_full_ds(new_full_ds=None, engine=None) Save `full_ds` onto disk. The old file is moved and kept as a backup in case of errors when writing the new dataset to disk. :param new_full_ds: Save this dataset as the new full dataset, else use the current full dataset. :type new_full_ds: xarray.Dataset, optional :param engine: Engine to use to save and load datasets. :type engine: str, optional .. py:method:: delete_ds(backup=False) Delete the on-disk dataset, optionally backing it up first. .. 
py:method:: add_ds(new_ds, sync=True, overwrite=None, chunks=None, engine=None) Merge a new dataset into the in-memory full dataset. :param new_ds: Data to be merged into the full dataset. :type new_ds: xr.Dataset or xr.DataArray :param sync: If True (default), load and save the disk dataset before and after merging in the new data. :type sync: bool, optional :param overwrite: How to combine data from the new run into the current full_ds: - ``None`` (default): attempt the merge and only raise if data conflicts. - ``True``: overwrite conflicting current data with that from the new dataset. - ``False``: drop any conflicting data from the new dataset. :type overwrite: {None, False, True}, optional :param chunks: If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays. :type chunks: int or dict, optional :param engine: Engine to use to save and load datasets. :type engine: str, optional .. py:method:: expand_dims(name, value, engine=None) Add a new coordinate dimension with ``name`` and ``value``. The change is immediately synced with the on-disk dataset. Useful if you want to expand the parameter space along a previously constant argument. .. py:method:: drop_sel(labels=None, *, errors='raise', engine=None, **labels_kwargs) Drop specific values of coordinates from this harvester and its dataset. See http://xarray.pydata.org/en/latest/generated/xarray.Dataset.drop_sel.html. The change is immediately synced with the on-disk dataset. Useful for tidying unneeded data points. .. py:method:: _maybe_expand_combos(combos) Expand combos with ellipses into full coordinate values from the current full dataset. .. py:method:: harvest_combos(combos, *, cases=None, missing_only=False, sync=True, overwrite=None, chunks=None, engine=None, **runner_settings) Run combos, automatically merging into an on-disk dataset. :param combos: The combos to run.
The only difference here is that you can supply an ellipsis ``...``, meaning that all values for that coordinate will be loaded from the current full dataset. :type combos: dict_like[str, iterable] :param missing_only: If True, only run combos that are not already present in the on-disk dataset. :type missing_only: bool, optional :param sync: If True (default), load and save the disk dataset before and after merging in the new data. :type sync: bool, optional :param overwrite: - ``None`` (default): attempt the merge and only raise if data conflicts. - ``True``: overwrite any conflicting current data with that from the new dataset. - ``False``: drop any conflicting data from the new dataset. :type overwrite: {None, False, True}, optional :param chunks: If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays. :type chunks: int or dict, optional :param engine: Engine to use to save and load datasets. :type engine: str, optional :param runner_settings: Supplied to :func:`~xyzpy.combo_runner`. .. py:method:: harvest_cases(cases, *, sync=True, overwrite=None, chunks=None, engine=None, **runner_settings) Run cases, automatically merging into an on-disk dataset. :param cases: The cases to run. :type cases: list of dict or tuple :param sync: If True (default), load and save the disk dataset before and after merging in the new data. :type sync: bool, optional :param overwrite: What to do regarding clashes with old data: - ``None`` (default): attempt the merge and only raise if data conflicts. - ``True``: overwrite conflicting current data with that from the new dataset. - ``False``: drop any conflicting data from the new dataset. :type overwrite: {None, False, True}, optional :param chunks: If not None, passed to xarray so that the full dataset is loaded and merged into with on-disk dask arrays. :type chunks: int or dict, optional :param engine: Engine to use to save and load datasets.
:type engine: str, optional :param runner_settings: Supplied to :func:`~xyzpy.case_runner`. .. py:method:: Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None) Return a Crop instance with this Harvester, from which `fn` will be set, and then combos can be sown, grown, and reaped into the ``Harvester.full_ds``. See :class:`~xyzpy.Crop`. :rtype: Crop .. py:method:: __repr__() .. py:method:: cultivate(combos=None, cases=None, constants=None, name=None, parent_dir=None, batchsize=None, num_batches=None, missing_only=True, shuffle=True, subprocess='auto', num_workers=None, num_threads=None, gpus=None, affinities=None, log=False, raise_errors=True, verbosity=1, on_existing='ask', on_error='ask', clean_up=None, **grow_kwargs) Convenience method to run a full cycle of parsing combos into missing cases only, then persistently growing those cases, and finally merging the results into the full dataset. :param combos: The combos to run. The only difference here is that you can supply an ellipsis ``...``, meaning that all values for that coordinate will be loaded from the current full dataset. :type combos: dict_like[str, iterable] :param cases: A sequence of (partial) individual settings to run. For each case, all settings given by combos will be generated. :type cases: sequence of mappings or tuples, optional :param constants: Extra constant arguments for this run. :type constants: dict, optional :param name: Name for the crop to be used for on-disk storage of batches, results and logs. You can use different names to grow results for the same dataset concurrently. :type name: str, optional :param parent_dir: Parent directory in which to create the crop folder (``.xyz-{name}/``). Defaults to the current working directory. :type parent_dir: str, optional :param batchsize: If given, the target number of cases to sow in each batch. This is computed from ``num_batches`` if not given and 1 if neither given.
:type batchsize: int, optional :param num_batches: If given, the target number of batches to sow. This is computed from ``batchsize`` if not given and 1 if neither given. :type num_batches: int, optional :param missing_only: If True (default), only run cases that are not already present in the on-disk dataset. If `False`, the new results will overwrite any existing results. :type missing_only: bool, optional :param shuffle: If True (default), shuffle the order of cases before sowing and growing. This can be a useful basic form of load balancing. :type shuffle: bool, optional :param subprocess: Whether to grow each batch in a fresh subprocess. This adds about 1 second of overhead per batch, but allows the number of threads, cpu affinity and gpu assignment to be controlled. If "auto" (default) subprocesses are used when ``num_threads``, ``gpus`` or ``affinities`` are specified. See :meth:`xyzpy.Crop.grow` for details. :type subprocess: "auto" or bool, optional :param num_workers: Maximum number of batches to run concurrently. In subprocess mode this caps simultaneous subprocesses (defaults to 1 if not given). In in-process mode this is the joblib loky pool size (``None`` = serial). Forwarded to :meth:`xyzpy.Crop.grow`. :type num_workers: int, optional :param num_threads: Number of threads each worker is allowed to use, applied via the standard env vars (``OMP_NUM_THREADS``, ``MKL_NUM_THREADS``, etc.) in each subprocess. Implies ``subprocess=True`` when ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`. :type num_threads: int, optional :param gpus: GPU device IDs to assign to subprocesses via ``CUDA_VISIBLE_DEVICES``; the pool also caps concurrency. Implies ``subprocess=True`` when ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`. :type gpus: int, str, or sequence of int, optional :param affinities: CPU core IDs to pin subprocesses to via ``taskset``; the pool also caps concurrency. Implies ``subprocess=True`` when ``subprocess="auto"``. 
Forwarded to :meth:`xyzpy.Crop.grow`. :type affinities: int, str, or sequence of int, optional :param log: Whether to save subprocess stdout and stderr to files in the crop directory under ``logs/batch-{batch_id}.log``. Subprocess-mode only. Forwarded to :meth:`xyzpy.Crop.grow`. :type log: bool, optional :param raise_errors: If True (default), raise any errors that occur during growing, otherwise just log them and continue with the next batch. :type raise_errors: bool, optional :param verbosity: The level of logging to print during the sow/grow/reap process. 0: no output, 1: progress bars, 2: progress bars with current setting postfixed. :type verbosity: int, optional :param on_existing: What to do if a crop with the same name already exists on disk. Default is ``'ask'`` (interactive prompt). :type on_existing: {'ask', 'reap', 'delete', 'skip', 'raise'}, optional :param on_error: What to do if an error occurs during grow/reap. Default is ``'ask'`` (interactive prompt). :type on_error: {'ask', 'reap', 'delete', 'skip', 'raise'}, optional :param clean_up: Whether to delete the on-disk batch, result and log files after successfully reaping. :type clean_up: bool or None, optional :param grow_kwargs: Further keyword arguments forwarded to :meth:`xyzpy.Crop.grow` (e.g. ``executor``, ``min_wait``, ...). .. seealso:: :py:obj:`Harvester.harvest_combos`, :py:obj:`xyzpy.Crop.grow` .. py:class:: Runner(fn, var_names, fn_args=None, var_dims=None, var_coords=None, constants=None, resources=None, attrs=None, **default_runner_settings) Bases: :py:obj:`object` Container class with all the information needed to systematically run a function over many parameters and capture the output in a dataset. :param fn: Function that produces a single instance of a result. :type fn: callable :param var_names: The ordered name(s) of the output variable(s) of `fn`. Set this explicitly to None if `fn` outputs already labelled data as a dict, :class:`~xarray.Dataset`, or :class:`~xarray.DataArray`.
:type var_names: str, sequence of str, or None :param fn_args: The ordered name(s) of the input argument(s) of `fn`. This is only needed if the cases or combos supplied are not dict-like. :type fn_args: str, or sequence of str, optional :param var_dims: Mapping of output variables to their named internal dimensions, can be the names of ``constants``. :type var_dims: dict-like, optional :param var_coords: Mapping of output variables' named internal dimensions to the actual values they take. :type var_coords: dict-like, optional :param constants: Constant arguments to be supplied to `fn`. These can be used as 'var_dims', and will be saved as coords if so, otherwise as attributes. :type constants: dict-like, optional :param resources: Like `constants` but not saved to the dataset, e.g. if very big. :type resources: dict-like, optional :param attrs: Any other miscellaneous information to be saved with the dataset. :type attrs: dict-like, optional :param default_runner_settings: These keyword arguments will be supplied as defaults to any runner. .. py:attribute:: fn .. py:attribute:: _var_names :value: (None,) .. py:attribute:: _fn_args .. py:attribute:: _var_dims .. py:attribute:: _var_coords .. py:attribute:: _constants .. py:attribute:: _resources .. py:attribute:: _attrs .. py:attribute:: _last_ds :value: None .. py:attribute:: default_runner_settings .. py:method:: __call__(*args, **kwargs) .. py:method:: _get_fn_args() .. py:method:: _set_fn_args(fn_args) .. py:method:: _del_fn_args() .. py:attribute:: fn_args .. py:method:: _get_var_names() .. py:method:: _set_var_names(var_names) .. py:method:: _del_var_names() .. py:attribute:: var_names .. py:method:: _get_var_dims() .. py:method:: _set_var_dims(var_dims, var_names=None) .. py:method:: _del_var_dims() .. py:attribute:: var_dims .. py:method:: _get_var_coords() .. py:method:: _set_var_coords(var_coords) .. py:method:: _del_var_coords() .. py:attribute:: var_coords .. py:method:: _get_constants() .. 
py:method:: _set_constants(constants) .. py:method:: _del_constants() .. py:attribute:: constants .. py:method:: _get_resources() .. py:method:: _set_resources(resources) .. py:method:: _del_resources() .. py:attribute:: resources .. py:property:: last_ds .. py:method:: run_combos(combos, constants=(), **runner_settings) Run combos using the function map and save to dataset. :param combos: The values of each function argument with which to evaluate all combinations. :type combos: dict_like[str, iterable] :param constants: Extra constant arguments for this run, repeated arguments will take precedence over stored constants but for this run only. :type constants: dict, optional :param runner_settings: Keyword arguments supplied to :func:`~xyzpy.combo_runner`. .. py:method:: run_cases(cases, constants=(), fn_args=None, **runner_settings) Run cases using the function and save to dataset. :param cases: A sequence of cases. :type cases: sequence of mappings or tuples :param constants: Extra constant arguments for this run, repeated arguments will take precedence over stored constants but for this run only. :type constants: dict (optional) :param runner_settings: Supplied to :func:`~xyzpy.case_runner`. .. py:method:: Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None) Return a Crop instance with this runner, from which ``fn`` will be set, and then combos can be sown, grown, and reaped into the ``Runner.last_ds``. See :class:`~xyzpy.Crop`. :rtype: Crop .. py:method:: __repr__() .. py:class:: Sampler(runner, data_name=None, default_combos=None, full_df=None, engine='pickle') Like a Harvester, but randomly samples combos and writes the table of results to a ``pandas.DataFrame``. :param runner: Runner describing a labelled function to run. :type runner: xyzpy.Runner :param data_name: If given, the on-disk file to sync results with. :type data_name: str, optional :param default_combos: The default combos to sample from (which can be overridden). 
:type default_combos: dict_like[str, iterable], optional :param full_df: If given, use this dataframe as the initial 'full' data. :type full_df: pandas.DataFrame, optional :param engine: How to save and load the on-disk dataframe. See :func:`~xyzpy.manage.load_df` and :func:`~xyzpy.manage.save_df`. :type engine: {'pickle', 'csv', 'json', 'hdf', ...}, optional .. attribute:: full_df Dataframe describing all data harvested so far. :type: pandas.DataFrame .. attribute:: last_df Dataframe describing the data harvested on the previous run. :type: pandas.DataFrame .. py:attribute:: runner .. py:attribute:: data_name :value: None .. py:attribute:: default_combos .. py:attribute:: _full_df :value: None .. py:attribute:: _last_df :value: None .. py:attribute:: engine :value: 'pickle' .. py:property:: fn .. py:method:: load_full_df(engine=None) Load the on-disk full dataframe into memory. .. py:property:: full_df The dataframe describing all data harvested so far. .. py:property:: last_df The dataframe describing the last set of data harvested. .. py:method:: save_full_df(new_full_df=None, engine=None) Save `full_df` onto disk. :param new_full_df: Save this dataframe as the new full dataframe, else use the current ``full_df``. :type new_full_df: pandas.DataFrame, optional :param engine: Which engine to save the dataframe with, if None use the default. :type engine: str, optional .. py:method:: delete_df(backup=False) Delete the on-disk dataframe, optionally backing it up first. .. py:method:: add_df(new_df, sync=True, engine=None) Merge a new dataframe into the in-memory full dataframe. :param new_df: Data to be appended to the full dataframe. :type new_df: pandas.DataFrame or dict :param sync: If True (default), load and save the disk dataframe before and after merging in the new data. :type sync: bool, optional :param engine: Which engine to save the dataframe with. :type engine: str, optional .. py:method:: gen_cases_fnargs(n, combos=None) .. 
py:method:: sample_combos(n, combos=None, engine=None, **case_runner_settings) Sample the target function many times, randomly choosing parameter combinations from ``combos`` (or ``Sampler.default_combos``). :param n: How many samples to run. :type n: int :param combos: A mapping of function arguments to potential choices. Any keys in here will override ``default_combos``. You can also supply a callable to manually return a random choice e.g. from a probability distribution. :type combos: dict_like[str, iterable], optional :param engine: Which method to use to sync with the on-disk dataframe. :type engine: str, optional :param case_runner_settings: Supplied to :func:`~xyzpy.case_runner` and so onto :func:`~xyzpy.combo_runner`. This includes ``parallel=True`` etc. .. py:method:: Crop(name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None) Return a Crop instance with this Sampler, from which `fn` will be set, and then samples can be sown, grown, and reaped into the ``Sampler.full_df``. See :class:`~xyzpy.Crop`. :rtype: Crop .. py:method:: __repr__() .. py:function:: cultivate(fn, *, var_names=None, data_name=None, runner_opts=None, harvester_opts=None, combos=None, cases=None, constants=None, name=None, parent_dir=None, batchsize=None, num_batches=None, missing_only=True, shuffle=True, subprocess='auto', num_workers=None, num_threads=None, gpus=None, affinities=None, log=False, raise_errors=True, verbosity=1, on_existing='ask', on_error='ask', clean_up=None, **grow_kwargs) Convenience function to run a full cycle of annotating a function, parsing combos into missing cases only, then persistently growing those cases, and finally merging the results into the full dataset. :param fn: The function to run over combos and cases. This will be wrapped in a :class:`~xyzpy.Runner` and :class:`~xyzpy.Harvester` to perform the cultivation process.
If `var_names` is None, it should return a `dict`, :class:`~xarray.Dataset` or :class:`~xarray.DataArray`. :type fn: callable :param var_names: The ordered name(s) of the output variable(s) of `fn`. Set this explicitly to None if `fn` outputs already labelled data as a `dict`, :class:`~xarray.Dataset`, or :class:`~xarray.DataArray`. :type var_names: str, sequence of str, or None :param data_name: If given, the on-disk file to sync results with. If not set there will be no persistent results, since the harvester created in this functional interface is ephemeral. :type data_name: str, optional :param runner_opts: Keyword arguments to be supplied to :class:`~xyzpy.Runner`. :type runner_opts: dict, optional :param harvester_opts: Keyword arguments to be supplied to :class:`~xyzpy.Harvester`. :type harvester_opts: dict, optional :param combos: The combos to run. The only difference here is that you can supply an ellipsis ``...``, meaning that all values for that coordinate will be loaded from the current full dataset. :type combos: dict_like[str, iterable] :param cases: A sequence of (partial) individual settings to run. For each case, all settings given by combos will be generated. :type cases: sequence of mappings or tuples, optional :param constants: Extra constant arguments for this run. :type constants: dict, optional :param name: Name for the crop to be used for on-disk storage of batches, results and logs. You can use different names to grow results for the same dataset concurrently. :type name: str, optional :param parent_dir: Parent directory in which to create the crop folder (``.xyz-{name}/``). Defaults to the current working directory. :type parent_dir: str, optional :param batchsize: If given, the target number of cases to sow in each batch. This is computed from ``num_batches`` if not given and 1 if neither given. :type batchsize: int, optional :param num_batches: If given, the target number of batches to sow.
This is computed from ``batchsize`` if not given and 1 if neither given. :type num_batches: int, optional :param missing_only: If True (default), only run cases that are not already present in the on-disk dataset :type missing_only: bool, optional :param shuffle: If True (default), shuffle the order of cases before sowing and growing. This can be a useful basic form of load balancing. :type shuffle: bool, optional :param subprocess: Whether to grow each batch in a fresh subprocess. This adds about 1 second of overhead per batch, but allows the number of threads, cpu affinity and gpu assignment to be controlled. If "auto" (default) subprocesses are used when ``num_threads``, ``gpus`` or ``affinities`` are specified. See :meth:`xyzpy.Crop.grow` for details. :type subprocess: "auto" or bool, optional :param num_workers: Maximum number of batches to run concurrently. In subprocess mode this caps simultaneous subprocesses (defaults to 1 if not given). In in-process mode this is the joblib loky pool size (``None`` = serial). Forwarded to :meth:`xyzpy.Crop.grow`. :type num_workers: int, optional :param num_threads: Number of threads each worker is allowed to use, applied via the standard env vars (``OMP_NUM_THREADS``, ``MKL_NUM_THREADS``, etc.) in each subprocess. Implies ``subprocess=True`` when ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`. :type num_threads: int, optional :param gpus: GPU device IDs to assign to subprocesses via ``CUDA_VISIBLE_DEVICES``; the pool also caps concurrency. Implies ``subprocess=True`` when ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`. :type gpus: int, str, or sequence of int, optional :param affinities: CPU core IDs to pin subprocesses to via ``taskset``; the pool also caps concurrency. Implies ``subprocess=True`` when ``subprocess="auto"``. Forwarded to :meth:`xyzpy.Crop.grow`. 
:type affinities: int, str, or sequence of int, optional :param log: Whether to save subprocess stdout and stderr to files in the crop directory under ``logs/batch-{batch_id}.log``. Subprocess-mode only. Forwarded to :meth:`xyzpy.Crop.grow`. :type log: bool, optional :param raise_errors: If True (default), raise any errors that occur during growing, otherwise just log them and continue with the next batch. :type raise_errors: bool, optional :param verbosity: The level of logging to print during the sow/grow/reap process. 0: no output, 1: progress bars, 2: progress bars with current setting postfixed. :type verbosity: int, optional :param on_existing: What to do if a crop with the same name already exists on disk. Default is ``'ask'`` (interactive prompt). :type on_existing: {'ask', 'reap', 'delete', 'skip', 'raise'}, optional :param on_error: What to do if an error occurs during grow/reap. Default is ``'ask'`` (interactive prompt). :type on_error: {'ask', 'reap', 'delete', 'skip', 'raise'}, optional :param clean_up: Whether to delete the on-disk batch, result and log files after successfully reaping. :type clean_up: bool or None, optional :param grow_kwargs: Further keyword arguments forwarded to :meth:`xyzpy.Crop.grow` (e.g. ``executor``, ``min_wait``, ...). .. seealso:: :py:obj:`Harvester.cultivate`, :py:obj:`xyzpy.Crop.grow` .. py:function:: label(var_names, fn_args=None, var_dims=None, var_coords=None, constants=None, resources=None, attrs=None, harvester=False, sampler=False, engine=None, **default_runner_settings) Convenient decorator to automatically wrap a function as a :class:`~xyzpy.Runner` or :class:`~xyzpy.Harvester`. :param var_names: The ordered name(s) of the output variable(s) of `fn`. Set this explicitly to None if `fn` outputs already labelled data as a `dict`, :class:`~xarray.Dataset`, or :class:`~xarray.DataArray`. :type var_names: str, sequence of str, or None :param fn_args: The ordered name(s) of the input argument(s) of `fn`.
This is only needed if the cases or combos supplied are not dict-like. :type fn_args: str, or sequence of str, optional :param var_dims: Mapping of output variables to their named internal dimensions, can be the names of ``constants``. :type var_dims: dict-like, optional :param var_coords: Mapping of output variables' named internal dimensions to the actual values they take. :type var_coords: dict-like, optional :param constants: Constant arguments to be supplied to `fn`. These can be used as 'var_dims', and will be saved as coords if so, otherwise as attributes. :type constants: dict-like, optional :param resources: Like `constants` but not saved to the dataset, e.g. if very big. :type resources: dict-like, optional :param attrs: Any other miscellaneous information to be saved with the dataset. :type attrs: dict-like, optional :param harvester: If ``True``, wrap the runner as a :class:`~xyzpy.Harvester`, if a string, create the harvester with that as the ``data_name``. :type harvester: bool or str, optional :param default_runner_settings: These keyword arguments will be supplied as defaults to any runner. .. rubric:: Examples Declare a function as a runner directly:: >>> import xyzpy as xyz >>> @xyz.label(var_names=['sum', 'diff']) ... def foo(x, y): ... return x + y, x - y ... >>> foo fn: fn_args: ('x', 'y') var_names: ('sum', 'diff') var_dims: {'sum': (), 'diff': ()} >>> foo(1, 2) # can still call it normally (3, -1) .. py:class:: RayExecutor(*args, default_remote_opts=None, **kwargs) Basic ``concurrent.futures`` like interface using ``ray``. Example usage:: from xyzpy import RayExecutor # create a pool that by default requests a single gpu per task pool = RayExecutor( num_cpus=4, num_gpus=4, default_remote_opts={"num_gpus": 1}, ) .. py:attribute:: default_remote_opts .. py:method:: _maybe_inject_remote_opts(remote_opts=None) Return the default remote options, possibly overriding some with those supplied by a ``submit`` call. .. 
py:method:: submit(fn, *args, pure=False, remote_opts=None, **kwargs) Remotely run ``fn(*args, **kwargs)``, returning a ``RayFuture``. .. py:method:: map(func, *iterables, remote_opts=None) Remote map ``func`` over arguments ``iterables``. .. py:method:: scatter(data) Push ``data`` into the distributed store, returning an ``ObjectRef`` that can be supplied to ``submit`` calls for example. .. py:method:: shutdown() Shut down the parent ray cluster, this ``RayExecutor`` instance itself does not need any cleanup. .. py:class:: RayGPUExecutor(*args, gpus_per_task=1, **kwargs) Bases: :py:obj:`RayExecutor` A ``RayExecutor`` that by default requests a single gpu per task. .. py:function:: auto_xyz_ds(x, y_z=None) Automatically turn an array into an `xarray` dataset. Transpose ``y_z`` if necessary to automatically match dimension sizes. :param x: The x-coordinates. :type x: array_like :param y_z: The y-data, possibly varying with coordinate z. :type y_z: array_like, optional .. py:function:: cache_to_disk(fn=None, *, cachedir=_DEFAULT_FN_CACHE_PATH, **kwargs) Cache this function to disk, using joblib. .. py:function:: check_runs(obj, dim='run', var=None, sel=()) Print out information about the range and any missing values for an integer dimension. :param obj: Data to check. :type obj: xarray object :param dim: Dimension to check, defaults to 'run'. :type dim: str (optional) :param var: Subselect this data variable first. :type var: str (optional) :param sel: Subselect these other coordinates first. :type sel: mapping (optional) .. py:function:: load_df(name, engine='pickle', key='df', **kwargs) Load a dataframe from disk. :param name: File name to read from. :type name: str :param engine: Storage backend. :type engine: {'pickle', 'csv', 'hdf'}, optional :param key: HDF key when ``engine='hdf'``. :type key: str, optional :param \*\*kwargs: Passed through to the pandas reader. :returns: Loaded dataframe. :rtype: pandas.DataFrame .. 
py:function:: load_ds(file_name, engine='h5netcdf', load_to_mem=None, create_new=False, chunks=None, **kwargs) Loads an xarray dataset. Basically ``xarray.open_dataset`` with some different defaults and convenient behaviour. :param file_name: Name of file to open. :type file_name: str :param engine: Engine used to load file. :type engine: {'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional :param load_to_mem: Once opened, load from disk into memory. Defaults to ``True`` if ``chunks=None``. :type load_to_mem: bool, optional :param create_new: If no file exists make a blank one. :type create_new: bool, optional :param chunks: Passed to ``xarray.open_dataset`` so that data is stored using ``dask.array``. :type chunks: int or dict :returns: **ds** -- Loaded Dataset. :rtype: xarray.Dataset .. py:function:: merge_sync_conflict_datasets(base_name, engine='h5netcdf', combine_first=False) Glob files based on `base_name`, merge them, save this new dataset if it contains new info, then clean up the conflicts. :param base_name: Base file name to glob on - should include '*'. :type base_name: str :param engine: Load and save engine used by xarray. :type engine: str, optional :param combine_first: If True, combine datasets sequentially using ``combine_first``, preferring the first dataset in the list, which is assumed to be the original. If False, merge all datasets together using ``xr.merge``, which will raise an error if there are any conflicts. :type combine_first: bool, optional .. py:function:: save_df(df, name, engine='pickle', key='df', **kwargs) Save a dataframe to disk. :param df: DataFrame to save. :type df: pandas.DataFrame :param name: File name to save to. :type name: str :param engine: Storage backend. :type engine: {'pickle', 'csv', 'hdf'}, optional :param key: HDF key when ``engine='hdf'``. :type key: str, optional :param \*\*kwargs: Passed through to the pandas writer. .. py:function:: save_ds(ds, file_name, engine='h5netcdf', **kwargs) Saves an xarray dataset.
:param ds: The dataset to save. :type ds: xarray.Dataset :param file_name: Name of the file to save to. :type file_name: str :param engine: Engine used to save the file. :type engine: {'h5netcdf', 'netcdf4', 'joblib', 'zarr'}, optional :rtype: None .. py:function:: save_merge_ds(ds, fname, overwrite=None, **kwargs) Save dataset ``ds``, but check for an existing dataset with that name first, and if it exists, merge the two before saving. :param ds: The dataset to save. :type ds: xarray.Dataset :param fname: The file name. :type fname: str :param overwrite: How to merge the dataset with the existing dataset. - None: the datasets will be merged if there are no conflicts - False: data will be taken from old dataset if conflicting - True: data will be taken from new dataset if conflicting :type overwrite: {None, False, True}, optional .. py:function:: sort_dims(ds) Reorder variable dimensions to match ``ds.dims``. This is an in-place operation. :param ds: Dataset to reorder in place. :type ds: xarray.Dataset :rtype: None .. py:function:: trimna(obj) Drop values across dims where all values are NaN. :param obj: Object to trim. :type obj: xarray.Dataset or xarray.DataArray :returns: Trimmed object. :rtype: same type as obj .. py:function:: cimluv(hue, hue_shift=0.0, sat1=1.0, sat2=0.5, val1=0.8, val2=0.3, N=30, reverse=False) Creates a color map for a single hue, using the HSLuv color space. .. py:function:: cimple(hue, sat1=0.4, sat2=1.0, val1=0.95, val2=0.35, hue_shift=0.0, name='cimple', auto_adjust_sat=0.2) Creates a color map for a single hue. .. py:function:: cimple_bright(hue, sat1=0.8, sat2=0.9, val1=0.97, val2=0.3, hue_shift=0.0, name='cimple_bright') Creates a color map for a single hue, with bright defaults. .. py:function:: cmoke(hue, hue_shift=0.0, sat1=0.36, sat2=0.5, val1=0.38, val2=0.93, N=51, reverse=False) Creates a color map for a single hue, using the OKLCH color space. ..
py:function:: convert_colors(cols, outformat, informat='MATPLOTLIB') Convert lists of colors between formats .. py:function:: get_neutral_style(draw_color=(0.5, 0.5, 0.5)) .. py:function:: infiniplot(ds, x, y=None, z=None, **kwargs) Helper class for the infiniplot functionality. :param ds: Dataset to plot. :type ds: xarray.Dataset :param x: Name of the x coordinate. :type x: str :param y: Name of the y coordinate. If not specified, histogram mode is activated and the values of ``x`` are binned to produce a density or frequency to use as the y-variable. :type y: str, optional :param z: Name of the z coordinate. If specified this turns on the heatmap mode. :type z: str, optional :param bins: If in histogram mode, specify either the number of bins to use or the bin edges. If not specified, a default number of bins is automatically chosen based on the number of data points. :type bins: int or array_like, optional :param bins_density: If in histogram mode, whether to plot the density (True) or frequency (False) of the data. Default is True. :type bins_density: bool, optional :param aggregate: If specified, aggregate over the given dimension(s) using ``aggregate_method`` (by default 'median'). If `True` aggregate over all unmapped dimensions. If in heatmap mode, this is automatically set to `True`, since only one plot can be shown per axis. :type aggregate: str or Sequence[str], optional :param aggregate_method: If ``aggregate`` is specified, the method to use for aggregation. Any option available as a method on a DataArray can be used, e.g. 'mean', 'median', 'max'. Default is 'median'. :type aggregate_method: str, optional :param aggregate_err_range: If ``aggregate`` is specified, the range of the error bars or bands to show. The options are: - ``'std'``: show the standard deviation of the data - ``'stderr'``: show the standard error of the mean - float: show the given quantile range, e.g. 
0.5 for the interquartile range :type aggregate_err_range: float or str, optional :param err: If specified, a data variable to use for error bars or bands. This overrides any derived from ``aggregate``. :type err: str, optional :param err_style: If specified, the style of error to show. The options are: - ``'bars'``: show error bars - ``'band'``: show error bands :type err_style: str, optional :param err_kws: Additional keyword arguments to pass to the error plotting function. :type err_kws: dict, optional :param xlink: If specified, the name of a dimension to use for linking the x-axis. Used when you are plotting a variable rather than coordinate as ``x``, but want to link each sweep of values as a line. :type xlink: str, optional :param color: If specified, the name of a dimension to use for mapping the color or intensity of each line. If ``hue`` is also specified, this controls the intensity of the color. If not a dimension, this is used as a constant color for all lines. :type color: str, optional :param colors: An explicit sequence of colors to use for the color-mapped dimension. :type colors: sequence, optional :param color_order: An explicit order of values to use for the color-mapped dimension. :type color_order: sequence, optional :param color_label: An alternate label to use for the color-mapped dimension. :type color_label: str, optional :param color_ticklabels: A mapping from values to tick labels to use for the color-mapped dimension. :type color_ticklabels: dict or sequence, optional :param colormap_start: If using a palette, the starting value of the colormap to use, e.g. 0.2 would skip the first 20% of the colormap. :type colormap_start: float, optional :param colormap_stop: If using a palette, the stopping value of the colormap to use, e.g. 0.9 would skip the last 10% of the colormap. :type colormap_stop: float, optional :param hue: If specified, the name of a dimension to use for mapping the color or hue of each line. 
If ``color`` is also specified, this controls the hue of the color. If not a dimension, this is used as a constant hue for all lines. :type hue: str, optional :param hues: An explicit sequence of hues to use for the hue-mapped dimension. :type hues: sequence, optional :param hue_order: An explicit order of values to use for the hue-mapped dimension. :type hue_order: sequence, optional :param hue_label: An alternate label to use for the hue-mapped dimension. :type hue_label: str, optional :param hue_ticklabels: A mapping from values to tick labels to use for the hue-mapped dimension. :type hue_ticklabels: dict or sequence, optional :param palette: If specified, the name of a colormap, or an actual colormap, to use for mapping the color or hue of each line. If both ``color`` and ``hue`` are specified, you can supply a sequence of palettes here, with ``hue`` controlling which palette, and ``color`` controlling the intensity within the palette. :type palette: str, sequence, or colormap, optional :param autohue_start: If not using a palette, the starting hue to use for automatically generating a sequence of hues. :type autohue_start: float, optional :param autohue_sweep: If not using a palette, the sweep of hues to use for automatically generating a sequence of hues. :type autohue_sweep: float, optional :param autohue_opts: Additional keyword arguments to pass to the automatic hue generator - see :func:`xyzpy.color.cmoke`. :type autohue_opts: dict, optional :param marker: If specified, the name of a dimension to use for mapping the marker style of each line. If not a dimension, this is used as a constant marker style for all lines. :type marker: str, optional :param markers: An explicit sequence of markers to use for the marker-mapped dimension. :type markers: sequence, optional :param marker_order: An explicit order of values to use for the marker-mapped dimension.
:type marker_order: sequence, optional :param marker_label: An alternate label to use for the marker-mapped dimension. :type marker_label: str, optional :param marker_ticklabels: A mapping from values to tick labels to use for the marker-mapped dimension. :type marker_ticklabels: dict or sequence, optional :param markersize: If specified, the name of a dimension to use for mapping the marker size of each line. If not a dimension, this is used as a constant marker size for all lines. :type markersize: str, optional :param markersizes: An explicit sequence of marker sizes to use for the markersize-mapped dimension. :type markersizes: sequence, optional :param markersize_order: An explicit order of values to use for the markersize-mapped dimension. :type markersize_order: sequence, optional :param markersize_label: An alternate label to use for the markersize-mapped dimension. :type markersize_label: str, optional :param markersize_ticklabels: A mapping from values to tick labels to use for the markersize-mapped dimension. :type markersize_ticklabels: dict or sequence, optional :param markeredgecolor: If specified, the name of a dimension to use for mapping the marker edge color of each line. If not a dimension, this is used as a constant marker edge color for all lines. :type markeredgecolor: str, optional :param markeredgecolors: An explicit sequence of marker edge colors to use for the markeredgecolor-mapped dimension. :type markeredgecolors: sequence, optional :param markeredgecolor_order: An explicit order of values to use for the markeredgecolor-mapped dimension. :type markeredgecolor_order: sequence, optional :param markeredgecolor_label: An alternate label to use for the markeredgecolor-mapped dimension. :type markeredgecolor_label: str, optional :param markeredgecolor_ticklabels: A mapping from values to tick labels to use for the markeredgecolor-mapped dimension. 
:type markeredgecolor_ticklabels: dict or sequence, optional :param linewidth: If specified, the name of a dimension to use for mapping the line width of each line. If not a dimension, this is used as a constant line width for all lines. :type linewidth: str, optional :param linewidths: An explicit sequence of line widths to use for the linewidth-mapped dimension. :type linewidths: sequence, optional :param linewidth_order: An explicit order of values to use for the linewidth-mapped dimension. :type linewidth_order: sequence, optional :param linewidth_label: An alternate label to use for the linewidth-mapped dimension. :type linewidth_label: str, optional :param linewidth_ticklabels: A mapping from values to tick labels to use for the linewidth-mapped dimension. :type linewidth_ticklabels: dict or sequence, optional :param linestyle: If specified, the name of a dimension to use for mapping the line style of each line. If not a dimension, this is used as a constant line style for all lines. :type linestyle: str, optional :param linestyles: An explicit sequence of line styles to use for the linestyle-mapped dimension. :type linestyles: sequence, optional :param linestyle_order: An explicit order of values to use for the linestyle-mapped dimension. :type linestyle_order: sequence, optional :param linestyle_label: An alternate label to use for the linestyle-mapped dimension. :type linestyle_label: str, optional :param linestyle_ticklabels: A mapping from values to tick labels to use for the linestyle-mapped dimension. :type linestyle_ticklabels: dict or sequence, optional :param text: If specified, the name of a dimension to use for mapping text annotations to each line. :type text: str, optional :param text_formatter: A function to use to format data entries to text annotations. Default is ``str``. :type text_formatter: callable, optional :param text_opts: Additional keyword arguments to pass to the text plotting function.
:type text_opts: dict, optional :param col: If specified, the name of a dimension to use for mapping the subplot column of each line. :type col: str, optional :param col_order: An explicit order of values to use for the col-mapped dimension. :type col_order: sequence, optional :param col_label: An alternate label to use for the col-mapped dimension. :type col_label: str, optional :param col_ticklabels: A mapping from values to tick labels to use for the col-mapped dimension. :type col_ticklabels: dict or sequence, optional :param row: If specified, the name of a dimension to use for mapping the subplot row of each line. :type row: str, optional :param row_order: An explicit order of values to use for the row-mapped dimension. :type row_order: sequence, optional :param row_label: An alternate label to use for the row-mapped dimension. :type row_label: str, optional :param row_ticklabels: A mapping from values to tick labels to use for the row-mapped dimension. :type row_ticklabels: dict or sequence, optional :param alpha: Global alpha value to use for all lines. :type alpha: float, optional :param join_across_missing: If True, join lines across missing (NaN) data. Default is False. :type join_across_missing: bool, optional :param err_band_alpha: Alpha value to use for error bands. :type err_band_alpha: float, optional :param err_bar_capsize: Size of the caps on error bars. :type err_bar_capsize: float, optional :param xlabel: Alternate label to use for the x-axis. :type xlabel: str, optional :param ylabel: Alternate label to use for the y-axis. :type ylabel: str, optional :param xlim: Limits to use for the x-axis. :type xlim: tuple, optional :param ylim: Limits to use for the y-axis. :type ylim: tuple, optional :param xscale: Scale to use for the x-axis, e.g. 'log'. :type xscale: str, optional :param yscale: Scale to use for the y-axis, e.g. 'log'. :type yscale: str, optional :param zscale: Scale to use for a heatmap color dimension, e.g. 'log'. 
:type zscale: str, optional :param xbase: If ``xscale=='log'``, the log base to use for the x-axis. :type xbase: float, optional :param ybase: If ``yscale=='log'``, the log base to use for the y-axis. :type ybase: float, optional :param xticks: Manual sequence of x-values to use for ticks. :type xticks: sequence[float], optional :param yticks: Manual sequence of y-values to use for ticks. :type yticks: sequence[float], optional :param xticklabels: Manual sequence of x-tick labels to use; requires ``xticks`` and should be the same length. :type xticklabels: sequence[str], optional :param yticklabels: Manual sequence of y-tick labels to use; requires ``yticks`` and should be the same length. :type yticklabels: sequence[str], optional :param vspans: Sequence of x-values to use for vertical spans. :type vspans: sequence[float], optional :param hspans: Sequence of y-values to use for horizontal spans. :type hspans: sequence[float], optional :param span_color: Color to use for spans. :type span_color: str or tuple, optional :param span_alpha: Alpha value to use for spans. :type span_alpha: float, optional :param span_linewidth: Line width to use for spans. :type span_linewidth: float, optional :param span_linestyle: Line style to use for spans. :type span_linestyle: str, optional :param grid: Whether to show grid lines. :type grid: bool, optional :param grid_which: Which grid lines to show, either 'major' or 'minor'. :type grid_which: str, optional :param grid_alpha: Alpha value to use for grid lines. :type grid_alpha: float, optional :param legend: Whether to show a legend. :type legend: bool, optional :param legend_ncol: Number of columns to use for the legend. :type legend_ncol: int, optional :param legend_merge: If ``True``, combinations of different mapped properties are merged into a list of every combination. :type legend_merge: bool, optional :param legend_reverse: If ``True``, reverse the order of the legend entries.
:type legend_reverse: bool, optional :param legend_entries: An explicit sequence of legend entries to use. :type legend_entries: sequence, optional :param legend_labels: An explicit sequence of legend labels to use. :type legend_labels: sequence, optional :param legend_extras: An explicit sequence of extra legend items to add. :type legend_extras: sequence, optional :param legend_opts: Additional keyword arguments to pass to the legend plotting function. :type legend_opts: dict, optional :param title: A title to use for the plot. :type title: str, optional :param axs: An explicit array of axes to use for the plot; it should have at least as many rows and columns as there are mapped dimensions. :type axs: sequence[sequence[matplotlib.Axes]], optional :param ax: Shortcut for supplying a single axes to use for the plot; can only be supplied if there is a single row and column. :type ax: matplotlib.Axes, optional :param format_axs: Whether to format the axes to use the neutral xyzpy style. :type format_axs: bool, optional :param figsize: Size of the figure to use if creating one (i.e. if both ``ax`` and ``axs`` are ``None``). If not specified it is automatically computed based on the number of rows and columns. :type figsize: tuple, optional :param height: Height of each subplot. Default is 3. :type height: float, optional :param width: Width of each subplot. If not specified, it is automatically set to match ``height``. Default is None. :type width: float, optional :param hspace: Spacing between subplots vertically. Default is 0.12. :type hspace: float, optional :param wspace: Spacing between subplots horizontally. Default is 0.12. :type wspace: float, optional :param sharex: Whether to share the x-axis between subplots. Default is True. :type sharex: bool, optional :param sharey: Whether to share the y-axis between subplots. Default is True. :type sharey: bool, optional :param kwargs: Additional keyword arguments to pass to the main plotting function.
:type kwargs: dict, optional :returns: * **fig** (*matplotlib.Figure*) -- Figure containing the plot (None if ``ax`` or ``axs`` is specified). * **axs** (*sequence[sequence[matplotlib.Axes]]*) -- Array of axes containing the plot. .. py:function:: neutral_style(draw_color=(0.5, 0.5, 0.5), **kwargs) .. py:function:: auto_iheatmap(x, **iheatmap_opts) Auto version of :func:`~xyzpy.iheatmap` that accepts array arguments by converting them to a ``Dataset`` first. .. py:function:: auto_ilineplot(x, y_z, **lineplot_opts) Auto version of :func:`~xyzpy.ilineplot` that accepts array arguments by converting them to a ``Dataset`` first. .. py:function:: auto_iscatter(x, y_z, **iscatter_opts) Auto version of :func:`~xyzpy.iscatter` that accepts array arguments by converting them to a ``Dataset`` first. .. py:function:: iheatmap(ds, x, y, z, **kwargs) From ``ds`` plot variable ``z`` as a function of ``x`` and ``y`` using a 2D heatmap. Interactive. :param ds: Dataset to plot from. :type ds: xarray.Dataset :param x: Dimension to plot along the x-axis. :type x: str :param y: Dimension to plot along the y-axis. :type y: str :param z: Variable to plot as colormap. :type z: str, optional :param row: Dimension to vary over as a function of rows. :type row: str, optional :param col: Dimension to vary over as a function of columns. :type col: str, optional :param plot_opts: See ``xyzpy.plot.core.PLOTTER_DEFAULTS``. .. py:function:: ilineplot(ds, x, y, z=None, y_err=None, x_err=None, **kwargs) From ``ds`` plot lines of ``y`` as a function of ``x``, optionally for varying ``z``. Interactive. :param ds: Dataset to plot from. :type ds: xarray.Dataset :param x: Dimension to plot along the x-axis. :type x: str :param y: Variable(s) to plot along the y-axis. If tuple, plot each of the variables - instead of ``z``. :type y: str or tuple[str] :param z: Dimension to plot into the page. :type z: str, optional :param y_err: Variable to plot as y-error.
:type y_err: str, optional :param x_err: Variable to plot as x-error. :type x_err: str, optional :param row: Dimension to vary over as a function of rows. :type row: str, optional :param col: Dimension to vary over as a function of columns. :type col: str, optional :param plot_opts: See ``xyzpy.plot.core.PLOTTER_DEFAULTS``. .. py:function:: iscatter(ds, x, y, z=None, y_err=None, x_err=None, **kwargs) From ``ds`` plot a scatter of ``y`` against ``x``, optionally for varying ``z``. Interactive. :param ds: Dataset to plot from. :type ds: xarray.Dataset :param x: Quantity to plot along the x-axis. :type x: str :param y: Quantity(s) to plot along the y-axis. If tuple, plot each of the variables - instead of ``z``. :type y: str or tuple[str] :param z: Dimension to plot into the page. :type z: str, optional :param y_err: Variable to plot as y-error. :type y_err: str, optional :param x_err: Variable to plot as x-error. :type x_err: str, optional :param row: Dimension to vary over as a function of rows. :type row: str, optional :param col: Dimension to vary over as a function of columns. :type col: str, optional :param plot_opts: See ``xyzpy.plot.core.PLOTTER_DEFAULTS``. .. py:class:: AutoHeatMap(x, **heatmap_opts) Bases: :py:obj:`HeatMap` .. py:class:: AutoHistogram(x, **histogram_opts) Bases: :py:obj:`Histogram` .. py:class:: AutoLinePlot(x, y_z, **lineplot_opts) Bases: :py:obj:`LinePlot` .. py:class:: AutoScatter(x, y_z, **scatter_opts) Bases: :py:obj:`Scatter` .. py:class:: HeatMap(ds, x, y, z, **kwargs) Bases: :py:obj:`PlotterMatplotlib`, :py:obj:`xyzpy.plot.core.AbstractHeatMap` .. py:method:: plot_heatmap() Plot the data as a heatmap. .. py:method:: __call__() .. py:class:: Histogram(ds, x, z=None, **kwargs) Bases: :py:obj:`PlotterMatplotlib`, :py:obj:`xyzpy.plot.core.AbstractHistogram` .. py:method:: plot_histogram() .. py:method:: __call__() .. 
py:class:: LinePlot(ds, x, y, z=None, *, y_err=None, x_err=None, **kwargs) Bases: :py:obj:`PlotterMatplotlib`, :py:obj:`xyzpy.plot.core.AbstractLinePlot` .. py:method:: plot_lines() .. py:method:: __call__() .. py:class:: Scatter(ds, x, y, z=None, **kwargs) Bases: :py:obj:`PlotterMatplotlib`, :py:obj:`xyzpy.plot.core.AbstractScatter` .. py:method:: plot_scatter() .. py:method:: __call__() .. py:function:: auto_heatmap(x, **heatmap_opts) Auto version of :func:`~xyzpy.heatmap` that accepts array arguments by converting them to a ``Dataset`` first. .. py:function:: auto_histogram(x, **histogram_opts) Auto version of :func:`~xyzpy.histogram` that accepts array arguments by converting them to a ``Dataset`` first. .. py:function:: auto_lineplot(x, y_z, **lineplot_opts) Auto version of :func:`~xyzpy.lineplot` that accepts array arguments by converting them to a ``Dataset`` first. .. py:function:: auto_scatter(x, y_z, **scatter_opts) Auto version of :func:`~xyzpy.scatter` that accepts array arguments by converting them to a ``Dataset`` first. .. py:function:: heatmap(ds, x, y, z, **kwargs) From ``ds`` plot variable ``z`` as a function of ``x`` and ``y`` using a 2D heatmap. :param ds: Dataset to plot from. :type ds: xarray.Dataset :param x: Dimension to plot along the x-axis. :type x: str :param y: Dimension to plot along the y-axis. :type y: str :param z: Variable to plot as colormap. :type z: str, optional :param row: Dimension to vary over as a function of rows. :type row: str, optional :param col: Dimension to vary over as a function of columns. :type col: str, optional :param plot_opts: See ``xyzpy.plot.core.PLOTTER_DEFAULTS``. .. py:function:: histogram(ds, x, z=None, **plot_opts) Dataset histogram. :param ds: The dataset to plot. :type ds: xarray.Dataset :param x: The variable(s) to plot the probability density of. If sequence, plot a histogram of each instead of using a ``z`` coordinate. 
:type x: str, sequence of str :param z: If given, range over this coordinate and plot a histogram for each. :type z: str, optional :param row: Dimension to vary over as a function of rows. :type row: str, optional :param col: Dimension to vary over as a function of columns. :type col: str, optional :param plot_opts: See ``xyzpy.plot.core.PLOTTER_DEFAULTS``. .. py:function:: lineplot(ds, x, y, z=None, y_err=None, x_err=None, **plot_opts) From ``ds`` plot lines of ``y`` as a function of ``x``, optionally for varying ``z``. :param ds: Dataset to plot from. :type ds: xarray.Dataset :param x: Dimension to plot along the x-axis. :type x: str :param y: Variable(s) to plot along the y-axis. If tuple, plot each of the variables - instead of ``z``. :type y: str or tuple[str] :param z: Dimension to plot into the page. :type z: str, optional :param y_err: Variable to plot as y-error. :type y_err: str, optional :param x_err: Variable to plot as x-error. :type x_err: str, optional :param row: Dimension to vary over as a function of rows. :type row: str, optional :param col: Dimension to vary over as a function of columns. :type col: str, optional :param plot_opts: See ``xyzpy.plot.core.PLOTTER_DEFAULTS``. .. py:function:: scatter(ds, x, y, z=None, y_err=None, x_err=None, **plot_opts) From ``ds`` plot a scatter of ``y`` against ``x``, optionally for varying ``z``. :param ds: Dataset to plot from. :type ds: xarray.Dataset :param x: Quantity to plot along the x-axis. :type x: str :param y: Quantity(s) to plot along the y-axis. If tuple, plot each of the variables - instead of ``z``. :type y: str or tuple[str] :param z: Dimension to plot into the page. :type z: str, optional :param y_err: Variable to plot as y-error. :type y_err: str, optional :param x_err: Variable to plot as x-error. :type x_err: str, optional :param row: Dimension to vary over as a function of rows. :type row: str, optional :param col: Dimension to vary over as a function of columns.
:type col: str, optional :param plot_opts: See ``xyzpy.plot.core.PLOTTER_DEFAULTS``. .. py:function:: visualize_matrix(array, max_mag=None, magscale='linear', alpha_map=True, alpha_pow=1 / 2, legend=True, legend_loc='auto', legend_size=0.15, legend_bounds=None, legend_resolution=3, facecolor=None, rasterize=4096, rasterize_dpi=300, figsize=(5, 5), ax=None) Visualize ``array`` as a 2D colormapped image. :param array: A 2D (or 1D) array or sequence of arrays to visualize. :type array: array_like or Sequence[array_like] :param max_mag: The maximum magnitude to use for the color mapping. If not provided, the maximum magnitude in the array will be used. :type max_mag: float, optional :param magscale: How to scale the magnitude of the array values. If "linear", then the magnitude is used directly. If a float, then the magnitude is raised to this power before being used, which can help to show variation among small values. :type magscale: "linear" or float, optional :param alpha_map: Whether to map the tensor value magnitudes to pixel alpha. :type alpha_map: bool, optional :param alpha_pow: The power to raise the magnitude to when mapping to alpha. :type alpha_pow: float, optional :param legend: Whether to show a legend (colorbar). If the array has complex dtype then the legend will be a colorwheel. :type legend: bool, optional :param legend_loc: Where to place the legend. If "auto", then the legend will be placed outside the plot rectangle, otherwise it should be a tuple of ``(x, y)`` coordinates in axes space. :type legend_loc: str or tuple[float], optional :param legend_size: The size of the legend, in relation to the size of the plot axes. :type legend_size: float, optional :param legend_bounds: The bounds of the legend, as ``(x, y, width, height)`` in axes space. If not provided, the bounds will be computed from ``legend_loc`` and ``legend_size``. 
:type legend_bounds: tuple[float], optional :param legend_resolution: The number of different colors to show in the legend. :type legend_resolution: int, optional :param facecolor: The background color of the plot, by default transparent. :type facecolor: str, optional :param rasterize: Whether to rasterize the plot. If a number, then rasterize if the number of pixels in the plot is greater than this value. :type rasterize: int or float, optional :param rasterize_dpi: The dpi to use when rasterizing. :type rasterize_dpi: float, optional :param figsize: The size of the figure to create, if ``ax`` is not provided. :type figsize: tuple[float], optional :param ax: The axis to draw to. If not provided, a new figure will be created. :type ax: matplotlib.Axis, optional :param show_and_close: If ``True`` (the default) then show and close the figure, otherwise return the figure and axis. :type show_and_close: bool, optional :returns: * **fig** (*matplotlib.Figure*) -- The figure containing the plot, or ``None`` if ``ax`` was provided. * **ax** (*matplotlib.Axis*) -- The axis or axes containing the plot(s). .. py:function:: visualize_tensor(array, spacing_factor=1.0, max_projections=None, projection_overlap_spacing=1.05, angles=None, scales=None, skew_angle_factor='auto', skew_scale_factor=0.05, max_mag=None, magscale='linear', size_map=True, size_pow=1 / 2, size_scale=1.0, alpha_map=True, alpha_pow=1 / 2, alpha=0.8, marker='o', linewidths=0, show_lattice=True, lattice_opts=None, compass=False, compass_loc='auto', compass_size=0.1, compass_bounds=None, compass_labels=None, compass_opts=None, legend=True, legend_loc='auto', legend_size=0.15, legend_bounds=None, legend_resolution=3, interleave_projections=False, reverse_projections=False, facecolor=None, rasterize=4096, rasterize_dpi=300, figsize=(5, 5), ax=None) Visualize all entries of a tensor, with indices mapped into the plane and values mapped into a color wheel. :param array: The tensor to visualize. 
:type array: numpy.ndarray :param spacing_factor: How to scale the dimensions relative to each other. If 1.0, then each dimension will have the same extent, and smaller dimensions will be sparser. If 0.0, then each dimension will have an extent proportional to its size, with matching density. :type spacing_factor: float, optional :param max_projections: The maximum number of different projection directions / angles to use. If specified and less than the number of dimensions, then multiple dimensions will share the same angle but with different scales. :type max_projections: int, optional :param projection_overlap_spacing: When grouping multiple dimensions to the same angle, how much to increase the spacing at each scale so as to emphasize each. :type projection_overlap_spacing: float, optional :param angles: An explicit list of angles to use for each direction, in radians, with zero pointing straight down. If not provided, then the angles will be calculated automatically. :type angles: sequence[float], optional :param scales: An explicit list of scales to use for each direction. If not provided, then the scales will be calculated automatically. :type scales: sequence[float], optional :param skew_angle_factor: When there are more than two dimensions, a factor to scale the rotations by to avoid overlapping data points. If 0.0 then the angles will be evenly spaced. :type skew_angle_factor: float, optional :param skew_scale_factor: When there are more than two dimensions, a factor to scale the scales by to avoid overlapping data points, that shortens non-perpendicular directions. :type skew_scale_factor: float, optional :param max_mag: The maximum magnitude to use for the color mapping. If not provided, the maximum magnitude in the array will be used. :type max_mag: float, optional :param magscale: How to scale the magnitude of the array values. If "linear", then the magnitude is used directly.
If a float, then the magnitude is raised to this power before being used, which can help to show variation among small values. :type magscale: "linear" or float, optional :param size_map: Whether to map the tensor value magnitudes to marker size. :type size_map: bool, optional :param size_scale: An overall factor to scale the marker size by. :type size_scale: float, optional :param alpha_map: Whether to map the tensor value magnitudes to marker alpha. :type alpha_map: bool, optional :param alpha_pow: The power to raise the magnitude to when mapping to alpha. :type alpha_pow: float, optional :param alpha: The overall alpha to use for all markers if ``not alpha_map``. :type alpha: float, optional :param marker: The marker style to use for the data points. :type marker: str, optional :param linewidths: The linewidth to use for the markers. :type linewidths: float, optional :param show_lattice: Show a thin grey line connecting adjacent array coordinate points. :type show_lattice: bool, optional :param lattice_opts: Options to pass to ``matplotlib.Axis.scatter`` for the lattice grid. :type lattice_opts: dict, optional :param compass: Whether to show a compass indicating the orientation of each dimension. :type compass: bool, optional :param compass_loc: Where to place the compass. :type compass_loc: (float, float), optional :param compass_size: The size of the compass. :type compass_size: float, optional :param compass_bounds: Explicit bounds of the compass, as ``(x, y, width, height)``. :type compass_bounds: tuple[float], optional :param compass_labels: Explicit labels for the compass, in order of the dimensions. :type compass_labels: sequence[str], optional :param compass_opts: Extra options for the compass arrows. :type compass_opts: dict, optional :param legend: Whether to show a legend (colorbar). If the array has complex dtype then the legend will be a colorwheel. :type legend: bool, optional :param legend_loc: Where to place the legend.
If "auto", then the legend will be placed outside the plot rectangle, otherwise it should be a tuple of ``(x, y)`` coordinates in axes space. :type legend_loc: str or tuple[float], optional :param legend_size: The size of the legend, in relation to the size of the plot axes. :type legend_size: float, optional :param legend_bounds: Explicit bounds of the legend, as ``(x, y, width, height)`` in axes space. :type legend_bounds: tuple[float], optional :param legend_resolution: The number of different colors to show in the legend. :type legend_resolution: int, optional :param interleave_projections: If ``True`` and grouping dimensions, then they are assigned in round-robin fashion rather than in blocks. ``False`` matches the behavior of fusing. :type interleave_projections: bool, optional :param reverse_projections: Whether to reverse the order of the projections. :type reverse_projections: bool, optional :param facecolor: The background color of the plot, by default transparent. :type facecolor: str, optional :param rasterize: Whether to rasterize the plot. If a number, then rasterize if the size of the array is greater than this value. :type rasterize: int or float, optional :param rasterize_dpi: The dpi to use when rasterizing. :type rasterize_dpi: float, optional :param figsize: The size of the figure to create, if ``ax`` is not provided. :type figsize: tuple, optional :param ax: The axis to draw to. If not provided, a new figure will be created. :type ax: matplotlib.Axis, optional :returns: * **fig** (*matplotlib.Figure*) -- The figure containing the plot, or ``None`` if ``ax`` was provided. * **ax** (*matplotlib.Axis*) -- The axis containing the plot. .. py:class:: Benchmarker(kernels, setup=None, names=None, benchmark_opts=None, data_name=None) Compare the performance of various ``kernels``. Internally this makes use of :func:`~xyzpy.benchmark`, :class:`~xyzpy.Harvester` and xyzpy's plotting functionality. :param kernels: The functions to compare performance with.
:type kernels: sequence of callable :param setup: If given, set up each benchmark run by supplying the size argument ``n`` to this function first, then feeding its output to each of the functions. :type setup: callable, optional :param names: Alternate names to give the functions, else they will be inferred. :type names: sequence of str, optional :param benchmark_opts: Supplied to :func:`~xyzpy.benchmark`. :type benchmark_opts: dict, optional :param data_name: If given, the file name the internal harvester will use to store results persistently. :type data_name: str, optional .. attribute:: harvester The harvester that runs and accumulates all the data. :type: xyz.Harvester .. attribute:: ds Shortcut to the harvester's full dataset. :type: xarray.Dataset .. py:attribute:: kernels .. py:attribute:: names .. py:attribute:: setup :value: None .. py:attribute:: benchmark_opts .. py:attribute:: runner .. py:attribute:: harvester .. py:method:: run(ns, kernels=None, **harvest_opts) Run the benchmarks. Each run accumulates rather than overwriting the results. :param ns: The sizes to run the benchmarks with. :type ns: sequence of int or int :param kernels: If given, only run the kernels with these names. :type kernels: sequence of str, optional :param harvest_opts: Supplied to :meth:`~xyzpy.Harvester.harvest_combos`. .. py:property:: ds .. py:method:: plot(**plot_opts) Plot the benchmarking results. .. py:method:: lineplot(**plot_opts) Plot the benchmarking results. .. py:method:: ilineplot(**plot_opts) Interactively plot the benchmarking results. .. py:class:: MemoryMonitor(interval: float = 0.1) Monitor this process' peak memory usage with specified sampling interval in a daemon thread. This is intended to be used as a context manager for long running and memory intensive processes, not fine grained memory tracking. :param interval: Time between memory measurements in seconds. Fluctuations in peak memory between measurements might not be captured.
:type interval: float, optional .. attribute:: interval Time between memory measurements in seconds. :type: float .. attribute:: peak The peak memory usage in gigabytes. :type: float .. py:attribute:: interval :value: 0.1 .. py:attribute:: peak :value: None .. py:attribute:: is_running :value: False .. py:attribute:: monitor_thread :value: None .. py:method:: _monitor() .. py:method:: start() Start the memory monitoring thread. .. py:method:: stop() Stop the memory monitoring thread. .. py:method:: __enter__() .. py:method:: __exit__(exc_type, exc_value, traceback) .. py:method:: __del__() .. py:method:: __repr__() .. py:class:: RunningCovariance Running covariance class. .. py:attribute:: count :value: 0 .. py:attribute:: xmean :value: 0.0 .. py:attribute:: ymean :value: 0.0 .. py:attribute:: C :value: 0.0 .. py:method:: update(x, y) .. py:method:: update_from_it(xs, ys) .. py:property:: covar The covariance. .. py:property:: sample_covar The covariance with "Bessel's correction". .. py:class:: RunningCovarianceMatrix(n=2) Running covariance matrix for ``n`` variables. :param n: Number of variables to track. :type n: int, optional .. py:attribute:: n :value: 2 .. py:attribute:: rcs .. py:method:: update(*x) Update the covariance matrix with a single observation. .. py:method:: update_from_it(*xs) Update from iterables of observations for each variable. .. py:property:: count Return the number of samples accumulated. .. py:property:: covar_matrix Return the population covariance matrix. .. py:property:: sample_covar_matrix Return the sample covariance matrix. .. py:method:: to_uncertainties(bias=True) Convert the accumulated statistics to correlated uncertainties, from which new quantities can be calculated with error automatically propagated. :param bias: If False, use the sample covariance with "Bessel's correction". :type bias: bool, optional :returns: **values** -- The sequence of correlated variables. :rtype: tuple of uncertainties.ufloat .. 
rubric:: Examples Estimate quantities of two perfectly correlated sequences. >>> rcm = xyz.RunningCovarianceMatrix() >>> rcm.update_from_it((1, 3, 2), (2, 6, 4)) >>> x, y = rcm.to_uncertainties() Calculated quantities like sums have the error propagated: >>> x + y 6.0+/-2.4494897427831783 But the covariance is also taken into account, meaning the ratio here can be estimated with zero error: >>> x / y 0.5+/-0 .. py:class:: RunningStatistics Running mean & standard deviation using Welford's algorithm. This is a very efficient way of keeping track of the error on the mean for example. .. attribute:: mean Current mean. :type: float .. attribute:: count Current count. :type: int .. attribute:: std Current standard deviation. :type: float .. attribute:: var Current variance. :type: float .. attribute:: err Current error on the mean. :type: float .. attribute:: rel_err The current relative error. :type: float .. rubric:: Examples >>> rs = RunningStatistics() >>> rs.update(1.1) >>> rs.update(1.4) >>> rs.update(1.2) >>> rs.update_from_it([1.5, 1.3, 1.6]) >>> rs.mean 1.3499999046325684 >>> rs.std # standard deviation 0.17078252585383266 >>> rs.err # error on the mean 0.06972167422092768 .. py:attribute:: count :value: 0 .. py:attribute:: mean :value: 0.0 .. py:attribute:: M2 :value: 0.0 .. py:method:: update(x) Add a single value ``x`` to the statistics. .. py:method:: update_from_it(xs) Add all values from iterable ``xs`` to the statistics. .. py:method:: converged(rtol, atol) Check if the stats have converged with respect to relative and absolute tolerance ``rtol`` and ``atol``. .. py:property:: var .. py:property:: std .. py:property:: err .. py:property:: rel_err .. py:method:: __repr__() .. py:class:: Timer A very simple context manager class for timing blocks. .. rubric:: Examples >>> from xyzpy import Timer >>> with Timer() as timer: ... print('Doing some work!') ... Doing some work! >>> timer.t 0.00010752677917480469 .. py:method:: __enter__() .. 
py:method:: __exit__(*args) .. py:function:: benchmark(fn, setup=None, n=None, min_t=0.2, repeats=2, get='min', starmap=False) Benchmark the time it takes to run ``fn``. :param fn: The function to time. :type fn: callable :param setup: If supplied, the function that sets up the argument for ``fn``. :type setup: callable, optional :param n: If supplied, the integer to supply to ``setup`` or ``fn``. :type n: int, optional :param min_t: Aim to repeat function enough times to take up this many seconds. :type min_t: float, optional :param repeats: Repeat the whole procedure (with setup) this many times in order to take the minimum run time. :type repeats: int, optional :param get: Return the minimum or mean time for each run. :type get: {'min', 'mean'}, optional :param starmap: Unpack the arguments from ``setup``, if given. :type starmap: bool, optional :returns: **t** -- The minimum, averaged, time to run ``fn`` in seconds. :rtype: float .. rubric:: Examples Just a parameter-less function: >>> import xyzpy as xyz >>> import numpy as np >>> xyz.benchmark(lambda: np.linalg.eig(np.random.randn(100, 100))) 0.004726233000837965 The same but with a setup and size parameter ``n`` specified: >>> setup = lambda n: np.random.randn(n, n) >>> fn = lambda X: np.linalg.eig(X) >>> xyz.benchmark(fn, setup, 100) 0.0042192734545096755 .. py:function:: estimate_from_repeats(fn, *fn_args, rtol=0.02, tol_scale=1.0, get='stats', verbosity=0, min_samples=5, max_samples=1000000, **fn_kwargs) :param fn: The function that estimates a single value. :type fn: callable :param fn_args: Supplied to ``fn``. :param rtol: Relative tolerance for error on mean. :type rtol: float, optional :param tol_scale: The expected 'scale' of the estimate, this modifies the absolute tolerance near zero to ``rtol * tol_scale``, default: 1.0.
:type tol_scale: float, optional :param get: Just get the ``RunningStatistics`` object, or the actual samples too, or just the actual mean estimate. :type get: {'stats', 'samples', 'mean'}, optional :param verbosity: How much information to show: - ``0``: nothing, - ``1``: progress bar just with iteration rate, - ``2``: progress bar with running stats displayed. :type verbosity: {0, 1, 2}, optional :param min_samples: Take at least this many samples before checking for convergence. :type min_samples: int, optional :param max_samples: Take at maximum this many samples. :type max_samples: int, optional :param fn_kwargs: Supplied to ``fn``. :returns: * **rs** (*RunningStatistics*) -- Statistics about the random estimation. * **samples** (*list[float]*) -- If ``get=='samples'``, the actual samples. .. rubric:: Examples Estimate the sum of ``n`` random numbers: >>> import numpy as np >>> import xyzpy as xyz >>> def fn(n): ... return np.random.rand(n).sum() ... >>> stats = xyz.estimate_from_repeats(fn, n=10, verbosity=2) 59: 5.13(12): : 58it [00:00, 3610.84it/s] RunningStatistics(mean=5.13(12), count=59) .. py:function:: format_number_with_error(x, err) Given ``x`` with error ``err``, format a string showing the relevant digits of ``x`` with two significant digits of the error bracketed, and overall exponent if necessary. :param x: The value to print. :type x: float :param err: The error on ``x``. :type err: float :rtype: str .. rubric:: Examples >>> format_number_with_error(0.1542412, 0.0626653) '0.154(63)' >>> format_number_with_error(-128124123097, 6424) '-1.281241231(64)e+11' .. py:function:: get_peak_memory_usage() Get the peak memory usage of the current process in *gigabytes*. This uses the `psutil` package on Windows, and the `resource` package on Linux and macOS. .. py:function:: getsizeof(obj) Compute the real size of a Python object in bytes, taken from https://stackoverflow.com/a/30316760/5640201.
:param obj: Object to measure. :type obj: object :returns: Total size in bytes. :rtype: int .. py:function:: progbar(it=None, nb=False, **kwargs) Turn any iterable into a progress bar, with an optional notebook display. :param it: Iterable to wrap with a progress bar. :type it: iterable :param nb: Whether to display the notebook progress bar. :type nb: bool :param \*\*kwargs: Additional options to send to tqdm. :type \*\*kwargs: dict-like .. py:function:: report_memory() Return a formatted memory usage summary for the current process. .. py:function:: report_memory_gpu() Return a formatted GPU memory usage summary for the process. .. py:function:: unzip(its, zip_level=1) Split a nested iterable at a specified level, i.e. in numpy language transpose the specified 'axis' to be the first. :param its: 'n-dimensional' iterable to split. :type its: iterable (of iterables (of iterables ...)) :param zip_level: Level at which to split the iterable, default of 1 replicates ``zip(*its)`` behaviour. :type zip_level: int .. rubric:: Example >>> x = [[(1, True), (2, False), (3, True)], [(7, True), (8, False), (9, True)]] >>> nums, bools = unzip(x, 2) >>> nums ((1, 2, 3), (7, 8, 9)) >>> bools ((True, False, True), (True, False, True))
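
The splitting that ``unzip`` describes can be illustrated with a small pure-Python sketch. The helper name ``unzip_sketch`` is hypothetical and this is not xyzpy's own implementation; it only assumes the behaviour documented above, namely that ``zip_level=1`` matches ``zip(*its)`` and that deeper levels apply the same transpose recursively.

```python
def unzip_sketch(its, zip_level=1):
    """Hypothetical stand-in for ``unzip``, written only for illustration."""
    if zip_level == 0:
        # nothing to split
        return its
    if zip_level == 1:
        # plain transpose of the outermost level, like zip(*its)
        return tuple(zip(*its))
    # split each sub-iterable one level down, then transpose the results
    return tuple(zip(*(unzip_sketch(it, zip_level - 1) for it in its)))


x = [[(1, True), (2, False), (3, True)],
     [(7, True), (8, False), (9, True)]]

nums, bools = unzip_sketch(x, 2)
print(nums)   # ((1, 2, 3), (7, 8, 9))
print(bools)  # ((True, False, True), (True, False, True))
```

With ``zip_level=2`` the innermost pairs are separated while the outer two levels of nesting are kept, which is why ``nums`` and ``bools`` each retain the shape of ``x``.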