xyzpy.utils¶

Utility functions.

Exceptions¶

XYZError

Common base class for all non-exit exceptions.

Classes¶

`Timer`	A very simple context manager class for timing blocks.
`Benchmarker`	Compare the performance of various `kernels`.
`RunningStatistics`	Running mean & standard deviation using Welford's algorithm.
`RunningCovariance`	Running covariance class.
`RunningCovarianceMatrix`	Running covariance matrix for `n` variables.
`MemoryMonitor`	Monitor this process' peak memory usage with specified sampling interval in a daemon thread.

Functions¶

`isiterable`(obj)
`prod`(it)	Product of an iterable.
`unzip`(its[, zip_level])	Split a nested iterable at a specified level, i.e. in numpy language transpose the specified 'axis' to be the first.
`flatten`(its, n)	Take the n-dimensional nested iterable its and flatten it.
`_get_fn_name`(fn)	Try to inspect a function's name, taking into account several common non-standard types of function: dask, functools.partial …
`progbar`([it, nb])	Turn any iterable into a progress bar, with notebook option
`getsizeof`(obj)	Compute the real size of a Python object in bytes, taken from https://stackoverflow.com/a/30316760/5640201.
`_auto_min_time`(timer[, min_t, repeats, get])
`benchmark`(fn[, setup, n, min_t, repeats, get, starmap])	Benchmark the time it takes to run `fn`.
`format_number_with_error`(x, err)	Given `x` with error `err`, format a string showing the relevant digits of `x` with two significant digits of the error bracketed, and overall exponent if necessary.
`estimate_from_repeats`(fn, *fn_args[, rtol, tol_scale, ...])
`get_peak_memory_usage`()	Get the peak memory usage of the current process in gigabytes.
`report_memory`()	Return a formatted memory usage summary for the current process.
`report_memory_gpu`()	Return a formatted GPU memory usage summary for the process.
`autocorrect_kwargs`([func, valid_kwargs])	A decorator that suggests the right keyword arguments if you get them wrong.

Module Contents¶

exception xyzpy.utils.XYZError[source]¶

Bases: Exception

Common base class for all non-exit exceptions.

xyzpy.utils.isiterable(obj)[source]¶

xyzpy.utils.prod(it)[source]¶: Product of an iterable.

xyzpy.utils.unzip(its, zip_level=1)[source]¶

Split a nested iterable at a specified level, i.e. in numpy language transpose the specified ‘axis’ to be the first.

Parameters:

its (iterable (of iterables (of iterables ...))) – ‘n-dimensional’ iterable to split
zip_level (int) – level at which to split the iterable, default of 1 replicates zip(*its) behaviour.

Example

>>> x = [[(1, True), (2, False), (3, True)],
...      [(7, True), (8, False), (9, True)]]
>>> nums, bools = unzip(x, 2)
>>> nums
((1, 2, 3), (7, 8, 9))
>>> bools
((True, False, True), (True, False, True))
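
The splitting can be pictured as applying zip(*its) recursively. The following is a minimal sketch of that idea, not xyzpy's actual implementation; the name unzip_sketch is hypothetical and it assumes every level down to zip_level is regularly nested:

```python
def unzip_sketch(its, zip_level=1):
    """Bring nesting level ``zip_level`` of ``its`` to the front."""
    if zip_level == 1:
        # Base case: a plain transpose of the two outermost levels.
        return tuple(zip(*its))
    # Unzip each sub-iterable one level shallower, then transpose the results.
    return tuple(zip(*(unzip_sketch(sub, zip_level - 1) for sub in its)))


x = [[(1, True), (2, False), (3, True)],
     [(7, True), (8, False), (9, True)]]
nums, bools = unzip_sketch(x, 2)
```

With zip_level=1 the sketch reduces to tuple(zip(*its)), matching the default behaviour described above.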

xyzpy.utils.flatten(its, n)[source]¶

Take the n-dimensional nested iterable its and flatten it.

Parameters:

its (nested iterable) – The nested iterable to flatten.
n (int) – The number of dimensions (nesting levels) of its.

Returns:

A flattened iterable of all items.
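
As an illustration only, flattening an n-dimensional nested iterable can be sketched with itertools.chain. The helper name flatten_sketch is hypothetical, and the sketch assumes that n counts the total number of nesting levels, so that n=1 leaves an already flat iterable untouched:

```python
from itertools import chain


def flatten_sketch(its, n):
    """Lazily flatten an ``n``-dimensional nested iterable."""
    # Each pass removes one level of nesting, so n - 1 passes are needed.
    for _ in range(n - 1):
        its = chain.from_iterable(its)
    return its


flat2 = list(flatten_sketch([[1, 2], [3, 4]], 2))
flat3 = list(flatten_sketch([[[1, 2], [3]], [[4]]], 3))
```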

xyzpy.utils._get_fn_name(fn)[source]¶: Try to inspect a function’s name, taking into account several common non-standard types of function: dask, functools.partial …

xyzpy.utils.progbar(it=None, nb=False, **kwargs)[source]¶

Turn any iterable into a progress bar, with notebook option

Parameters:

it (iterable) – Iterable to wrap with progress bar
nb (bool) – Whether to display the notebook progress bar
**kwargs (dict-like) – additional options to send to tqdm

xyzpy.utils.getsizeof(obj)[source]¶

Compute the real size of a Python object in bytes, taken from https://stackoverflow.com/a/30316760/5640201.

Parameters:: obj (object) – Object to measure.
Returns:: Total size in bytes.
Return type:: int
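
sys.getsizeof on its own only reports the shallow size of a container. A 'real' size has to recurse into the contents while counting shared objects once. The following is a rough sketch of that idea, not the code from the linked answer nor xyzpy's implementation; the name deep_getsizeof is hypothetical and only a few builtin containers are handled:

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Recursively sum sys.getsizeof over an object and its contents."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        # Already counted: shared or self-referential objects add nothing.
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(
            deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
            for k, v in obj.items()
        )
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


nested = [[1000, 2000], [3000]]
shallow = sys.getsizeof(nested)
deep = deep_getsizeof(nested)
```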

class xyzpy.utils.Timer[source]¶

A very simple context manager class for timing blocks.

Examples

>>> from xyzpy import Timer
>>> with Timer() as timer:
...     print('Doing some work!')
...
Doing some work!
>>> timer.t
0.00010752677917480469

__enter__()[source]¶

__exit__(*args)[source]¶
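
For readers unfamiliar with the pattern, a timing context manager of this kind can be sketched with the standard library alone. This is an illustration under assumptions, not the xyzpy source: the class name TimerSketch is hypothetical, and the use of time.perf_counter as the clock is assumed.

```python
import time


class TimerSketch:
    """Record the wall-clock duration of a ``with`` block in ``self.t``."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *args):
        # Runs even if the block raised, so ``t`` is always set.
        self.t = time.perf_counter() - self.start


with TimerSketch() as timer:
    total = sum(range(100_000))
```

After the block exits, timer.t holds the elapsed time in seconds.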

xyzpy.utils._auto_min_time(timer, min_t=0.2, repeats=2, get='min')[source]¶

xyzpy.utils.benchmark(fn, setup=None, n=None, min_t=0.2, repeats=2, get='min', starmap=False)[source]¶

Benchmark the time it takes to run fn.

Parameters:

fn (callable) – The function to time.
setup (callable, optional) – If supplied the function that sets up the argument for fn.
n (int, optional) – If supplied, the integer to supply to setup, or to fn itself if no setup is given.
min_t (float, optional) – Aim to repeat function enough times to take up this many seconds.
repeats (int, optional) – Repeat the whole procedure (with setup) this many times in order to take the minimum run time.
get ({'min', 'mean'}, optional) – Return the minimum or mean time for each run.
starmap (bool, optional) – Unpack the arguments from setup, if given.

Returns:

t – The minimum or mean (depending on get) time to run fn, in seconds.

Return type:

float

Examples

Just a parameter-less function:

>>> import xyzpy as xyz
>>> import numpy as np
>>> xyz.benchmark(lambda: np.linalg.eig(np.random.randn(100, 100)))
0.004726233000837965

The same but with a setup and size parameter n specified:

>>> setup = lambda n: np.random.randn(n, n)
>>> fn = lambda X: np.linalg.eig(X)
>>> xyz.benchmark(fn, setup, 100)
0.0042192734545096755
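
The setup/run split above is similar in spirit to the standard library's timeit.Timer. The following sketch shows that pattern under stated assumptions: benchmark_sketch is a hypothetical name, and it uses a fixed number of calls per repeat rather than choosing the count automatically from min_t as benchmark() is documented to do.

```python
import timeit


def benchmark_sketch(fn, setup=None, n=None, number=1000, repeats=2):
    """Return the best per-call time of ``fn`` in seconds."""
    if setup is None:
        setup_stmt = "pass"
        stmt = "fn()" if n is None else "fn(n)"
    else:
        # The setup call is excluded from the timed statement.
        setup_stmt = "X = setup(n)"
        stmt = "fn(X)"
    timer = timeit.Timer(
        stmt=stmt,
        setup=setup_stmt,
        globals={"fn": fn, "setup": setup, "n": n},
    )
    # Each entry is the total time for ``number`` calls; keep the fastest.
    totals = timer.repeat(repeat=repeats, number=number)
    return min(totals) / number


t = benchmark_sketch(sum, setup=lambda n: list(range(n)), n=100)
```

Taking the minimum over repeats is a common choice because slower runs usually reflect interference from other processes rather than the function itself.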

class xyzpy.utils.Benchmarker(kernels, setup=None, names=None, benchmark_opts=None, data_name=None)[source]¶

Compare the performance of various kernels. Internally this makes use of benchmark(), Harvester() and xyzpy's plotting functionality.

Parameters:

kernels (sequence of callable) – The functions to compare performance with.
setup (callable, optional) – If given, set up each benchmark run by supplying the size argument n to this function first, then feeding its output to each of the functions.
names (sequence of str, optional) – Alternate names to give the function, else they will be inferred.
benchmark_opts (dict, optional) – Supplied to benchmark().
data_name (str, optional) – If given, the file name the internal harvester will use to store results persistently.

harvester¶

The harvester that runs and accumulates all the data.

Type:: xyz.Harvester

ds¶

Shortcut to the harvester’s full dataset.

Type:: xarray.Dataset

kernels¶

names¶

setup = None¶

benchmark_opts¶

runner¶

run(ns, kernels=None, **harvest_opts)[source]¶

Run the benchmarks. Each run accumulates rather than overwriting the results.

Parameters:

ns (sequence of int or int) – The sizes to run the benchmarks with.
kernels (sequence of str, optional) – If given, only run the kernels with these names.
harvest_opts – Supplied to harvest_combos().

plot(**plot_opts)[source]¶: Plot the benchmarking results.

lineplot(**plot_opts)[source]¶: Plot the benchmarking results.

ilineplot(**plot_opts)[source]¶: Interactively plot the benchmarking results.

xyzpy.utils.format_number_with_error(x, err)[source]¶

Given x with error err, format a string showing the relevant digits of x with two significant digits of the error bracketed, and overall exponent if necessary.

Parameters:

x (float) – The value to print.
err (float) – The error on x.

Return type:

str

Examples

>>> format_number_with_error(0.1542412, 0.0626653)
'0.154(63)'

>>> format_number_with_error(-128124123097, 6424)
'-1.281241231(64)e+11'
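
The bracket notation keeps two significant digits of the error and rounds the value to the same decimal place. The following simplified sketch reproduces the first example only. It is not xyzpy's implementation: the name format_with_error_sketch is hypothetical, it handles only 0 < err < 10, it never emits an exponent, and it ignores the edge case where the error rounds up to three digits.

```python
import math


def format_with_error_sketch(x, err):
    """Format ``x`` with two significant digits of ``err`` in brackets."""
    if not 0 < err < 10:
        raise ValueError("this sketch only handles 0 < err < 10")
    # Decimal place of the second significant digit of the error.
    decimals = 1 - math.floor(math.log10(err))
    # The error expressed in units of that last decimal place.
    err_digits = round(err * 10**decimals)
    return f"{x:.{decimals}f}({err_digits})"


formatted = format_with_error_sketch(0.1542412, 0.0626653)
```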

class xyzpy.utils.RunningStatistics[source]¶

Running mean & standard deviation using Welford’s algorithm. This is a very efficient way of keeping track of the error on the mean for example.

mean¶

Current mean.

Type:: float

count¶

Current count.

Type:: int

std¶

Current standard deviation.

Type:: float

var¶

Current variance.

Type:: float

err¶

Current error on the mean.

Type:: float

rel_err¶

The current relative error.

Type:: float

Examples

>>> rs = RunningStatistics()
>>> rs.update(1.1)
>>> rs.update(1.4)
>>> rs.update(1.2)
>>> rs.update_from_it([1.5, 1.3, 1.6])
>>> rs.mean
1.3499999046325684

>>> rs.std  # standard deviation
0.17078252585383266

>>> rs.err  # error on the mean
0.06972167422092768

count = 0¶

mean = 0.0¶

M2 = 0.0¶

update(x)[source]¶: Add a single value x to the statistics.

update_from_it(xs)[source]¶: Add all values from iterable xs to the statistics.

converged(rtol, atol)[source]¶: Check if the stats have converged with respect to relative and absolute tolerance rtol and atol.

property var¶

property std¶

property err¶

property rel_err¶

__repr__()[source]¶
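
Welford's algorithm updates the mean and the sum of squared deviations (M2) one value at a time, so no samples need to be stored. The following is a minimal sketch of that update, not the xyzpy source: the class name WelfordSketch is hypothetical, and the use of the population variance (M2 / count) is an assumption chosen because it is consistent with the example values shown above.

```python
import math


class WelfordSketch:
    """Running mean, standard deviation and error on the mean."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.M2 = 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the deviation from both the old and the new mean.
        self.M2 += delta * (x - self.mean)

    @property
    def var(self):
        return self.M2 / self.count  # population variance (assumed)

    @property
    def std(self):
        return math.sqrt(self.var)

    @property
    def err(self):
        return self.std / math.sqrt(self.count)


rs = WelfordSketch()
for value in [1.1, 1.4, 1.2, 1.5, 1.3, 1.6]:
    rs.update(value)
```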

class xyzpy.utils.RunningCovariance[source]¶

Running covariance class.

count = 0¶

xmean = 0.0¶

ymean = 0.0¶

C = 0.0¶

update(x, y)[source]¶

update_from_it(xs, ys)[source]¶

property covar¶: The covariance.

property sample_covar¶: The covariance with “Bessel’s correction”.
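
A running covariance can be accumulated with a Welford-style update on a co-moment C. The sketch below mirrors the attribute names listed above but is an illustration only, not the xyzpy source; the class name RunningCovarianceSketch is hypothetical.

```python
class RunningCovarianceSketch:
    """Accumulate the covariance of paired values one pair at a time."""

    def __init__(self):
        self.count = 0
        self.xmean = 0.0
        self.ymean = 0.0
        self.C = 0.0

    def update(self, x, y):
        self.count += 1
        dx = x - self.xmean  # deviation from the old x mean
        self.xmean += dx / self.count
        self.ymean += (y - self.ymean) / self.count
        # Multiply by the deviation from the updated y mean.
        self.C += dx * (y - self.ymean)

    @property
    def covar(self):
        return self.C / self.count

    @property
    def sample_covar(self):
        # Bessel's correction: divide by count - 1 instead of count.
        return self.C / (self.count - 1)


rc = RunningCovarianceSketch()
for x, y in zip((1, 3, 2), (2, 6, 4)):
    rc.update(x, y)
```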

class xyzpy.utils.RunningCovarianceMatrix(n=2)[source]¶

Running covariance matrix for n variables.

Parameters:: n (int, optional) – Number of variables to track.

n = 2¶

rcs¶

update(*x)[source]¶: Update the covariance matrix with a single observation.

update_from_it(*xs)[source]¶: Update from iterables of observations for each variable.

property count¶: Return the number of samples accumulated.

property covar_matrix¶: Return the population covariance matrix.

property sample_covar_matrix¶: Return the sample covariance matrix.

to_uncertainties(bias=True)[source]¶

Convert the accumulated statistics to correlated uncertainties, from which new quantities can be calculated with error automatically propagated.

Parameters:: bias (bool, optional) – If False, use the sample covariance with “Bessel’s correction”.
Returns:: values – The sequence of correlated variables.
Return type:: tuple of uncertainties.ufloat

Examples

Estimate quantities of two perfectly correlated sequences.

>>> import xyzpy as xyz
>>> rcm = xyz.RunningCovarianceMatrix()
>>> rcm.update_from_it((1, 3, 2), (2, 6, 4))
>>> x, y = rcm.to_uncertainties()

Calculated quantities like sums have the error propagated:

>>> x + y
6.0+/-2.4494897427831783

But the covariance is also taken into account, meaning the ratio here can be estimated with zero error:

>>> x / y
0.5+/-0
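
The two results above can be checked by hand with first-order (linear) error propagation, which is the approach the uncertainties package takes. The following standard-library sketch assumes the population covariance (the bias=True default); all helper names are hypothetical:

```python
import math

xs, ys = (1, 3, 2), (2, 6, 4)
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n


def covar(a, a_mean, b, b_mean):
    """Population covariance of two equal-length sequences."""
    return sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b)) / n


var_x = covar(xs, x_mean, xs, x_mean)
var_y = covar(ys, y_mean, ys, y_mean)
cov_xy = covar(xs, x_mean, ys, y_mean)

# Sum: var(x + y) = var(x) + var(y) + 2 cov(x, y)
err_sum = math.sqrt(var_x + var_y + 2 * cov_xy)

# Ratio f = x / y, with df/dx = 1 / y and df/dy = -x / y**2
dfdx, dfdy = 1 / y_mean, -x_mean / y_mean**2
var_ratio = dfdx**2 * var_x + dfdy**2 * var_y + 2 * dfdx * dfdy * cov_xy
# Clamp tiny negative rounding errors before taking the square root.
err_ratio = math.sqrt(max(var_ratio, 0.0))
```

Here the covariance term cancels the two variance terms in the ratio, which is why its error vanishes for perfectly correlated sequences.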

xyzpy.utils.estimate_from_repeats(fn, *fn_args, rtol=0.02, tol_scale=1.0, get='stats', verbosity=0, min_samples=5, max_samples=1000000, **fn_kwargs)[source]¶

Repeatedly call fn, accumulating running statistics of its output, until the error on the mean is within the requested tolerance or max_samples is reached.

Parameters:

fn (callable) – The function that estimates a single value.
fn_args (optional) – Positional arguments supplied to fn.
rtol (float, optional) – Relative tolerance for error on mean.
tol_scale (float, optional) – The expected ‘scale’ of the estimate; this modifies the absolute tolerance near zero to rtol * tol_scale, default: 1.0.
get ({'stats', 'samples', 'mean'}, optional) – Just get the RunningStatistics object, or the actual samples too, or just the actual mean estimate.
verbosity ({0, 1, 2}, optional) – How much information to show:
- 0: nothing,
- 1: progress bar with just the iteration rate,
- 2: progress bar with running stats displayed.
min_samples (int, optional) – Take at least this many samples before checking for convergence.
max_samples (int, optional) – Take at maximum this many samples.
fn_kwargs (optional) – Keyword arguments supplied to fn.

Returns:

rs (RunningStatistics) – Statistics about the random estimation.
samples (list[float]) – If get=='samples', the actual samples.

Examples

Estimate the sum of n random numbers:

>>> import numpy as np
>>> import xyzpy as xyz
>>> def fn(n):
...     return np.random.rand(n).sum()
...
>>> stats = xyz.estimate_from_repeats(fn, n=10, verbosity=2)
59: 5.13(12): : 58it [00:00, 3610.84it/s]
RunningStatistics(mean=5.13(12), count=59)
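
The sampling loop implied by these parameters can be sketched as follows. This is an illustration under assumptions rather than the xyzpy implementation: the name estimate_sketch is hypothetical, and the stopping rule err <= rtol * max(|mean|, tol_scale) is one plausible reading of how rtol and tol_scale combine.

```python
import math
import random


def estimate_sketch(fn, rtol=0.02, tol_scale=1.0,
                    min_samples=5, max_samples=1_000_000):
    """Sample ``fn`` until the error on the mean is small enough."""
    count, mean, m2 = 0, 0.0, 0.0
    err = math.inf
    while count < max_samples:
        x = fn()
        # Welford update of the running mean and squared deviations.
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        if count >= min_samples:
            err = math.sqrt(m2 / count) / math.sqrt(count)
            # Assumed rule: relative tolerance, with a floor near zero.
            if err <= rtol * max(abs(mean), tol_scale):
                break
    return mean, err, count


rng = random.Random(0)
mean, err, count = estimate_sketch(
    lambda: sum(rng.random() for _ in range(10))
)
```

Checking convergence only after min_samples avoids stopping on the first few samples, where the estimated error is least reliable.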

class xyzpy.utils.MemoryMonitor(interval: float = 0.1)[source]¶

Monitor this process’ peak memory usage with specified sampling interval in a daemon thread. This is intended to be used as a context manager for long running and memory intensive processes, not fine grained memory tracking.

Parameters:: interval (float, optional) – Time between memory measurements in seconds. Fluctuations in peak memory between measurements might not be captured.

interval¶

Time between memory measurements in seconds.

Type:: float

peak¶

The peak memory usage in gigabytes.

Type:: float

interval = 0.1¶

peak = None¶

is_running = False¶

monitor_thread = None¶

_monitor()[source]¶

start()[source]¶: Start the memory monitoring thread.

stop()[source]¶: Stop the memory monitoring thread.

__enter__()[source]¶

__exit__(exc_type, exc_value, traceback)[source]¶

__del__()[source]¶

__repr__()[source]¶
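
The sampling approach described above (a daemon thread that polls at a fixed interval and keeps the maximum) can be sketched with the standard library. This is not the xyzpy source: PeakSampler and its probe argument are hypothetical, and a fake probe stands in for a real memory reading so the example is deterministic.

```python
import threading
import time


class PeakSampler:
    """Track the peak value returned by ``probe`` in a daemon thread."""

    def __init__(self, probe, interval=0.1):
        self.probe = probe
        self.interval = interval
        self.peak = None
        self._stop = threading.Event()
        self._thread = None

    def _monitor(self):
        while not self._stop.is_set():
            value = self.probe()
            if self.peak is None or value > self.peak:
                self.peak = value
            # Sleep for the interval, but wake immediately on stop.
            self._stop.wait(self.interval)

    def __enter__(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._monitor, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._stop.set()
        self._thread.join()


level = [0]  # stand-in for a real memory measurement
with PeakSampler(lambda: level[0], interval=0.01) as sampler:
    level[0] = 5
    time.sleep(0.2)
    level[0] = 2
    time.sleep(0.05)
```

As the class description notes, a spike shorter than the sampling interval can be missed entirely, which is the trade-off of polling.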

xyzpy.utils.get_peak_memory_usage()[source]¶: Get the peak memory usage of the current process in gigabytes. This uses the psutil package on Windows, and the resource package on Linux and macOS.

xyzpy.utils.report_memory()[source]¶: Return a formatted memory usage summary for the current process.

xyzpy.utils.report_memory_gpu()[source]¶: Return a formatted GPU memory usage summary for the process.

xyzpy.utils.autocorrect_kwargs(func=None, valid_kwargs=None)[source]¶

A decorator that suggests the right keyword arguments if you get them wrong. Useful for functions with many specific options.

Parameters:

func (callable, optional) – The function to decorate.
valid_kwargs (sequence[str], optional) – The valid keyword arguments for func, if not given these are inferred from the function signature.
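
A decorator of this kind can be sketched with inspect.signature and difflib.get_close_matches. The example is an illustration under assumptions, not xyzpy's implementation: the names suggest_kwargs and draw are hypothetical, and raising ValueError with a "Did you mean" hint is an assumed behaviour.

```python
import difflib
import functools
import inspect


def suggest_kwargs(func):
    """Reject unknown keyword arguments, suggesting close matches."""
    params = inspect.signature(func).parameters.values()
    valid = [
        p.name for p in params
        if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    ]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for key in kwargs:
            if key not in valid:
                close = difflib.get_close_matches(key, valid, n=3)
                hint = f" Did you mean: {close}?" if close else ""
                raise ValueError(f"Unknown keyword argument {key!r}.{hint}")
        return func(*args, **kwargs)

    return wrapper


@suggest_kwargs
def draw(x, color="b", linewidth=1):
    return (x, color, linewidth)


try:
    draw(1, colour="r")
except ValueError as exc:
    message = str(exc)
```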