xyzpy.utils
===========

.. py:module:: xyzpy.utils

.. autoapi-nested-parse::

   Utility functions.


Exceptions
----------

.. autoapisummary::

   xyzpy.utils.XYZError


Classes
-------

.. autoapisummary::

   xyzpy.utils.Timer
   xyzpy.utils.Benchmarker
   xyzpy.utils.RunningStatistics
   xyzpy.utils.RunningCovariance
   xyzpy.utils.RunningCovarianceMatrix
   xyzpy.utils.MemoryMonitor


Functions
---------

.. autoapisummary::

   xyzpy.utils.isiterable
   xyzpy.utils.prod
   xyzpy.utils.unzip
   xyzpy.utils.flatten
   xyzpy.utils._get_fn_name
   xyzpy.utils.progbar
   xyzpy.utils.getsizeof
   xyzpy.utils._auto_min_time
   xyzpy.utils.benchmark
   xyzpy.utils.format_number_with_error
   xyzpy.utils.estimate_from_repeats
   xyzpy.utils.get_peak_memory_usage
   xyzpy.utils.report_memory
   xyzpy.utils.report_memory_gpu
   xyzpy.utils.autocorrect_kwargs


Module Contents
---------------

.. py:exception:: XYZError

   Bases: :py:obj:`Exception`

   Common base class for all non-exit exceptions.


.. py:function:: isiterable(obj)

.. py:function:: prod(it)

   Product of an iterable.


.. py:function:: unzip(its, zip_level=1)

   Split a nested iterable at a specified level, i.e. in numpy language
   transpose the specified 'axis' to be the first.

   :param its: 'n-dimensional' iterable to split.
   :type its: iterable (of iterables (of iterables ...))
   :param zip_level: Level at which to split the iterable. The default of 1
      replicates ``zip(*its)`` behaviour.
   :type zip_level: int

   .. rubric:: Example

   >>> x = [[(1, True), (2, False), (3, True)],
   ...      [(7, True), (8, False), (9, True)]]
   >>> nums, bools = unzip(x, 2)
   >>> nums
   ((1, 2, 3), (7, 8, 9))
   >>> bools
   ((True, False, True), (True, False, True))


.. py:function:: flatten(its, n)

   Take the n-dimensional nested iterable ``its`` and flatten it.

   :param its:
   :type its: nested iterable
   :param n:
   :type n: number of dimensions

   :rtype: flattened iterable of all items


.. py:function:: _get_fn_name(fn)

   Try to inspect a function's name, taking into account several common
   non-standard types of function: dask, functools.partial ...


.. py:function:: progbar(it=None, nb=False, **kwargs)

   Turn any iterable into a progress bar, with a notebook option.

   :param it: Iterable to wrap with a progress bar.
   :type it: iterable
   :param nb: Whether to display the notebook progress bar.
   :type nb: bool
   :param \*\*kwargs: Additional options to send to tqdm.
   :type \*\*kwargs: dict-like


.. py:function:: getsizeof(obj)

   Compute the real size of a Python object in bytes, taken from
   https://stackoverflow.com/a/30316760/5640201.

   :param obj: Object to measure.
   :type obj: object

   :returns: Total size in bytes.
   :rtype: int


.. py:class:: Timer

   A very simple context manager class for timing blocks.

   .. rubric:: Examples

   >>> from xyzpy import Timer
   >>> with Timer() as timer:
   ...     print('Doing some work!')
   ...
   Doing some work!
   >>> timer.t
   0.00010752677917480469


   .. py:method:: __enter__()

   .. py:method:: __exit__(*args)


.. py:function:: _auto_min_time(timer, min_t=0.2, repeats=2, get='min')

.. py:function:: benchmark(fn, setup=None, n=None, min_t=0.2, repeats=2, get='min', starmap=False)

   Benchmark the time it takes to run ``fn``.

   :param fn: The function to time.
   :type fn: callable
   :param setup: If supplied, the function that sets up the argument for
      ``fn``.
   :type setup: callable, optional
   :param n: If supplied, the integer to supply to ``setup`` or ``fn``.
   :type n: int, optional
   :param min_t: Aim to repeat the function enough times to take up this
      many seconds.
   :type min_t: float, optional
   :param repeats: Repeat the whole procedure (with setup) this many times
      in order to take the minimum run time.
   :type repeats: int, optional
   :param get: Return the minimum or mean time for each run.
   :type get: {'min', 'mean'}, optional
   :param starmap: Unpack the arguments from ``setup``, if given.
   :type starmap: bool, optional

   :returns: **t** -- The minimum, averaged, time to run ``fn`` in seconds.
   :rtype: float

   .. rubric:: Examples

   Just a parameter-less function:

   >>> import xyzpy as xyz
   >>> import numpy as np
   >>> xyz.benchmark(lambda: np.linalg.eig(np.random.randn(100, 100)))
   0.004726233000837965

   The same but with a setup and size parameter ``n`` specified:

   >>> setup = lambda n: np.random.randn(n, n)
   >>> fn = lambda X: np.linalg.eig(X)
   >>> xyz.benchmark(fn, setup, 100)
   0.0042192734545096755


.. py:class:: Benchmarker(kernels, setup=None, names=None, benchmark_opts=None, data_name=None)

   Compare the performance of various ``kernels``. Internally this makes
   use of :func:`~xyzpy.benchmark`, :class:`~xyzpy.Harvester` and xyzpy's
   plotting functionality.

   :param kernels: The functions to compare performance with.
   :type kernels: sequence of callable
   :param setup: If given, set up each benchmark run by supplying the size
      argument ``n`` to this function first, then feeding its output to each
      of the functions.
   :type setup: callable, optional
   :param names: Alternate names to give the functions, else they will be
      inferred.
   :type names: sequence of str, optional
   :param benchmark_opts: Supplied to :func:`~xyzpy.benchmark`.
   :type benchmark_opts: dict, optional
   :param data_name: If given, the file name the internal harvester will use
      to store results persistently.
   :type data_name: str, optional

   .. attribute:: harvester

      The harvester that runs and accumulates all the data.

      :type: xyz.Harvester

   .. attribute:: ds

      Shortcut to the harvester's full dataset.

      :type: xarray.Dataset


   .. py:attribute:: kernels

   .. py:attribute:: names

   .. py:attribute:: setup
      :value: None

   .. py:attribute:: benchmark_opts

   .. py:attribute:: runner

   .. py:attribute:: harvester

   .. py:method:: run(ns, kernels=None, **harvest_opts)

      Run the benchmarks. Each run accumulates rather than overwriting the
      results.

      :param ns: The sizes to run the benchmarks with.
      :type ns: sequence of int or int
      :param kernels: If given, only run the kernels with these names.
      :type kernels: sequence of str, optional
      :param harvest_opts: Supplied to :meth:`~xyzpy.Harvester.harvest_combos`.

   .. py:property:: ds

   .. py:method:: plot(**plot_opts)

      Plot the benchmarking results.

   .. py:method:: lineplot(**plot_opts)

      Plot the benchmarking results.

   .. py:method:: ilineplot(**plot_opts)

      Interactively plot the benchmarking results.


.. py:function:: format_number_with_error(x, err)

   Given ``x`` with error ``err``, format a string showing the relevant
   digits of ``x`` with two significant digits of the error bracketed, and
   overall exponent if necessary.

   :param x: The value to print.
   :type x: float
   :param err: The error on ``x``.
   :type err: float

   :rtype: str

   .. rubric:: Examples

   >>> format_number_with_error(0.1542412, 0.0626653)
   '0.154(63)'

   >>> format_number_with_error(-128124123097, 6424)
   '-1.281241231(64)e+11'


.. py:class:: RunningStatistics

   Running mean and standard deviation using Welford's algorithm. This is a
   very efficient way of keeping track of the error on the mean, for
   example.

   .. attribute:: mean

      Current mean.

      :type: float

   .. attribute:: count

      Current count.

      :type: int

   .. attribute:: std

      Current standard deviation.

      :type: float

   .. attribute:: var

      Current variance.

      :type: float

   .. attribute:: err

      Current error on the mean.

      :type: float

   .. attribute:: rel_err

      The current relative error.

      :type: float

   .. rubric:: Examples

   >>> rs = RunningStatistics()
   >>> rs.update(1.1)
   >>> rs.update(1.4)
   >>> rs.update(1.2)
   >>> rs.update_from_it([1.5, 1.3, 1.6])
   >>> rs.mean
   1.3499999046325684

   >>> rs.std  # standard deviation
   0.17078252585383266

   >>> rs.err  # error on the mean
   0.06972167422092768


   .. py:attribute:: count
      :value: 0

   .. py:attribute:: mean
      :value: 0.0

   .. py:attribute:: M2
      :value: 0.0

   .. py:method:: update(x)

      Add a single value ``x`` to the statistics.

   .. py:method:: update_from_it(xs)

      Add all values from iterable ``xs`` to the statistics.

   .. py:method:: converged(rtol, atol)

      Check if the stats have converged with respect to relative and
      absolute tolerance ``rtol`` and ``atol``.

   .. py:property:: var

   .. py:property:: std

   .. py:property:: err

   .. py:property:: rel_err

   .. py:method:: __repr__()


.. py:class:: RunningCovariance

   Running covariance class.

   .. py:attribute:: count
      :value: 0

   .. py:attribute:: xmean
      :value: 0.0

   .. py:attribute:: ymean
      :value: 0.0

   .. py:attribute:: C
      :value: 0.0

   .. py:method:: update(x, y)

   .. py:method:: update_from_it(xs, ys)

   .. py:property:: covar

      The covariance.

   .. py:property:: sample_covar

      The covariance with "Bessel's correction".


.. py:class:: RunningCovarianceMatrix(n=2)

   Running covariance matrix for ``n`` variables.

   :param n: Number of variables to track.
   :type n: int, optional

   .. py:attribute:: n
      :value: 2

   .. py:attribute:: rcs

   .. py:method:: update(*x)

      Update the covariance matrix with a single observation.

   .. py:method:: update_from_it(*xs)

      Update from iterables of observations for each variable.

   .. py:property:: count

      Return the number of samples accumulated.

   .. py:property:: covar_matrix

      Return the population covariance matrix.

   .. py:property:: sample_covar_matrix

      Return the sample covariance matrix.

   .. py:method:: to_uncertainties(bias=True)

      Convert the accumulated statistics to correlated uncertainties, from
      which new quantities can be calculated with error automatically
      propagated.

      :param bias: If False, use the sample covariance with "Bessel's
         correction".
      :type bias: bool, optional

      :returns: **values** -- The sequence of correlated variables.
      :rtype: tuple of uncertainties.ufloat

      .. rubric:: Examples

      Estimate quantities of two perfectly correlated sequences.

      >>> rcm = xyz.RunningCovarianceMatrix()
      >>> rcm.update_from_it((1, 3, 2), (2, 6, 4))
      >>> x, y = rcm.to_uncertainties()

      Calculated quantities like sums have the error propagated:

      >>> x + y
      6.0+/-2.4494897427831783

      But the covariance is also taken into account, meaning the ratio here
      can be estimated with zero error:

      >>> x / y
      0.5+/-0


.. py:function:: estimate_from_repeats(fn, *fn_args, rtol=0.02, tol_scale=1.0, get='stats', verbosity=0, min_samples=5, max_samples=1000000, **fn_kwargs)

   :param fn: The function that estimates a single value.
   :type fn: callable
   :param fn_args: Positional arguments supplied to ``fn``.
   :param rtol: Relative tolerance for error on mean.
   :type rtol: float, optional
   :param tol_scale: The expected 'scale' of the estimate. This modifies the
      absolute tolerance near zero to ``rtol * tol_scale``, default: 1.0.
   :type tol_scale: float, optional
   :param get: Just get the ``RunningStatistics`` object, or the actual
      samples too, or just the actual mean estimate.
   :type get: {'stats', 'samples', 'mean'}, optional
   :param verbosity: How much information to show:

      - ``0``: nothing,
      - ``1``: progress bar just with iteration rate,
      - ``2``: progress bar with running stats displayed.
   :type verbosity: {0, 1, 2}, optional
   :param min_samples: Take at least this many samples before checking for
      convergence.
   :type min_samples: int, optional
   :param max_samples: Take at most this many samples.
   :type max_samples: int, optional
   :param fn_kwargs: Keyword arguments supplied to ``fn``.

   :returns: * **rs** (*RunningStatistics*) -- Statistics about the random
               estimation.
             * **samples** (*list[float]*) -- If ``get=='samples'``, the
               actual samples.

   .. rubric:: Examples

   Estimate the sum of ``n`` random numbers:

   >>> import numpy as np
   >>> import xyzpy as xyz
   >>> def fn(n):
   ...     return np.random.rand(n).sum()
   ...
   >>> stats = xyz.estimate_from_repeats(fn, n=10, verbosity=2)
   59: 5.13(12): : 58it [00:00, 3610.84it/s]
   RunningStatistics(mean=5.13(12), count=59)


.. py:class:: MemoryMonitor(interval: float = 0.1)

   Monitor this process's peak memory usage with a specified sampling
   interval in a daemon thread. This is intended to be used as a context
   manager for long-running and memory-intensive processes, not
   fine-grained memory tracking.

   :param interval: Time between memory measurements in seconds.
      Fluctuations in peak memory between measurements might not be
      captured.
   :type interval: float, optional

   .. attribute:: interval

      Time between memory measurements in seconds.

      :type: float

   .. attribute:: peak

      The peak memory usage in gigabytes.

      :type: float


   .. py:attribute:: interval
      :value: 0.1

   .. py:attribute:: peak
      :value: None

   .. py:attribute:: is_running
      :value: False

   .. py:attribute:: monitor_thread
      :value: None

   .. py:method:: _monitor()

   .. py:method:: start()

      Start the memory monitoring thread.

   .. py:method:: stop()

      Stop the memory monitoring thread.

   .. py:method:: __enter__()

   .. py:method:: __exit__(exc_type, exc_value, traceback)

   .. py:method:: __del__()

   .. py:method:: __repr__()


.. py:function:: get_peak_memory_usage()

   Get the peak memory usage of the current process in *gigabytes*.

   This uses the ``psutil`` package on Windows, and the ``resource`` package
   on Linux and macOS.


.. py:function:: report_memory()

   Return a formatted memory usage summary for the current process.


.. py:function:: report_memory_gpu()

   Return a formatted GPU memory usage summary for the process.


.. py:function:: autocorrect_kwargs(func=None, valid_kwargs=None)

   A decorator that suggests the right keyword arguments if you get them
   wrong. Useful for functions with many specific options.

   :param func: The function to decorate.
   :type func: callable, optional
   :param valid_kwargs: The valid keyword arguments for ``func``. If not
      given, these are inferred from the function signature.
   :type valid_kwargs: sequence[str], optional
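
To illustrate the idea behind ``autocorrect_kwargs``, here is a minimal, self-contained sketch of a decorator that suggests close matches for misspelt keyword arguments. It is not xyzpy's implementation: the name ``suggest_kwargs``, the use of ``difflib``, the choice of ``TypeError`` and the message format are all assumptions made for this example. Only the ``func`` and ``valid_kwargs`` parameters mirror the documented signature.

```python
import difflib
import functools
import inspect


def suggest_kwargs(func=None, valid_kwargs=None):
    """Hypothetical stand-in for a kwarg-suggesting decorator (not xyzpy code)."""

    def decorator(fn):
        # Infer the valid names from the signature unless given explicitly.
        if valid_kwargs is None:
            valid = list(inspect.signature(fn).parameters)
        else:
            valid = list(valid_kwargs)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for name in kwargs:
                if name not in valid:
                    # Rank the valid names by similarity to the unknown one.
                    close = difflib.get_close_matches(
                        name, valid, n=3, cutoff=0.0
                    )
                    raise TypeError(
                        f"Unknown option {name!r}. Did you mean one of: {close}?"
                    )
            return fn(*args, **kwargs)

        return wrapper

    # Support both ``@suggest_kwargs`` and ``@suggest_kwargs(valid_kwargs=...)``.
    if func is None:
        return decorator
    return decorator(func)


@suggest_kwargs
def plot(x, color="blue", linewidth=1.0):
    # Toy function with a few named options, assumed for illustration.
    return (x, color, linewidth)
```

Calling ``plot(1, colour="red")`` with this sketch raises an error whose message lists ``color`` among the suggestions, while ``plot(1, color="red")`` runs normally. Checking names before the call means the typo is reported with a hint instead of surfacing as a generic unexpected-keyword error.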