xyzpy.utils
===========

.. py:module:: xyzpy.utils

.. autoapi-nested-parse::

   Utility functions.


Exceptions
----------

.. autoapisummary::

   xyzpy.utils.XYZError


Classes
-------

.. autoapisummary::

   xyzpy.utils.Timer
   xyzpy.utils.Benchmarker
   xyzpy.utils.RunningStatistics
   xyzpy.utils.RunningCovariance
   xyzpy.utils.RunningCovarianceMatrix
   xyzpy.utils.MemoryMonitor


Functions
---------

.. autoapisummary::

   xyzpy.utils.isiterable
   xyzpy.utils.prod
   xyzpy.utils.unzip
   xyzpy.utils.flatten
   xyzpy.utils._get_fn_name
   xyzpy.utils.progbar
   xyzpy.utils.getsizeof
   xyzpy.utils._auto_min_time
   xyzpy.utils.benchmark
   xyzpy.utils.format_number_with_error
   xyzpy.utils.estimate_from_repeats
   xyzpy.utils.get_peak_memory_usage
   xyzpy.utils.report_memory
   xyzpy.utils.report_memory_gpu
   xyzpy.utils.autocorrect_kwargs


Module Contents
---------------

.. py:exception:: XYZError

   Bases: :py:obj:`Exception`


   Common base class for all non-exit exceptions.


.. py:function:: isiterable(obj)

.. py:function:: prod(it)

   Product of an iterable.


.. py:function:: unzip(its, zip_level=1)

   Split a nested iterable at a specified level, i.e. in numpy language
   transpose the specified 'axis' to be the first.

   :param its: 'n-dimensional' iterable to split
   :type its: iterable (of iterables (of iterables ...))
   :param zip_level: level at which to split the iterable, default of 1 replicates
                     ``zip(*its)`` behaviour.
   :type zip_level: int

   .. rubric:: Example

   >>> x = [[(1, True), (2, False), (3, True)],
   ...      [(7, True), (8, False), (9, True)]]
   >>> nums, bools = unzip(x, 2)
   >>> nums
   ((1, 2, 3), (7, 8, 9))
   >>> bools
   ((True, False, True), (True, False, True))
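
   As a rough illustration of the "transpose" described above, the following
   sketch applies ``zip(*its)`` recursively. The name ``unzip_sketch`` is
   hypothetical, and this is not necessarily how xyzpy implements ``unzip``:

   .. code-block:: python

      def unzip_sketch(its, zip_level=1):
          """Bring nesting level ``zip_level`` of ``its`` to the front."""
          if zip_level > 1:
              # first unzip every sub-iterable one level less deeply
              its = (unzip_sketch(it, zip_level - 1) for it in its)
          # then transpose the two outermost levels
          return tuple(zip(*its))


      x = [[(1, True), (2, False), (3, True)],
           [(7, True), (8, False), (9, True)]]
      nums, bools = unzip_sketch(x, 2)
      print(nums)   # ((1, 2, 3), (7, 8, 9))
      print(bools)  # ((True, False, True), (True, False, True))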


.. py:function:: flatten(its, n)

   Flatten the ``n``-dimensional nested iterable ``its``.

   :param its: The nested iterable to flatten.
   :type its: nested iterable
   :param n: The number of nested dimensions to flatten.
   :type n: int

   :returns: Flattened iterable of all items.
   :rtype: iterable
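
   One possible way to flatten a fixed number of levels is to chain the
   sub-iterables recursively. This is only a sketch under that assumption;
   ``flatten_sketch`` is a hypothetical name, not the xyzpy function itself:

   .. code-block:: python

      from itertools import chain


      def flatten_sketch(its, n):
          """Flatten ``n`` levels of nesting into a single iterable."""
          if n > 1:
              # flatten each sub-iterable by one level less, then join them
              return chain.from_iterable(flatten_sketch(it, n - 1) for it in its)
          # a one-dimensional iterable is already flat
          return its


      print(list(flatten_sketch([[1, 2], [3, 4]], 2)))  # [1, 2, 3, 4]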


.. py:function:: _get_fn_name(fn)

   Try to inspect a function's name, taking into account several common
   non-standard types of function: dask, functools.partial ...
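
   The ``functools.partial`` case can be handled roughly as below. This is a
   hedged sketch only: ``get_fn_name_sketch`` is a hypothetical name, and the
   dask handling mentioned above is not covered.

   .. code-block:: python

      import functools


      def get_fn_name_sketch(fn):
          """Best-effort name lookup for a callable."""
          # unwrap (possibly nested) partial objects to reach the real function
          while isinstance(fn, functools.partial):
              fn = fn.func
          # callables without ``__name__`` fall back to their class name
          return getattr(fn, "__name__", type(fn).__name__)


      print(get_fn_name_sketch(functools.partial(pow, 2)))  # pow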


.. py:function:: progbar(it=None, nb=False, **kwargs)

   Turn any iterable into a progress bar, with an optional notebook mode.

   :param it: Iterable to wrap with progress bar
   :type it: iterable
   :param nb: Whether to display the notebook progress bar
   :type nb: bool
   :param \*\*kwargs: additional options to send to tqdm
   :type \*\*kwargs: dict-like


.. py:function:: getsizeof(obj)

   Compute the real size of a Python object in bytes, taken from
   https://stackoverflow.com/a/30316760/5640201.

   :param obj: Object to measure.
   :type obj: object

   :returns: Total size in bytes.
   :rtype: int
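
   ``sys.getsizeof`` alone only reports the shallow size of a container. A
   "real" size can be approximated by also following references, for example
   with ``gc.get_referents``. The sketch below assumes that approach;
   ``deep_getsizeof_sketch`` and ``SKIPPED_TYPES`` are hypothetical names and
   the details may differ from what xyzpy does:

   .. code-block:: python

      import sys
      from gc import get_referents
      from types import FunctionType, ModuleType

      # objects of these types reference large shared structures, so skip them
      SKIPPED_TYPES = (type, ModuleType, FunctionType)


      def deep_getsizeof_sketch(obj):
          """Sum ``sys.getsizeof`` over ``obj`` and everything it refers to."""
          seen_ids = set()
          size = 0
          objects = [obj]
          while objects:
              fresh = []
              for o in objects:
                  if not isinstance(o, SKIPPED_TYPES) and id(o) not in seen_ids:
                      seen_ids.add(id(o))  # count shared objects only once
                      size += sys.getsizeof(o)
                      fresh.append(o)
              # descend one level into the newly counted objects
              objects = get_referents(*fresh)
          return size


      nested = [list(range(1000))]
      print(sys.getsizeof(nested) < deep_getsizeof_sketch(nested))  # True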


.. py:class:: Timer

   A very simple context manager class for timing blocks.

   .. rubric:: Examples

   >>> from xyzpy import Timer
   >>> with Timer() as timer:
   ...     print('Doing some work!')
   ...
   Doing some work!
   >>> timer.t
   0.00010752677917480469
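
   A timing context manager of this kind can be written in a few lines. The
   sketch below assumes ``time.perf_counter`` as the clock and uses the
   hypothetical name ``TimerSketch``; the actual class may differ:

   .. code-block:: python

      import time


      class TimerSketch:
          """Record the wall-clock duration of a ``with`` block in ``t``."""

          def __enter__(self):
              self.start = time.perf_counter()
              return self

          def __exit__(self, *args):
              # elapsed seconds since the block was entered
              self.t = time.perf_counter() - self.start


      with TimerSketch() as timer:
          time.sleep(0.01)
      print(timer.t > 0)  # True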


   .. py:method:: __enter__()


   .. py:method:: __exit__(*args)


.. py:function:: _auto_min_time(timer, min_t=0.2, repeats=2, get='min')

.. py:function:: benchmark(fn, setup=None, n=None, min_t=0.2, repeats=2, get='min', starmap=False)

   Benchmark the time it takes to run ``fn``.

   :param fn: The function to time.
   :type fn: callable
   :param setup: If supplied, the function that sets up the argument for ``fn``.
   :type setup: callable, optional
   :param n: If supplied, the integer to supply to ``setup`` or ``fn``.
   :type n: int, optional
   :param min_t: Aim to repeat function enough times to take up this many seconds.
   :type min_t: float, optional
   :param repeats: Repeat the whole procedure (with setup) this many times in order to
                   take the minimum run time.
   :type repeats: int, optional
   :param get: Return the minimum or mean time for each run.
   :type get: {'min', 'mean'}, optional
   :param starmap: Unpack the arguments from ``setup``, if given.
   :type starmap: bool, optional

   :returns: **t** -- The minimum, averaged, time to run ``fn`` in seconds.
   :rtype: float

   .. rubric:: Examples

   Just a parameter-less function:

       >>> import xyzpy as xyz
       >>> import numpy as np
       >>> xyz.benchmark(lambda: np.linalg.eig(np.random.randn(100, 100)))
       0.004726233000837965

   The same but with a setup and size parameter ``n`` specified:

       >>> setup = lambda n: np.random.randn(n, n)
       >>> fn = lambda X: np.linalg.eig(X)
       >>> xyz.benchmark(fn, setup, 100)
       0.0042192734545096755
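
   The roles of ``min_t`` and ``repeats`` can be pictured with the simplified
   loop below. It is a sketch under stated assumptions, not the library's
   implementation: the name ``benchmark_sketch`` is hypothetical, the call
   count is doubled until ``min_t`` is reached, only the minimum is returned,
   and ``starmap`` and ``get`` are not handled.

   .. code-block:: python

      import time


      def benchmark_sketch(fn, setup=None, n=None, min_t=0.2, repeats=2):
          """Return the best per-call time of ``fn`` in seconds."""
          best = float("inf")
          for _ in range(repeats):
              # build fresh arguments for every repeat
              if setup is not None:
                  args = (setup(n),)
              elif n is not None:
                  args = (n,)
              else:
                  args = ()
              number = 1
              while True:
                  start = time.perf_counter()
                  for _call in range(number):
                      fn(*args)
                  elapsed = time.perf_counter() - start
                  if elapsed >= min_t:
                      break
                  number *= 2  # too quick to measure reliably, call more often
              best = min(best, elapsed / number)
          return best


      t = benchmark_sketch(sorted, setup=lambda n: list(range(n, 0, -1)),
                           n=1000, min_t=0.01)
      print(t > 0)  # True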


.. py:class:: Benchmarker(kernels, setup=None, names=None, benchmark_opts=None, data_name=None)

   Compare the performance of various ``kernels``. Internally this makes
   use of :func:`~xyzpy.benchmark`, :class:`~xyzpy.Harvester` and xyzpy's
   plotting functionality.

   :param kernels: The functions to compare performance with.
   :type kernels: sequence of callable
   :param setup: If given, set up each benchmark run by supplying the size argument ``n``
                 to this function first, then feeding its output to each of the
                 functions.
   :type setup: callable, optional
   :param names: Alternate names to give the functions, else they will be inferred.
   :type names: sequence of str, optional
   :param benchmark_opts: Supplied to :func:`~xyzpy.benchmark`.
   :type benchmark_opts: dict, optional
   :param data_name: If given, the file name the internal harvester will use to store
                     results persistently.
   :type data_name: str, optional

   .. attribute:: harvester

      The harvester that runs and accumulates all the data.

      :type: xyz.Harvester

   .. attribute:: ds

      Shortcut to the harvester's full dataset.

      :type: xarray.Dataset


   .. py:attribute:: kernels


   .. py:attribute:: names


   .. py:attribute:: setup
      :value: None


   .. py:attribute:: benchmark_opts


   .. py:attribute:: runner


   .. py:attribute:: harvester


   .. py:method:: run(ns, kernels=None, **harvest_opts)

      Run the benchmarks. Each run accumulates rather than overwriting the
      results.

      :param ns: The sizes to run the benchmarks with.
      :type ns: sequence of int or int
      :param kernels: If given, only run the kernels with these names.
      :type kernels: sequence of str, optional
      :param harvest_opts: Supplied to :meth:`~xyzpy.Harvester.harvest_combos`.


   .. py:property:: ds


   .. py:method:: plot(**plot_opts)

      Plot the benchmarking results.


   .. py:method:: lineplot(**plot_opts)

      Plot the benchmarking results.


   .. py:method:: ilineplot(**plot_opts)

      Interactively plot the benchmarking results.


.. py:function:: format_number_with_error(x, err)

   Given ``x`` with error ``err``, format a string showing the relevant
   digits of ``x`` with two significant digits of the error bracketed, and
   overall exponent if necessary.

   :param x: The value to print.
   :type x: float
   :param err: The error on ``x``.
   :type err: float

   :rtype: str

   .. rubric:: Examples

   >>> format_number_with_error(0.1542412, 0.0626653)
   '0.154(63)'

   >>> format_number_with_error(-128124123097, 6424)
   '-1.281241231(64)e+11'
   '-1.281241231(64)e+11'
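
   The bracket notation can be reproduced with a little logarithm arithmetic:
   find the decimal position of the error's second significant digit, then
   round ``x`` to that position. The sketch below is not the library's code.
   ``format_with_error_sketch`` is a hypothetical name, and the rule for when
   to switch to exponent form (outside ``1e-5 <= |x| < 1e5``) is an assumption
   made only for this example:

   .. code-block:: python

      import math


      def format_with_error_sketch(x, err):
          """Format ``x`` with two significant digits of ``err`` in brackets."""
          # power of ten of the error's second significant digit
          last_digit = math.floor(math.log10(err)) - 1
          # the two significant digits of the error as an integer
          err_digits = round(err / 10**last_digit)
          x_exp = math.floor(math.log10(abs(x))) if x != 0 else 0
          if -5 <= x_exp < 5:  # assumed threshold for plain notation
              if last_digit > 0:
                  # error exceeds one unit: round x and pad the error digits
                  return f"{round(x, -last_digit):.0f}({err_digits * 10**last_digit})"
              return f"{x:.{-last_digit}f}({err_digits})"
          # scientific notation: scale x to a mantissa and keep the exponent
          decimals = max(0, x_exp - last_digit)
          mantissa = x / 10**x_exp
          return f"{mantissa:.{decimals}f}({err_digits})e{x_exp:+03d}"


      print(format_with_error_sketch(0.1542412, 0.0626653))  # 0.154(63)
      print(format_with_error_sketch(-128124123097, 6424))   # -1.281241231(64)e+11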


.. py:class:: RunningStatistics

   Running mean and standard deviation using Welford's algorithm. This is an
   efficient way of keeping track of, for example, the error on the mean.

   .. attribute:: mean

      Current mean.

      :type: float

   .. attribute:: count

      Current count.

      :type: int

   .. attribute:: std

      Current standard deviation.

      :type: float

   .. attribute:: var

      Current variance.

      :type: float

   .. attribute:: err

      Current error on the mean.

      :type: float

   .. attribute:: rel_err

      The current relative error.

      :type: float

   .. rubric:: Examples

   >>> rs = RunningStatistics()
   >>> rs.update(1.1)
   >>> rs.update(1.4)
   >>> rs.update(1.2)
   >>> rs.update_from_it([1.5, 1.3, 1.6])
   >>> rs.mean
   1.3499999046325684

   >>> rs.std  # standard deviation
   0.17078252585383266

   >>> rs.err  # error on the mean
   0.06972167422092768
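
   For readers unfamiliar with Welford's algorithm, here is a minimal
   standalone sketch of the update rule. ``WelfordSketch`` is a hypothetical
   name; the choice of population variance (``M2 / count``) and of
   ``std / sqrt(count)`` for the error is inferred from the numbers in the
   example above rather than from the library source:

   .. code-block:: python

      import math


      class WelfordSketch:
          """Single-pass running mean, variance and error on the mean."""

          def __init__(self):
              self.count = 0
              self.mean = 0.0
              self.M2 = 0.0  # running sum of squared deviations from the mean

          def update(self, x):
              self.count += 1
              delta = x - self.mean            # deviation from the old mean
              self.mean += delta / self.count
              self.M2 += delta * (x - self.mean)  # uses the updated mean

          @property
          def std(self):
              return math.sqrt(self.M2 / self.count)

          @property
          def err(self):
              return self.std / math.sqrt(self.count)


      rs = WelfordSketch()
      for value in [1.1, 1.4, 1.2, 1.5, 1.3, 1.6]:
          rs.update(value)
      print(round(rs.mean, 6), round(rs.std, 6), round(rs.err, 6))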


   .. py:attribute:: count
      :value: 0


   .. py:attribute:: mean
      :value: 0.0


   .. py:attribute:: M2
      :value: 0.0


   .. py:method:: update(x)

      Add a single value ``x`` to the statistics.


   .. py:method:: update_from_it(xs)

      Add all values from iterable ``xs`` to the statistics.


   .. py:method:: converged(rtol, atol)

      Check if the stats have converged with respect to relative and
      absolute tolerance ``rtol`` and ``atol``.


   .. py:property:: var


   .. py:property:: std


   .. py:property:: err


   .. py:property:: rel_err


   .. py:method:: __repr__()


.. py:class:: RunningCovariance

   Running covariance class.


   .. py:attribute:: count
      :value: 0


   .. py:attribute:: xmean
      :value: 0.0


   .. py:attribute:: ymean
      :value: 0.0


   .. py:attribute:: C
      :value: 0.0


   .. py:method:: update(x, y)


   .. py:method:: update_from_it(xs, ys)


   .. py:property:: covar

      The covariance.


   .. py:property:: sample_covar

      The covariance with "Bessel's correction".
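
   A running covariance can be accumulated with the same kind of single-pass
   update used for a running variance. The sketch below reuses the attribute
   names listed above but is otherwise an assumption;
   ``RunningCovarianceSketch`` is a hypothetical name:

   .. code-block:: python

      class RunningCovarianceSketch:
          """Single-pass covariance of paired observations."""

          def __init__(self):
              self.count = 0
              self.xmean = 0.0
              self.ymean = 0.0
              self.C = 0.0  # running sum of co-deviations

          def update(self, x, y):
              self.count += 1
              dx = x - self.xmean              # deviation from the old x mean
              self.xmean += dx / self.count
              self.ymean += (y - self.ymean) / self.count
              self.C += dx * (y - self.ymean)  # uses the updated y mean

          @property
          def covar(self):
              return self.C / self.count

          @property
          def sample_covar(self):
              # Bessel's correction: divide by (count - 1)
              return self.C / (self.count - 1)


      rc = RunningCovarianceSketch()
      for x, y in zip((1, 3, 2), (2, 6, 4)):
          rc.update(x, y)
      print(rc.covar, rc.sample_covar)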


.. py:class:: RunningCovarianceMatrix(n=2)

   Running covariance matrix for ``n`` variables.

   :param n: Number of variables to track.
   :type n: int, optional


   .. py:attribute:: n
      :value: 2


   .. py:attribute:: rcs


   .. py:method:: update(*x)

      Update the covariance matrix with a single observation.


   .. py:method:: update_from_it(*xs)

      Update from iterables of observations for each variable.


   .. py:property:: count

      Return the number of samples accumulated.


   .. py:property:: covar_matrix

      Return the population covariance matrix.


   .. py:property:: sample_covar_matrix

      Return the sample covariance matrix.


   .. py:method:: to_uncertainties(bias=True)

      Convert the accumulated statistics to correlated uncertainties,
      from which new quantities can be calculated with error automatically
      propagated.

      :param bias: If False, use the sample covariance with "Bessel's correction".
      :type bias: bool, optional

      :returns: **values** -- The sequence of correlated variables.
      :rtype: tuple of uncertainties.ufloat

      .. rubric:: Examples

      Estimate quantities of two perfectly correlated sequences.

          >>> rcm = xyz.RunningCovarianceMatrix()
          >>> rcm.update_from_it((1, 3, 2), (2, 6, 4))
          >>> x, y = rcm.to_uncertainties()

      Calculated quantities like sums have the error propagated:

          >>> x + y
          6.0+/-2.4494897427831783

      But the covariance is also taken into account, meaning the ratio here
      can be estimated with zero error:

          >>> x / y
          0.5+/-0
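
      These numbers can be checked by hand, assuming the usual first-order
      (linear) error propagation. With the default ``bias=True`` the
      population covariances of the samples are used, so for
      :math:`x \in \{1, 3, 2\}` and :math:`y = 2x`:

      .. math::

         \operatorname{Var}(x) = \tfrac{2}{3}, \qquad
         \operatorname{Var}(y) = \tfrac{8}{3}, \qquad
         \operatorname{Cov}(x, y) = \tfrac{4}{3}

      The sum therefore has

      .. math::

         \sigma_{x+y}^2
         = \operatorname{Var}(x) + \operatorname{Var}(y)
           + 2 \operatorname{Cov}(x, y)
         = \tfrac{2}{3} + \tfrac{8}{3} + \tfrac{8}{3} = 6,
         \qquad \sigma_{x+y} = \sqrt{6} \approx 2.449

      while for the ratio, with means :math:`\bar{x} = 2` and
      :math:`\bar{y} = 4`, the correlated term cancels the other two exactly:

      .. math::

         \sigma_{x/y}^2 \approx
         \left(\frac{\bar{x}}{\bar{y}}\right)^2
         \left[
             \frac{\operatorname{Var}(x)}{\bar{x}^2}
             + \frac{\operatorname{Var}(y)}{\bar{y}^2}
             - \frac{2 \operatorname{Cov}(x, y)}{\bar{x} \bar{y}}
         \right]
         = \tfrac{1}{4} \left[ \tfrac{1}{6} + \tfrac{1}{6} - \tfrac{1}{3} \right]
         = 0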


.. py:function:: estimate_from_repeats(fn, *fn_args, rtol=0.02, tol_scale=1.0, get='stats', verbosity=0, min_samples=5, max_samples=1000000, **fn_kwargs)

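   Repeatedly call ``fn`` and accumulate statistics on its output until the
   error on the mean converges or ``max_samples`` is reached.

   :param fn: The function that estimates a single value.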
   :type fn: callable
   :param fn_args: Positional arguments supplied to ``fn``.
   :param rtol: Relative tolerance for error on mean.
   :type rtol: float, optional
   :param tol_scale: The expected 'scale' of the estimate. This sets the absolute
                     tolerance near zero to ``rtol * tol_scale``, default: 1.0.
   :type tol_scale: float, optional
   :param get: Just get the ``RunningStatistics`` object, or the actual samples too,
               or just the actual mean estimate.
   :type get: {'stats', 'samples', 'mean'}, optional
   :param verbosity: How much information to show:

                     - ``0``: nothing
                     - ``1``: progress bar just with iteration rate,
                     - ``2``: progress bar with running stats displayed.
   :type verbosity: {0, 1, 2}, optional
   :param min_samples: Take at least this many samples before checking for convergence.
   :type min_samples: int, optional
   :param max_samples: Take at most this many samples.
   :type max_samples: int, optional
   :param fn_kwargs: Keyword arguments supplied to ``fn``.

   :returns: * **rs** (*RunningStatistics*) -- Statistics about the random estimation.
             * **samples** (*list[float]*) -- If ``get=='samples'``, the actual samples.

   .. rubric:: Examples

   Estimate the sum of ``n`` random numbers:

       >>> import numpy as np
       >>> import xyzpy as xyz
       >>> def fn(n):
       ...     return np.random.rand(n).sum()
       ...
       >>> stats = xyz.estimate_from_repeats(fn, n=10, verbosity=2)
       59: 5.13(12): : 58it [00:00, 3610.84it/s]
       RunningStatistics(mean=5.13(12), count=59)
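
   The sampling loop implied by the parameters above might look roughly like
   the following. This is a sketch, not the library's implementation:
   ``estimate_sketch`` is a hypothetical name, and the convergence test
   ``err <= rtol * tol_scale + rtol * abs(mean)`` is an assumption about how
   the relative and absolute tolerances are combined.

   .. code-block:: python

      import math
      import random


      def estimate_sketch(fn, rtol=0.02, tol_scale=1.0,
                          min_samples=5, max_samples=1_000_000):
          """Sample ``fn`` until the error on the mean is small enough."""
          count, mean, m2 = 0, 0.0, 0.0
          err = float("inf")
          atol = rtol * tol_scale  # absolute tolerance near zero
          while count < max_samples:
              x = fn()
              # Welford update of the running mean and squared deviations
              count += 1
              delta = x - mean
              mean += delta / count
              m2 += delta * (x - mean)
              if count >= min_samples:
                  err = math.sqrt(m2 / count) / math.sqrt(count)
                  if err <= atol + rtol * abs(mean):  # assumed criterion
                      break
          return mean, err, count


      random.seed(0)
      mean, err, count = estimate_sketch(
          lambda: sum(random.random() for _ in range(10))
      )
      print(f"mean={mean:.2f} err={err:.2f} count={count}")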


.. py:class:: MemoryMonitor(interval: float = 0.1)

   Monitor this process's peak memory usage in a daemon thread, sampling at
   the specified interval. This is intended to be used as a context manager
   for long-running, memory-intensive processes, not for fine-grained memory
   tracking.

   :param interval: Time between memory measurements in seconds. Fluctuations in peak
                    memory between measurements might not be captured.
   :type interval: float, optional

   .. attribute:: interval

      Time between memory measurements in seconds.

      :type: float

   .. attribute:: peak

      The peak memory usage in gigabytes.

      :type: float
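
   The sampling pattern described above (a daemon thread that polls at a fixed
   interval and keeps the maximum) can be sketched as follows. Everything here
   is illustrative: ``PeakMonitorSketch`` and its ``read`` callable are
   hypothetical, and a fake counter stands in for a real memory query.

   .. code-block:: python

      import itertools
      import threading


      class PeakMonitorSketch:
          """Track the largest value returned by ``read`` while active."""

          def __init__(self, read, interval=0.1):
              self.read = read          # callable returning the current usage
              self.interval = interval
              self.peak = None
              self._stop = threading.Event()
              self._thread = None

          def _sample(self):
              value = self.read()
              if self.peak is None or value > self.peak:
                  self.peak = value

          def _monitor(self):
              # Event.wait returns True once stop has been requested
              while not self._stop.wait(self.interval):
                  self._sample()

          def __enter__(self):
              self._stop.clear()
              self._sample()  # initial reading
              self._thread = threading.Thread(target=self._monitor, daemon=True)
              self._thread.start()
              return self

          def __exit__(self, exc_type, exc_value, traceback):
              self._stop.set()
              self._thread.join()
              self._sample()  # final reading after the thread has finished


      fake_usage = itertools.count()  # stand-in for a real memory reading
      with PeakMonitorSketch(lambda: next(fake_usage), interval=0.01) as monitor:
          pass
      print(monitor.peak >= 1)  # True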


   .. py:attribute:: interval
      :value: 0.1


   .. py:attribute:: peak
      :value: None


   .. py:attribute:: is_running
      :value: False


   .. py:attribute:: monitor_thread
      :value: None


   .. py:method:: _monitor()


   .. py:method:: start()

      Start the memory monitoring thread.


   .. py:method:: stop()

      Stop the memory monitoring thread.


   .. py:method:: __enter__()


   .. py:method:: __exit__(exc_type, exc_value, traceback)


   .. py:method:: __del__()


   .. py:method:: __repr__()


.. py:function:: get_peak_memory_usage()

   Get the peak memory usage of the current process in *gigabytes*. This
   uses the ``psutil`` package on Windows, and the ``resource`` module on
   Linux and macOS.
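
   The platform split matters because ``ru_maxrss`` is reported in kilobytes
   on Linux but in bytes on macOS. A hedged sketch of such a query is shown
   below; ``peak_memory_gb_sketch`` is a hypothetical name, and dividing by
   ``1e9`` (rather than ``2**30``) is an assumption about what "gigabytes"
   means here.

   .. code-block:: python

      import sys


      def peak_memory_gb_sketch():
          """Peak resident memory of this process, in gigabytes."""
          if sys.platform == "win32":
              import psutil  # third-party package, only needed on Windows

              return psutil.Process().memory_info().peak_wset / 1e9

          import resource  # standard library, Unix only

          peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
          # ru_maxrss is in bytes on macOS and in kilobytes on Linux
          scale = 1 if sys.platform == "darwin" else 1024
          return peak * scale / 1e9


      print(peak_memory_gb_sketch() > 0)  # True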


.. py:function:: report_memory()

   Return a formatted memory usage summary for the current process.


.. py:function:: report_memory_gpu()

   Return a formatted GPU memory usage summary for the process.


.. py:function:: autocorrect_kwargs(func=None, valid_kwargs=None)

   A decorator that suggests the right keyword arguments if you get them
   wrong. Useful for functions with many specific options.

   :param func: The function to decorate.
   :type func: callable, optional
   :param valid_kwargs: The valid keyword arguments for ``func``, if not given these are
                        inferred from the function signature.
   :type valid_kwargs: sequence[str], optional
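
   A decorator with this behaviour could be built from ``inspect.signature``
   and ``difflib.get_close_matches``. The sketch below assumes that design;
   ``autocorrect_kwargs_sketch`` is a hypothetical name, and both the error
   type and the message wording are made up for illustration:

   .. code-block:: python

      import difflib
      import functools
      import inspect


      def autocorrect_kwargs_sketch(func=None, valid_kwargs=None):
          """Raise a helpful error when an unknown keyword is passed."""
          if func is None:
              # used as ``@autocorrect_kwargs_sketch(valid_kwargs=...)``
              return functools.partial(
                  autocorrect_kwargs_sketch, valid_kwargs=valid_kwargs
              )
          if valid_kwargs is None:
              # infer the valid names from the function signature
              valid_kwargs = tuple(inspect.signature(func).parameters)

          @functools.wraps(func)
          def wrapper(*args, **kwargs):
              for name in kwargs:
                  if name not in valid_kwargs:
                      close = difflib.get_close_matches(name, valid_kwargs, n=3)
                      hint = f" Did you mean: {', '.join(close)}?" if close else ""
                      raise ValueError(f"Unknown keyword argument '{name}'.{hint}")
              return func(*args, **kwargs)

          return wrapper


      @autocorrect_kwargs_sketch
      def draw(color="black", linewidth=1.0):
          return color, linewidth


      try:
          draw(colour="red")
      except ValueError as exc:
          message = str(exc)
          print(message)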