4. Utilities - Benchmarking and Estimation

xyzpy provides a number of utilities that might be generally useful when generating data. Timer, benchmark() and Benchmarker are for timing and comparing functions. There are also utilities for collecting running statistics and estimating quantities from repeats.

[1]:
%config InlineBackend.figure_formats = ['svg']

import xyzpy as xyz
import numpy as np

4.1. Timing

4.1.1. Simple timing with Timer

This is a super simple context manager for very roughly timing a statement that runs once:

[2]:
with xyz.Timer() as timer:

    A = np.random.randn(512, 512)
    el, ev = np.linalg.eig(A)

timer.interval
[2]:
0.19067168235778809

If you run this a few times you might notice some big fluctuations.
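
To illustrate the idea, here is a minimal sketch of how a one-shot timing context manager could be written with only the standard library. The class name SketchTimer is hypothetical and this is not xyzpy's actual implementation; the only thing it borrows from the example above is the interval attribute.

```python
import time


class SketchTimer:
    """Hypothetical stand-in for a one-shot timing context manager."""

    def __enter__(self):
        # perf_counter is a monotonic, high-resolution clock
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end = time.perf_counter()
        # elapsed wall-clock time in seconds
        self.interval = self.end - self.start
        # returning False lets any exception propagate
        return False


with SketchTimer() as timer:
    total = sum(i * i for i in range(100_000))

print(f"{timer.interval:.6f} s")
```

Because the block runs only once, a single measurement like this picks up noise from the operating system scheduler, caches and other processes, which is one reason repeated runs can differ noticeably.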

4.1.2. Advanced timing with benchmark

This is a more advanced and accurate function that wraps timeit under the hood. It offers, however, a convenient interface that accepts callables and sensibly manages how many repeats to do:

[3]:
def setup(n=512):
    return np.random.randn(n, n)

def foo(A):
    return np.linalg.eig(A)

xyz.benchmark(foo, setup=setup)
[3]:
0.17729254602454603

Or we can specify the size n to benchmark with as well:

[4]:
xyz.benchmark(foo, setup=setup, n=1024)
[4]:
0.5849384269677103

This calls foo(setup(n)) under the hood. The setup and n arguments are both optional, and including or omitting them switches between the following underlying patterns:

foo()
foo(n)
foo(setup())
foo(setup(n))

Supply starmap=True if you want foo(*setup(n)), and see benchmark() for other options, e.g. the minimum time and number of repeats to aim for.
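
The call patterns above can be sketched with the standard library's timeit module. The helpers call_pattern and simple_benchmark below are hypothetical names used only for illustration, not xyzpy's code. The sketch also assumes that setup is called once before timing starts, which may differ from what benchmark() actually does.

```python
import timeit


def call_pattern(fn, setup=None, n=None, starmap=False):
    """Build a zero-argument callable following the patterns listed above."""
    if setup is None:
        # foo() or foo(n)
        args = () if n is None else (n,)
    else:
        # foo(setup()) or foo(setup(n))
        data = setup() if n is None else setup(n)
        # with starmap the setup result is unpacked: foo(*setup(n))
        args = tuple(data) if starmap else (data,)
    return lambda: fn(*args)


def simple_benchmark(fn, setup=None, n=None, starmap=False, repeats=3):
    """Return the best observed time per call, in seconds."""
    timer = timeit.Timer(call_pattern(fn, setup, n, starmap))
    # autorange picks a loop count that takes at least 0.2 seconds in total
    number, _ = timer.autorange()
    return min(timer.repeat(repeat=repeats, number=number)) / number


def make_list(n=1000):
    # reversed list, so that sorting has some work to do
    return list(range(n, 0, -1))


print(simple_benchmark(sorted, setup=make_list))
print(simple_benchmark(sorted, setup=make_list, n=10_000))
```

Taking the minimum over repeats is a common convention with timeit, since slower runs usually reflect interference from other processes rather than the function itself.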

4.1.3. Comparing performance with Benchmarker

Building on top of benchmark() and combining it with the functionality of a Harvester() gives us a very nice way to compare the performance of various functions, or ‘kernels’.

As an example here we’ll compare python, numpy and numba for computing sum(x**2)**0.5.

[5]:
import numba as nb

def setup(n):
    return np.random.randn(n)

def python_square_sum(xs):
    y = 0.0
    for x in xs:
        y += x**2
    return y**0.5

def numpy_square_sum(xs):
    return (xs**2).sum()**0.5

@nb.njit
def numba_square_sum(xs):
    y = 0.0
    for x in xs:
        y += x**2
    return y**0.5

The setup function will be supplied to each kernel. First we can check that they all give the same answer:

[6]:
xs = setup(100)
[7]:
python_square_sum(xs)
[7]:
10.697719565000014
[8]:
numpy_square_sum(xs)
[8]:
10.697719565000014
[9]:
numba_square_sum(xs)
[9]:
10.697719565000014
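
Rather than comparing the printed values by eye, the check can also be automated. The following is a minimal sketch using only the standard library; the helper kernels_agree and the second kernel builtin_square_sum are hypothetical and stand in for the numpy and numba versions so that the example runs without extra dependencies.

```python
import math
import random


def python_square_sum(xs):
    y = 0.0
    for x in xs:
        y += x**2
    return y**0.5


def builtin_square_sum(xs):
    # hypothetical second kernel built on the standard library only
    return math.sqrt(sum(x * x for x in xs))


def kernels_agree(kernels, data, rel_tol=1e-9):
    """Return True if every kernel matches the first one within rel_tol."""
    reference = kernels[0](data)
    return all(
        math.isclose(kernel(data), reference, rel_tol=rel_tol)
        for kernel in kernels[1:]
    )


random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100)]
print(kernels_agree([python_square_sum, builtin_square_sum], xs))
```

A tolerance is used instead of exact equality because different summation orders or library routines can differ in the last few bits of a floating point result.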

Then we can set up a Benchmarker object to compare these with:

[10]:
kernels = [
    python_square_sum,
    numpy_square_sum,
    numba_square_sum,
]

benchmarker = xyz.Benchmarker(
    kernels,
    setup=setup,
    benchmark_opts={'min_t': 0.01}
)

Next we run a set of problem sizes:

[11]:
sizes = [2**i for i in range(1, 11)]

benchmarker.run(sizes, verbosity=2)
{'n': 1024, 'kernel': 'numba_square_sum'}: 100%|##########| 30/30 [00:01<00:00, 18.54it/s]

The results can then be plotted automatically:

[12]:
benchmarker.ilineplot()