Utilities - Benchmarking and Estimation
4. Utilities - Benchmarking and Estimation
xyzpy provides a number of utilities that might be generally useful when generating data. Timer, benchmark() and Benchmarker are for timing and comparing functions, and there are further utilities for collecting running statistics and estimating quantities from repeats.
[1]:
%config InlineBackend.figure_formats = ['svg']
import xyzpy as xyz
import numpy as np
4.1. Timing
4.1.1. Simple timing with Timer
This is a super simple context manager for very roughly timing a statement that runs once:
[2]:
with xyz.Timer() as timer:
    A = np.random.randn(512, 512)
    el, ev = np.linalg.eig(A)

timer.interval
[2]:
0.19067168235778809
If you run this a few times you might notice some big fluctuations.
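To see the fluctuations concretely, one option is to repeat the measurement and look at the spread. The sketch below uses only the standard library; `SimpleTimer` is a hypothetical stand-in written for illustration and is not xyzpy's `Timer` implementation, although it exposes a similarly named `interval` attribute.

```python
import time


class SimpleTimer:
    """Hypothetical stand-in for a timing context manager (not xyzpy's code)."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        # elapsed wall-clock time in seconds
        self.interval = time.perf_counter() - self.start
        return False  # never swallow exceptions


def work():
    # an arbitrary, cheap workload chosen just for the demonstration
    return sum(i * i for i in range(50_000))


intervals = []
for _ in range(5):
    with SimpleTimer() as timer:
        work()
    intervals.append(timer.interval)

# the single-shot timings typically differ from run to run
print(min(intervals), max(intervals))
```

A single reading from such a timer is best treated as a rough estimate.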
4.1.2. Advanced timing with benchmark
This is a more advanced and accurate function that wraps timeit under the hood. It offers, however, a convenient interface that accepts callables and sensibly manages how many repeats to do:
[3]:
def setup(n=512):
    return np.random.randn(n, n)

def foo(A):
    return np.linalg.eig(A)

xyz.benchmark(foo, setup=setup)
[3]:
0.17729254602454603
Or we can specify the size n to benchmark with as well:
[4]:
xyz.benchmark(foo, setup=setup, n=1024)
[4]:
0.5849384269677103
This calls foo(setup(n)) under the hood.
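For intuition about what a timeit-based benchmark involves, here is a minimal standard-library sketch. It is an assumption about the general approach, not xyzpy's actual implementation: `rough_benchmark` is a hypothetical helper, and both the use of `autorange` and taking the minimum over repeats are choices made for this illustration.

```python
import timeit


def rough_benchmark(fn, repeats=3):
    """Hypothetical helper: estimate the time per call of ``fn()`` in seconds."""
    timer = timeit.Timer(fn)
    # let timeit pick a loop count so that one measurement takes >= 0.2 s
    number, _ = timer.autorange()
    # repeat the measurement and keep the fastest, least disturbed run
    best = min(timer.repeat(repeat=repeats, number=number))
    return best / number


per_call = rough_benchmark(lambda: sorted(range(1000), reverse=True))
print(per_call)
```

Looping many times and dividing by the loop count is what makes this less noisy than a single timed run.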
Generally the setup and n arguments are optional; including them or not allows switching between the following underlying patterns:
foo()
foo(n)
foo(setup())
foo(setup(n))
Supply starmap=True if you want foo(*setup(n)), and see benchmark() for other options, e.g. the minimum time and number of repeats to aim for.
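The call patterns above can be summarised in a few lines of plain Python. This is only a sketch of the dispatch logic as described in the text: `call_pattern` and `make_pair` are hypothetical names, and the code makes no claim about how benchmark() is actually written.

```python
def call_pattern(fn, setup=None, n=None, starmap=False):
    """Hypothetical dispatcher mirroring the call patterns listed above."""
    if setup is None:
        # foo() or foo(n)
        return fn() if n is None else fn(n)
    # setup() or setup(n) produces the input for the function
    arg = setup() if n is None else setup(n)
    # with starmap the setup result is unpacked into positional arguments
    return fn(*arg) if starmap else fn(arg)


def make_pair(n=2):
    # toy setup function returning a tuple
    return (n, n + 1)


print(call_pattern(lambda: "no args"))                        # foo()
print(call_pattern(lambda n: n * 2, n=4))                     # foo(n)
print(call_pattern(sum, setup=make_pair))                     # foo(setup())
print(call_pattern(sum, setup=make_pair, n=10))               # foo(setup(n))
print(call_pattern(pow, setup=make_pair, n=2, starmap=True))  # foo(*setup(n))
```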
4.1.3. Comparing performance with Benchmarker
Building on top of benchmark() and combining it with the functionality of a Harvester() gives us a very nice way to compare the performance of various functions, or ‘kernels’. As an example here we’ll compare python, numpy and numba for computing sum(x**2)**0.5.
[5]:
import numba as nb

def setup(n):
    return np.random.randn(n)

def python_square_sum(xs):
    y = 0.0
    for x in xs:
        y += x**2
    return y**0.5

def numpy_square_sum(xs):
    return (xs**2).sum()**0.5

@nb.njit
def numba_square_sum(xs):
    y = 0.0
    for x in xs:
        y += x**2
    return y**0.5
The setup function will be supplied to each. First we can check that they give the same answer:
[6]:
xs = setup(100)
[7]:
python_square_sum(xs)
[7]:
10.697719565000014
[8]:
numpy_square_sum(xs)
[8]:
10.697719565000014
[9]:
numba_square_sum(xs)
[9]:
10.697719565000014
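Identical printed values are not guaranteed in general, since kernels may add the terms in a different order and therefore round differently. A tolerance-based comparison is a safer check. The example below uses plain Python stand-ins (hypothetical, not the kernels above) so that it runs without numpy or numba.

```python
import math
import random


def loop_square_sum(values):
    # accumulate left to right, like the explicit loop kernels
    y = 0.0
    for x in values:
        y += x**2
    return y**0.5


def fsum_square_sum(values):
    # math.fsum tracks partial sums exactly, so rounding can differ slightly
    return math.fsum(x**2 for x in values) ** 0.5


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100)]

a, b = loop_square_sum(samples), fsum_square_sum(samples)
# compare with a relative tolerance rather than exact equality
print(math.isclose(a, b, rel_tol=1e-12))
```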
Then we can set up a Benchmarker object to compare these with:
[10]:
kernels = [
    python_square_sum,
    numpy_square_sum,
    numba_square_sum,
]

benchmarker = xyz.Benchmarker(
    kernels,
    setup=setup,
    benchmark_opts={'min_t': 0.01}
)
Next we run a set of problem sizes:
[11]:
sizes = [2**i for i in range(1, 11)]
benchmarker.run(sizes, verbosity=2)
{'n': 1024, 'kernel': 'numba_square_sum'}: 100%|##########| 30/30 [00:01<00:00, 18.54it/s]
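The progress bar reports 30 runs, consistent with one benchmark per combination of the 3 kernels and 10 sizes. Conceptually this resembles the loop sketched below, which uses only the standard library. `compare_kernels`, the toy kernels and the result layout are all hypothetical, and the sketch makes no claim about how Benchmarker runs or stores its data.

```python
import random
import timeit


def compare_kernels(kernels, setup, sizes, number=20, repeats=3):
    """Hypothetical helper: time every kernel at every problem size."""
    results = {}
    for n in sizes:
        # in this sketch the input is built once per size, outside the timing
        data = setup(n)
        for kernel in kernels:
            timer = timeit.Timer(lambda: kernel(data))
            best = min(timer.repeat(repeat=repeats, number=number))
            # store seconds per call, keyed by (kernel name, size)
            results[(kernel.__name__, n)] = best / number
    return results


def make_data(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


def loop_kernel(xs):
    y = 0.0
    for x in xs:
        y += x**2
    return y**0.5


def builtin_kernel(xs):
    return sum(x**2 for x in xs) ** 0.5


demo_sizes = [2**i for i in range(1, 6)]
timings = compare_kernels([loop_kernel, builtin_kernel], make_data, demo_sizes)
for key in sorted(timings):
    print(key, timings[key])
```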
Which we can then automatically plot:
[12]:
benchmarker.ilineplot()