{
"cells": [
{
"cell_type": "markdown",
"id": "76effe6b-09bd-424f-8296-f07f8d08936f",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"(utilities)=\n",
"# Utilities\n",
"\n",
"[`xyzpy`](xyzpy) provides a number of utilities that might be generally\n",
"useful when generating data. These are:\n",
"\n",
"* {class}`~xyzpy.Timer`\n",
"* {func}`~xyzpy.benchmark`\n",
"* {class}`~xyzpy.Benchmarker`\n",
"\n",
"For timing and comparing functions. And then:\n",
"\n",
"* {class}`~xyzpy.RunningStatistics`\n",
"* {func}`~xyzpy.estimate_from_repeats`\n",
"\n",
"for collecting running statistics and estimating quantities from repeats."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "12d83242-844f-45aa-90d0-75fa1aaa604c",
"metadata": {},
"outputs": [],
"source": [
"%config InlineBackend.figure_formats = ['svg']\n",
"\n",
"import numpy as np\n",
"\n",
"import xyzpy as xyz"
]
},
{
"cell_type": "markdown",
"id": "f4b4ac0b-ea68-4e60-a390-95275f1a15ec",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"## Timing\n",
"\n",
"### Simple timing with ``Timer``\n",
"\n",
"This is a super simple context manager for very roughly timing a statement that runs once:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e1d77d3a-3557-4440-a27f-1836d99eed41",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.27247190475463867"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with xyz.Timer() as timer:\n",
" A = np.random.randn(512, 512)\n",
" el, ev = np.linalg.eig(A)\n",
"\n",
"timer.interval"
]
},
{
"cell_type": "markdown",
"id": "e43460c2-1a7c-4b33-aee6-9e215b68b1f9",
"metadata": {},
"source": [
"If you run this a few times you might notice some big fluctuations.\n",
"\n",
"\n",
"### Advanced timing with ``benchmark``\n",
"\n",
"This is a more advanced and accurate function that wraps ``timeit`` under the hood.\n",
"If offers however a convenient interface that accepts callables and sensibly manages\n",
"how many repeats to do etc.:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "83811004-6141-4054-a0a1-ba6777a90605",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.16060145798837766"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def setup(n=512):\n",
" return np.random.randn(n, n)\n",
"\n",
"\n",
"def foo(A):\n",
" return np.linalg.eig(A)\n",
"\n",
"\n",
"xyz.benchmark(foo, setup=setup)"
]
},
{
"cell_type": "markdown",
"id": "cac3f6a8-9b6b-4bfe-b754-9686e4b7cd52",
"metadata": {},
"source": [
"Or we can specfic the size ``n`` to benchmark with as well:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c9767fd8-64f5-4364-b841-c685b4863bf3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.786840959044639"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xyz.benchmark(foo, setup=setup, n=1024)"
]
},
{
"cell_type": "markdown",
"id": "232205ab-6232-4203-823c-14eca6baccdb",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"Which is calling ``foo(setup(n))`` under the hood.\n",
"Generally the ``setup`` and ``n`` arguments are optional -\n",
"including them or not allows switching between the following\n",
"underlying patterns:\n",
"\n",
"```python\n",
"foo()\n",
"foo(n)\n",
"foo(setup())\n",
"foo(setup(n))\n",
"```\n",
"\n",
"Supply ``starmap=True`` if you want ``foo(*setup(n))``, and\n",
"see {func}`~xyzpy.benchmark` for other options, e.g. the\n",
"minimum time and number of repeats to aim for.\n",
"\n",
"\n",
"### Comparing performance with ``Benchmarker``\n",
"\n",
"Building on top of {func}`~xyzpy.benchmark` and combining it with\n",
"the functionality of a {func}`~xyzpy.Harvester` gives us a very nice\n",
"way to compare the performance of various functions, or 'kernels'.\n",
"\n",
"As an example here we'll compare ``python``, ``numpy`` and ``numba``\n",
"for computing ``sum(x**2)**0.5``."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bde8f297-3306-4bda-991f-8171d6743e47",
"metadata": {},
"outputs": [],
"source": [
"import numba as nb\n",
"\n",
"\n",
"def setup(n):\n",
" return np.random.randn(n)\n",
"\n",
"\n",
"def python_square_sum(xs):\n",
" y = 0.0\n",
" for x in xs:\n",
" y += x**2\n",
" return y**0.5\n",
"\n",
"\n",
"def numpy_square_sum(xs):\n",
" return (xs**2).sum() ** 0.5\n",
"\n",
"\n",
"@nb.njit\n",
"def numba_square_sum(xs):\n",
" y = 0.0\n",
" for x in xs:\n",
" y += x**2\n",
" return y**0.5"
]
},
{
"cell_type": "markdown",
"id": "8dc7acd0-6480-425a-a277-9fa0727eb377",
"metadata": {},
"source": [
"The ``setup`` function will be supplied to each, we can check they\n",
"first give the same answer:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "d5d93fa6-3502-4f54-871e-707d7476ece4",
"metadata": {},
"outputs": [],
"source": [
"xs = setup(100)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9dc6e848-8b20-4a37-8876-1b9ec049c0c8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(9.97523320851365)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"python_square_sum(xs)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "008ddfb2-b4b7-4d82-b459-8109bfa5edca",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(9.97523320851365)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy_square_sum(xs)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "28f8de98-269c-46bd-9a9c-bfc0c2e3d6b9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"9.97523320851365"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numba_square_sum(xs)"
]
},
{
"cell_type": "markdown",
"id": "9d26ec79-4336-46cc-a083-26d04725a09b",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"Then we can set up a {class}`~xyzpy.utils.Benchmarker` object to compare these with:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4917fbfb-38ed-406c-adb0-e64c3f4512fd",
"metadata": {},
"outputs": [],
"source": [
"kernels = [\n",
" python_square_sum,\n",
" numpy_square_sum,\n",
" numba_square_sum,\n",
"]\n",
"\n",
"benchmarker = xyz.Benchmarker(\n",
" kernels, setup=setup, benchmark_opts={\"min_t\": 0.01}\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f43e0776-54c4-4749-a8bd-a6bc9c2afb13",
"metadata": {},
"source": [
"Next we run a set of problem sizes:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "ab53d456-e0f5-4dc4-9e01-38392398ad16",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|##########| 30/30 [00:01<00:00, 21.85it/s, {'n': 1024, 'kernel': 'numba_square_sum'}] \n"
]
}
],
"source": [
"sizes = [2**i for i in range(1, 11)]\n",
"\n",
"benchmarker.run(sizes, verbosity=2)"
]
},
{
"cell_type": "markdown",
"id": "91789e72-f4e2-41fe-9b3f-76d5a022e4c2",
"metadata": {},
"source": [
"Which we can then automatically plot:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f2f93434-0e7e-4da1-ae8f-3313af9a69ae",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
""
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"(,\n",
" array([[]], dtype=object))"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"benchmarker.plot()"
]
},
{
"cell_type": "markdown",
"id": "0f1471cf-a1ab-48a4-9b31-d70c371ba04e",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"Under the hood {class}`~xyzpy.Benchmarker` collects and aggregates results\n",
"using a {class}`~xyzpy.Harvester`. This means that subsequent runs\n",
"of different sizes will be automatically merged. Additionally, if you\n",
"initialize the benchmarker with a ``dataname``, the results will be\n",
"stored in a on-disk dataset.\n",
"\n",
"\n",
"## Estimation\n",
"\n",
"### Efficiently collect running statistics\n",
"\n",
"Sometimes it is convenient to collect statistics on-the-fly, rather than storing\n",
"all the values and computing statistics afterwards. The\n",
"{class}`~xyzpy.RunningStatistics` object can be used for this purpose:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ce238f2a-5a95-4b15-a29a-9c1a980e49bb",
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"stats = xyz.RunningStatistics()\n",
"total = 0.0\n",
"\n",
"# don't know how many `x` we'll generate, and won't keep them\n",
"while total < 100:\n",
" x = random.random()\n",
" total += x\n",
"\n",
" stats.update(x)"
]
},
{
"cell_type": "markdown",
"id": "0177f739-7ec6-45dc-b2f9-72096c1053e3",
"metadata": {},
"source": [
"We can now check a variety of information about the values generated:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "ef53a8bd-0106-4830-a00d-25ee63f0ef72",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Count: 207\n",
" Mean: 0.48428927341941447\n",
" Variance: 0.07836627923673178\n",
"Standard Deviation: 0.27993977787504903\n",
" Error on the mean: 0.019457159584961168\n",
" Relative Error: 0.04017673042308015\n"
]
}
],
"source": [
"print(\" Count: {}\".format(stats.count))\n",
"print(\" Mean: {}\".format(stats.mean))\n",
"print(\" Variance: {}\".format(stats.var))\n",
"print(\"Standard Deviation: {}\".format(stats.std))\n",
"print(\" Error on the mean: {}\".format(stats.err))\n",
"print(\" Relative Error: {}\".format(stats.rel_err))"
]
},
{
"cell_type": "markdown",
"id": "545e6fa7-2499-4555-b941-1ddd66919647",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"For performance, {class}`~xyzpy.RunningStatistics` is a ``numba`` compiled class,\n",
"and can also be updated using an iterable very efficiently:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "d0109867-1c62-4e24-bcae-44f296854c3d",
"metadata": {},
"outputs": [],
"source": [
"xs = (random.random() for _ in range(10000))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "270be390-7a1c-4341-86d5-2d60177580fd",
"metadata": {},
"outputs": [],
"source": [
"stats.update_from_it(xs)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f9203949-8b8e-42ae-9481-32aa733329a0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10207"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stats.count"
]
},
{
"cell_type": "markdown",
"id": "48180d84-89a2-4333-8db0-84c037e7d399",
"metadata": {},
"source": [
"The relative error should now be much smaller:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "3e7af838-dae0-4923-95d2-fd1469886b76",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.005704083543763922"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stats.rel_err"
]
},
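{
"cell_type": "markdown",
"id": "5b0e8a3c-2d47-4f19-b6a1-7c9d3e2f4a60",
"metadata": {},
"source": [
"A single-pass accumulator of this kind can be sketched in plain Python with\n",
"Welford's algorithm. This is an illustrative assumption rather than xyzpy's\n",
"implementation: the class ``WelfordStats`` is hypothetical, its attribute names\n",
"merely mirror those shown above, and the use of the sample (``n - 1``) variance\n",
"is a choice made for this sketch.\n",
"\n",
"```python\n",
"import math\n",
"import random\n",
"import statistics\n",
"\n",
"\n",
"class WelfordStats:\n",
"    # Hypothetical single-pass accumulator (Welford's algorithm).\n",
"\n",
"    def __init__(self):\n",
"        self.count = 0\n",
"        self.mean = 0.0\n",
"        self.m2 = 0.0  # running sum of squared deviations from the mean\n",
"\n",
"    def update(self, x):\n",
"        self.count += 1\n",
"        delta = x - self.mean\n",
"        self.mean += delta / self.count\n",
"        self.m2 += delta * (x - self.mean)\n",
"\n",
"    @property\n",
"    def var(self):\n",
"        # Sample variance; needs at least two values.\n",
"        return self.m2 / (self.count - 1) if self.count > 1 else math.inf\n",
"\n",
"    @property\n",
"    def err(self):\n",
"        # Standard error on the mean.\n",
"        return math.sqrt(self.var / self.count) if self.count > 1 else math.inf\n",
"\n",
"    @property\n",
"    def rel_err(self):\n",
"        return self.err / abs(self.mean) if self.mean != 0 else math.inf\n",
"\n",
"\n",
"values = [random.random() for _ in range(1000)]\n",
"acc = WelfordStats()\n",
"for v in values:\n",
"    acc.update(v)\n",
"```\n",
"\n",
"The accumulator never stores the values. The list exists here only so the result\n",
"can be compared against ``statistics.fmean`` and ``statistics.variance``."
]
},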
{
"cell_type": "markdown",
"id": "33532a38-f2c7-4e32-ba52-dfde994e1dd3",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"### Estimating Repeat Quantities\n",
"\n",
"Another common scenario is when you have a function that returns\n",
"a noisy estimate, which you would like to estimate to some\n",
"relative error. The function {func}`~xyzpy.estimate_from_repeats`\n",
"provides this functionality, building on {class}`~xyzpy.RunningStatistics`.\n",
"\n",
"As an example, imagine we want to estimate the sum of ``n`` uniformly distributed\n",
"numbers to a relative error of 0.1%:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "4d6f5da0-9f91-4e79-8074-3bf4c7368fcc",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"32432it [00:00, 285928.59it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"RunningStatistics(mean=5.00007(50)e+02, count=32433)\n"
]
}
],
"source": [
"def rand_n_sum(n):\n",
" return np.random.rand(n).sum()\n",
"\n",
"\n",
"stats = xyz.estimate_from_repeats(rand_n_sum, n=1000, rtol=0.0001, verbosity=1)"
]
},
{
"cell_type": "markdown",
"id": "6aaa8021-01cc-4547-a552-630e8190c5de",
"metadata": {},
"source": [
"We can then query the returned ``RunningStatistics`` object:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "d9184618-1191-425a-a219-f8893653971c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(500.00662606906917)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stats.mean"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "e1d11c41-740f-4f94-9d7b-bdc3010d3725",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"np.float64(0.00010019919813211347)"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stats.rel_err"
]
},
{
"cell_type": "markdown",
"id": "2a834feb-f48a-4cad-81db-3bbf73da0669",
"metadata": {},
"source": [
"Which looks as expected."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}