{ "cells": [ { "cell_type": "markdown", "id": "fbe37a9b-976e-4fb9-ab7a-62d1536bfed0", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "# Specifying Inputs & Outputs\n", "\n", "The idea of [`xyzpy`](xyzpy) is to ease some of the pain of generating data over a large parameter space.\n", "The central aim is that, once you know what a single run of a function looks like, it should be as easy as saying, \"run these combinations of parameters, now run these particular cases\" with everything automatically aggregated into a fully self-described dataset." ] }, { "cell_type": "code", "execution_count": 1, "id": "23b27d56-726a-4390-852e-63d10a480c00", "metadata": {}, "outputs": [], "source": [ "%config InlineBackend.figure_formats = ['svg']\n", "\n", "import numpy as np\n", "\n", "import xyzpy as xyz" ] }, { "cell_type": "markdown", "id": "38a8074e-cd2f-4736-8f15-d37f05f62603", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "## Combos & Cases\n", "\n", "The main backend function is [`xyz.combo_runner`](xyzpy.combo_runner), which in its simplest form takes a function, say:" ] }, { "cell_type": "code", "execution_count": 2, "id": "d60d8306-c15c-4189-a888-d4b976a6ba0e", "metadata": {}, "outputs": [], "source": [ "def foo(a, b, c):\n", "    return f\"{a}-{b}-{c}\", np.sin(a)" ] }, { "cell_type": "markdown", "id": "58ca1ee8-2bc5-447e-890d-932170911e9d", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "and ``combos`` of the form:" ] }, { "cell_type": "code", "execution_count": 3, "id": "943d0f3c-06ab-4a4f-9871-63470b65db96", "metadata": {}, "outputs": [], "source": [ "combos = [\n", "    (\"a\", [1, 2, 3]),\n", "    (\"b\", [\"x\", \"y\", \"z\"]),\n", "    (\"c\", [True, False]),\n", "]" ] }, { "cell_type": "markdown", "id": "df476929-7ca5-45c3-8242-c3c45510476a", "metadata": {}, "source": [ "and generates a nested (here 3-dimensional) array of all the outputs of ``foo`` for the ``3 * 3 * 2 = 18`` combinations of input 
arguments:" ] }, { "cell_type": "code", "execution_count": 4, "id": "0835dcf1-d969-4d58-9acd-01cfa28e58d5", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 18/18 [00:00<00:00, 56090.25it/s]\n" ] }, { "data": { "text/plain": [ "(((('1-x-True', np.float64(0.8414709848078965)),\n", " ('1-x-False', np.float64(0.8414709848078965))),\n", " (('1-y-True', np.float64(0.8414709848078965)),\n", " ('1-y-False', np.float64(0.8414709848078965))),\n", " (('1-z-True', np.float64(0.8414709848078965)),\n", " ('1-z-False', np.float64(0.8414709848078965)))),\n", " ((('2-x-True', np.float64(0.9092974268256817)),\n", " ('2-x-False', np.float64(0.9092974268256817))),\n", " (('2-y-True', np.float64(0.9092974268256817)),\n", " ('2-y-False', np.float64(0.9092974268256817))),\n", " (('2-z-True', np.float64(0.9092974268256817)),\n", " ('2-z-False', np.float64(0.9092974268256817)))),\n", " ((('3-x-True', np.float64(0.1411200080598672)),\n", " ('3-x-False', np.float64(0.1411200080598672))),\n", " (('3-y-True', np.float64(0.1411200080598672)),\n", " ('3-y-False', np.float64(0.1411200080598672))),\n", " (('3-z-True', np.float64(0.1411200080598672)),\n", " ('3-z-False', np.float64(0.1411200080598672)))))" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xyz.combo_runner(foo, combos)" ] }, { "cell_type": "markdown", "id": "91fc38e2-5015-47cc-abf6-049e2b633c5a", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Note the progress bar shown. 
If the function was slower (generally the target case for ``xyzpy``), this would show the remaining time before completion.\n", "\n", "There is also [`xyz.case_runner`](xyzpy.case_runner) for running isolated cases:" ] }, { "cell_type": "code", "execution_count": 5, "id": "ffa19e4b-f228-41e4-83c8-02dd5b54e596", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 2/2 [00:00<00:00, 22982.49it/s]\n" ] }, { "data": { "text/plain": [ "(('4-z-False', np.float64(-0.7568024953079283)),\n", " ('5-y-True', np.float64(-0.9589242746631385)))" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cases = [(4, \"z\", False), (5, \"y\", True)]\n", "xyz.case_runner(foo, fn_args=(\"a\", \"b\", \"c\"), cases=cases)" ] }, { "cell_type": "markdown", "id": "de9bc9d2-1d51-47a5-b581-135006342e02", "metadata": {}, "source": [ "You can also mix the two, supplying some function arguments as ``cases`` and some as ``combos``.\n", "In this situation, **for each case, all sub combinations are run**:" ] }, { "cell_type": "code", "execution_count": 6, "id": "0f6a44c3-2aa8-45b3-8deb-87427ceab4b8", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 9/9 [00:00<00:00, 97541.95it/s]\n" ] }, { "data": { "text/plain": [ "((((array(nan), array(nan)),\n", " (array(nan), array(nan)),\n", " (array(nan), array(nan))),\n", " (('1-x-True', np.float64(0.8414709848078965)),\n", " ('1-y-True', np.float64(0.8414709848078965)),\n", " ('1-z-True', np.float64(0.8414709848078965)))),\n", " ((('2-x-False', np.float64(0.9092974268256817)),\n", " ('2-y-False', np.float64(0.9092974268256817)),\n", " ('2-z-False', np.float64(0.9092974268256817))),\n", " ((array(nan), array(nan)),\n", " (array(nan), array(nan)),\n", " (array(nan), array(nan)))),\n", " (((array(nan), array(nan)),\n", " (array(nan), array(nan)),\n", " (array(nan), array(nan))),\n", " (('3-x-True', 
np.float64(0.1411200080598672)),\n", " ('3-y-True', np.float64(0.1411200080598672)),\n", " ('3-z-True', np.float64(0.1411200080598672)))))" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xyz.combo_runner(\n", "    foo,\n", "    cases=[\n", "        {\"a\": 1, \"c\": True},\n", "        {\"a\": 2, \"c\": False},\n", "        {\"a\": 3, \"c\": True},\n", "    ],\n", "    combos={\n", "        \"b\": [\"x\", \"y\", \"z\"],\n", "    },\n", ")" ] }, { "cell_type": "markdown", "id": "5ed9c89a-7a5f-4138-8a4f-916c9e31be86", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Note that ``combo_runner`` automatically fills missing results with ``nan`` (or possibly ``None``, depending on shape and dtype).\n", "Note also that supplying the cases as a list of dicts avoids having to specify the function argument order.\n", "You can supply both ``combos`` and ``cases`` to either [`combo_runner`](xyzpy.combo_runner) or [`case_runner`](xyzpy.case_runner); the main difference is that:\n", "\n", "1. [`combo_runner`](xyzpy.combo_runner) outputs a nested tuple suitable to be turned into an array\n", "2. [`case_runner`](xyzpy.case_runner) outputs a flat tuple of results suitable to be put into a table\n", "\n", "You will likely not use these functions in their raw form, but they illustrate the concept of ``combos`` and ``cases`` and underlie most other functionality.\n", "\n", "\n", "## Describing the function - ``Runner``\n", "\n", "To automatically put the generated data into a labelled {class}`xarray.Dataset` you need to describe your function using the [`xyz.Runner`](xyzpy.Runner) class. In the simplest case this is just a matter of naming the outputs:" ] }, { "cell_type": "code", "execution_count": 7, "id": "870cb4d2-1b27-4a56-8879-4d7819e72924", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 18/18 [00:00<00:00, 67650.06it/s]\n" ] }, { "data": { "text/html": [ "
<xarray.Dataset> Size: 830B\n",
       "Dimensions:  (a: 3, b: 3, c: 2)\n",
       "Coordinates:\n",
       "  * a        (a) int64 24B 1 2 3\n",
       "  * b        (b) <U1 12B 'x' 'y' 'z'\n",
       "  * c        (c) bool 2B True False\n",
       "Data variables:\n",
       "    a_out    (a, b, c) <U9 648B '1-x-True' '1-x-False' ... '3-z-False'\n",
       "    b_out    (a, b, c) float64 144B 0.8415 0.8415 0.8415 ... 0.1411 0.1411
" ], "text/plain": [ " Size: 830B\n", "Dimensions: (a: 3, b: 3, c: 2)\n", "Coordinates:\n", " * a (a) int64 24B 1 2 3\n", " * b (b) (A, B[x], C[x, t])\"\n", "```\n", "\n", "Maybe ``i, j, k`` index a location and ``t`` is a (constant) series of times to compute. There are 3 outputs: (i) the scalar ``A``, (ii) the vector ``B`` which has a dimension ``x`` with known coordinates, say ``[10, 20, 30]``, and (iii) the 2D-array ``C``, which shares the ``x`` dimension but also depends on ``t``. The arguments to [`Runner`](xyzpy.Runner) to describe this situation would be:" ] }, { "cell_type": "code", "execution_count": 8, "id": "41295112-31bb-40c6-b755-bc1251842711", "metadata": {}, "outputs": [], "source": [ "var_names = [\"A\", \"B\", \"C\"]\n", "var_dims = {\"B\": [\"x\"], \"C\": [\"x\", \"t\"]}\n", "var_coords = {\"x\": [10, 20, 30]}\n", "constants = {\"t\": np.linspace(0, 1, 101)}" ] }, { "cell_type": "markdown", "id": "7eb73429-6d86-47d9-97ec-8198a5ebc06d", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Note that ``'t'`` doesn't need to be specified in ``var_coords`` as it can be found in ``constants``. 
Let's explicitly mock a function with this signature and some combos to run:" ] }, { "cell_type": "code", "execution_count": 9, "id": "ca80e809-7133-4f0e-8431-63b747a93478", "metadata": {}, "outputs": [], "source": [ "def bar(i, j, k, t):\n", " A = np.random.rand()\n", " B = np.random.rand(3) # 'B[x]'\n", " C = np.random.rand(3, len(t)) # 'C[x, t]'\n", " return A, B, C\n", "\n", "\n", "# if we are using a runner, combos can be supplied as a dict\n", "combos = {\n", " \"i\": [5, 6, 7],\n", " \"j\": [0.5, 0.6, 0.7],\n", " \"k\": [0.05, 0.06, 0.07],\n", "}" ] }, { "cell_type": "markdown", "id": "ed975119-a06b-4639-b503-b92a6681d4ad", "metadata": {}, "source": [ "We can then run the combos:" ] }, { "cell_type": "code", "execution_count": 10, "id": "cdf635c3-13af-4c17-a3ab-9d10b9b5ec98", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 27/27 [00:00<00:00, 6761.78it/s]\n" ] }, { "data": { "text/html": [ "
<xarray.Dataset> Size: 67kB\n",
       "Dimensions:  (i: 3, j: 3, k: 3, x: 3, t: 101)\n",
       "Coordinates:\n",
       "  * i        (i) int64 24B 5 6 7\n",
       "  * j        (j) float64 24B 0.5 0.6 0.7\n",
       "  * k        (k) float64 24B 0.05 0.06 0.07\n",
       "  * x        (x) int64 24B 10 20 30\n",
       "  * t        (t) float64 808B 0.0 0.01 0.02 0.03 0.04 ... 0.97 0.98 0.99 1.0\n",
       "Data variables:\n",
       "    A        (i, j, k) float64 216B 0.3676 0.9331 0.4158 ... 0.3203 0.2549\n",
       "    B        (i, j, k, x) float64 648B 0.6944 0.08939 0.137 ... 0.9456 0.3549\n",
       "    C        (i, j, k, x, t) float64 65kB 0.8575 0.05207 ... 0.8448 0.08909
" ], "text/plain": [ " Size: 67kB\n", "Dimensions: (i: 3, j: 3, k: 3, x: 3, t: 101)\n", "Coordinates:\n", " * i (i) int64 24B 5 6 7\n", " * j (j) float64 24B 0.5 0.6 0.7\n", " * k (k) float64 24B 0.05 0.06 0.07\n", " * x (x) int64 24B 10 20 30\n", " * t (t) float64 808B 0.0 0.01 0.02 0.03 0.04 ... 0.97 0.98 0.99 1.0\n", "Data variables:\n", " A (i, j, k) float64 216B 0.3676 0.9331 0.4158 ... 0.3203 0.2549\n", " B (i, j, k, x) float64 648B 0.6944 0.08939 0.137 ... 0.9456 0.3549\n", " C (i, j, k, x, t) float64 65kB 0.8575 0.05207 ... 0.8448 0.08909" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r = xyz.Runner(\n", " bar,\n", " constants=constants,\n", " var_names=var_names,\n", " var_coords=var_coords,\n", " var_dims=var_dims,\n", ")\n", "r.run_combos(combos)" ] }, { "cell_type": "markdown", "id": "480722a0-e3ed-4701-bfe1-b732bc62846f", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We can see the dimensions ``'i'``, ``'j'`` and ``'k'`` have been generated by the combos for all variables, as well as the 'internal' dimensions ``'x'`` and ``'t'`` only for ``'B'`` and ``'C'``.\n", "See also the :ref:`Structured Output with Julia Set Example` for a fuller demonstration.\n", "\n", "Finally, if the function itself returns a `dict` or {class}`xarray.Dataset`, then just use ``var_names=None`` and all the outputs will be concatenated together automatically. The overhead this incurs is often negligible for anything but very fast functions.\n", "\n", "\n", "Aggregating data - ``Harvester``\n", "--------------------------------\n", "\n", "A common scenario when running simulations is the following:\n", "\n", "1. Generate some data\n", "2. Save it to disk\n", "3. Generate a different set of data (maybe after analysis of the first set)\n", "4. Load the old data\n", "5. Merge the new data with the old data\n", "6. Save the new combined data\n", "7. 
Repeat\n", "\n", "The aim of the {class}`~xyzpy.Harvester` is to automate that process. A {class}`~xyzpy.Harvester` is instantiated with a {class}`~xyzpy.Runner` instance and, optionally, a ``data_name``. If a ``data_name`` is given, then every time a round of combos/cases is generated, it will be automatically synced with an on-disk dataset of that name. Either way, the harvester will aggregate all runs into the ``full_ds`` attribute." ] }, { "cell_type": "code", "execution_count": 11, "id": "b4ab86fe-1acf-4b7f-942a-fc2673168c23", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 18/18 [00:00<00:00, 141646.29it/s]\n" ] } ], "source": [ "combos = [\n", "    (\"a\", [1, 2, 3]),\n", "    (\"b\", [\"x\", \"y\", \"z\"]),\n", "    (\"c\", [True, False]),\n", "]\n", "\n", "harvester = xyz.Harvester(runner, data_name=\"foo.h5\")\n", "harvester.harvest_combos(combos)" ] }, { "cell_type": "markdown", "id": "72a01a59-58cf-4a9d-91b3-b02da4649058", "metadata": {}, "source": [ "Since the file named by ``data_name`` didn't exist yet, this created it:" ] }, { "cell_type": "code", "execution_count": 12, "id": "214c0b5f-6d3b-4e97-86c1-7211543ae570", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "foo.h5\n" ] } ], "source": [ "!ls *.h5" ] }, { "cell_type": "markdown", "id": "8119b1c7-c076-4884-936b-85d722909c89", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "{meth}`xyzpy.Harvester.harvest_combos` calls {meth}`xyzpy.Runner.run_combos` itself, so this doesn't need to be done separately.\n", "\n", "Now we can run a second set of different combos:" ] }, { "cell_type": "code", "execution_count": 13, "id": "05c6ca30-1dc2-4ef0-847f-f48a51f2bd69", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 12/12 [00:00<00:00, 33599.23it/s]\n" ] } ], "source": [ "combos2 = {\n", "    \"a\": [4, 5, 6],\n", "    \"b\": [\"w\", \"v\"],\n", "    \"c\": [True, 
False],\n", "}\n", "harvester.harvest_combos(combos2)" ] }, { "cell_type": "markdown", "id": "56e95df7-3b54-4df7-8428-ac796515ea97", "metadata": {}, "source": [ "Now we can check the total dataset containing all combos and cases run so far:" ] }, { "cell_type": "code", "execution_count": 14, "id": "ff657955-4fe6-43cb-81bf-c770f5f396e5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 1kB\n",
       "Dimensions:  (a: 6, b: 5, c: 2)\n",
       "Coordinates:\n",
       "  * a        (a) int64 48B 1 2 3 4 5 6\n",
       "  * b        (b) <U1 20B 'v' 'w' 'x' 'y' 'z'\n",
       "  * c        (c) bool 2B True False\n",
       "Data variables:\n",
       "    a_out    (a, b, c) object 480B nan nan nan nan ... nan nan nan nan\n",
       "    b_out    (a, b, c) float64 480B nan nan nan nan 0.8415 ... nan nan nan nan
" ], "text/plain": [ " Size: 1kB\n", "Dimensions: (a: 6, b: 5, c: 2)\n", "Coordinates:\n", " * a (a) int64 48B 1 2 3 4 5 6\n", " * b (b) \n", "Runner: \n", " fn: \n", " fn_args: ('x', 'y')\n", " var_names: ('sum', 'diff')\n", " var_dims: {'sum': (), 'diff': ()}\n", "Sync file -->\n", " foo.h5 [h5netcdf]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@xyz.label(var_names=[\"sum\", \"diff\"], harvester=\"foo.h5\")\n", "def foo(x, y):\n", " return x + y, x - y\n", "\n", "\n", "foo" ] }, { "cell_type": "markdown", "id": "868f1839-3e24-47c8-b0fd-f9c6e02ff3a0", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Note that, since the different runs were disjoint, missing values have automatically been filled in with ``nan`` values - see {func}`xarray.merge`. The on-disk dataset now contains both runs.\n", "\n", ":::{hint}\n", "\n", "As a convenience, {func}`~xyzpy.label` can also be used to decorate a function as a {class}`xyzpy.Harvester`\n", "by supplying the ``harvester`` kwarg. If ``True`` a harvester will be instantiated with ``data_name=None``.\n", "If a string, it is used as the ``data_name``.\n", "\n", "```python\n", ">>> @label(var_names=['sum', 'diff'], harvester='foo.h5')\n", "... def foo(x, y):\n", "... return x + y, x - y\n", "...\n", ">>> foo\n", "\n", "Runner: \n", " fn: \n", " fn_args: ('x', 'y')\n", " var_names: ('sum', 'diff')\n", " var_dims: {'sum': (), 'diff': ()}\n", "Sync file -->\n", " foo.h5 [h5netcdf]\n", "```\n", ":::\n", "\n", "## Aggregating Random samples of data - ``Sampler``\n", "\n", "Occasionally, exhaustively iterating through all combinations of arguments is unneccesary. If instead you just want to sample the parameter space sparsely then the {class}`~xyzpy.Sampler` object allows this with much the same interface as a {class}`~xyzpy.Harvester`. 
The main difference is that, since the parameters are no longer gridded, the data is stored as a table in a\n", "{class}`pandas.DataFrame`." ] }, { "cell_type": "code", "execution_count": 16, "id": "5571c04d-4214-4a54-87cb-4d0b2f6e8314", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "Runner: \n", " fn: \n", " fn_args: ('amp', 'fn', 'x', 'phase')\n", " var_names: ('out',)\n", " var_dims: {'out': ()}\n", "Sync file -->\n", " trig.pkl [pickle]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import math\n", "import random\n", "\n", "\n", "@xyz.label(var_names=[\"out\"])\n", "def trig(amp, fn, x, phase):\n", " return amp * getattr(math, fn)(x - phase)\n", "\n", "\n", "# these are the default combos/distributions to sample from\n", "default_combos = {\n", " \"amp\": [1, 2, 3],\n", " \"fn\": [\"cos\", \"sin\"],\n", " # for distributions we can supply callables\n", " \"x\": lambda: 2 * math.pi * random.random(),\n", " \"phase\": lambda: random.gauss(0.0, 0.1),\n", "}\n", "\n", "sampler = xyz.Sampler(trig, \"trig.pkl\", default_combos)\n", "sampler" ] }, { "cell_type": "markdown", "id": "fdec0272-2ec4-40da-a872-d54012f9c469", "metadata": {}, "source": [ "Now we can run the sampler many times (and supply any of the usual arguments such as ``parallel=True`` etc). 
This generates a {class}`pandas.DataFrame`:" ] }, { "cell_type": "code", "execution_count": 17, "id": "ff8f9f2d-8c2f-45ea-bc88-041032229e18", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 10000/10000 [00:00<00:00, 1517695.76it/s]\n" ] } ], "source": [ "sampler.sample_combos(10000);" ] }, { "cell_type": "markdown", "id": "476bc80f-6462-4197-8e6d-fd83d77b2dd1", "metadata": {}, "source": [ "This has also synced the data with the on-disk file:" ] }, { "cell_type": "code", "execution_count": 18, "id": "c2d677d9-cc7d-4301-9651-6ae9490eab41", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "trig.pkl\n" ] } ], "source": [ "!ls *.pkl" ] }, { "cell_type": "markdown", "id": "d531b48c-172e-4eb5-abca-4af2922f85f2", "metadata": {}, "source": [ "You can specify ``Sampler(..., engine='csv')`` etc to use formats other than ``pickle``." ] }, { "cell_type": "markdown", "id": "73cc5b27-6c39-4090-a4d4-cb4b59f9e2df", "metadata": {}, "source": [ "As with the ``Harvester``, next time we run combinations, the data is automatically\n", "aggregated into the full set:" ] }, { "cell_type": "code", "execution_count": 19, "id": "d8acc735-9d57-42b6-82ef-32a9470028fd", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 5000/5000 [00:00<00:00, 1363912.59it/s]\n" ] } ], "source": [ "# here we will override some of the default sampling choices\n", "combos = {\"fn\": [\"tan\"], \"x\": lambda: random.random() * math.pi / 4}\n", "\n", "sampler.sample_combos(5000, combos);" ] }, { "cell_type": "markdown", "id": "d16d7f14-880e-4eae-b1b0-6b41586230a1", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "We can then use tools such as [`seaborn`](https://seaborn.pydata.org) to visualize the full data:" ] }, { "cell_type": "code", "execution_count": 20, "id": "a54fe8c4-4499-457d-a9e5-f2b865061079", "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/svg+xml": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "\n", "sns.relplot(x=\"x\", y=\"out\", hue=\"fn\", size=\"amp\", data=sampler.full_df)" ] }, { "cell_type": "markdown", "id": "97b3706f-9af7-4910-a424-b72d6eb6eabe", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ ":::{hint}\n", "As a convenience, {func}`xyzpy.label` can also be used to decorate a function as a {class}`xyzpy.Sampler`\n", "by supplying the ``sampler`` kwarg. If ``True`` a sampler will be instantiated with ``data_name=None``.\n", "If a string, it is used as the ``data_name``.\n", ":::\n", "\n", "\n", "Summary\n", "-------\n", "\n", " 1. {func}`~xyzpy.combo_runner` is the core function which outputs a nested tuple and contains the parallelization logic and progress display etc.\n", "\n", " 2. {class}`~xyzpy.Runner` and {meth}`~xyzpy.Runner.run_combos` are used to describe the function's output and perform a single set of runs yielding a {class}`xarray.Dataset`. These internally call {func}`~xyzpy.combo_runner`.\n", "\n", " 3. {class}`~xyzpy.Harvester` and {meth}`~xyzpy.Harvester.harvest_combos` are used to perform many sets of runs, continuously merging the results into one larger {class}`xarray.Dataset` - ``Harvester.full_ds``, probably synced to disk. These internally call {meth}`~xyzpy.Runner.run_combos`.\n", "\n", " 4. {class}`~xyzpy.Sampler` and {meth}`~xyzpy.Sampler.sample_combos` are used to sparsely sample\n", " from parameter combinations. Unlike a normal ``Harvester``, the data is aggregated\n", " automatically into a ``pandas.DataFrame``.\n", "\n", "In general, you would only generate data with one of these methods at once - see the full demonstrations in [Examples](examples)." 
] }, { "cell_type": "code", "execution_count": 21, "id": "63033245-a46b-4074-8c3c-de1f501502fa", "metadata": {}, "outputs": [], "source": [ "# some cleanup\n", "harvester.delete_ds()\n", "sampler.delete_df()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3" } }, "nbformat": 4, "nbformat_minor": 4 }