{ "cells": [ { "cell_type": "markdown", "id": "dca085bb-639d-4507-96d2-d117012b1654", "metadata": { "raw_mimetype": "text/restructuredtext", "tags": [] }, "source": [ "# Computing Results\n", "\n", "## Parallel Generation - ``num_workers`` and ``pool``\n", "\n", "Running a function over many different parameters is, in principle, perfectly parallelizable, since each run is independent. ``xyzpy`` can automatically handle this in a number of different ways:\n", "\n", "1. Supply ``parallel=True`` when calling ``Runner.run_combos(...)`` or ``Harvester.harvest_combos(...)`` etc. This spawns a ``ProcessPoolExecutor`` with the same number of workers as logical cores.\n", "\n", "2. Supply ``num_workers=...`` instead to explicitly control how many workers are used. Since threading in many numeric codes is controlled by the environment variable ``$OMP_NUM_THREADS``, you generally want the product of this and ``num_workers`` to equal the number of cores.\n", "\n", "3. Supply ``executor=...`` to use any custom parallel pool-executor-like object (e.g. a ``dask.distributed`` client or ``mpi4py`` pool) which has a ``submit``/``apply_async`` method, and yields futures with a ``result``/``get`` method. More specifically, this covers pools with an API matching either ``concurrent.futures`` or an ``ipyparallel`` view. Pools from ``multiprocessing.pool`` are also explicitly handled.\n", "\n", "4. Use a {class}`~xyzpy.Crop` to write combos to disk, which can then be 'grown' persistently by any computer with access to the filesystem, such as a distributed cluster - see below.\n", "\n", "\n", "The first three options can be used on any of the various functions derived from {func}`~xyzpy.combo_runner`:" ] }, { "cell_type": "code", "execution_count": 1, "id": "fe942df3-732e-40b6-92b5-09963c1ea84d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|##########| 8/8 [00:01<00:00, 6.53it/s]\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>a</th>\n", "      <th>b</th>\n", "      <th>sum</th>\n", "      <th>diff</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>0</td>\n", "      <td>1</td>\n", "      <td>1</td>\n", "      <td>-1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>1</td>\n", "      <td>2</td>\n", "      <td>3</td>\n", "      <td>-1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>2</td>\n", "      <td>3</td>\n", "      <td>5</td>\n", "      <td>-1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>3</td>\n", "      <td>4</td>\n", "      <td>7</td>\n", "      <td>-1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>4</td>\n", "      <td>5</td>\n", "      <td>9</td>\n", "      <td>-1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>5</td>\n", "      <td>6</td>\n", "      <td>11</td>\n", "      <td>-1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>6</th>\n", "      <td>6</td>\n", "      <td>7</td>\n", "      <td>13</td>\n", "      <td>-1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>7</th>\n", "      <td>7</td>\n", "      <td>8</td>\n", "      <td>15</td>\n", "      <td>-1</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "   a  b  sum  diff\n", "0  0  1    1    -1\n", "1  1  2    3    -1\n", "2  2  3    5    -1\n", "3  3  4    7    -1\n", "4  4  5    9    -1\n", "5  5  6   11    -1\n", "6  6  7   13    -1\n", "7  7  8   15    -1" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xyzpy as xyz\n", "\n", "\n", "def slow(a, b):\n", "    import time\n", "\n", "    # simulate an expensive computation\n", "    time.sleep(1)\n", "    return a + b, a - b\n", "\n", "\n", "xyz.case_runner_to_df(\n", "    slow,\n", "    fn_args=[\"a\", \"b\"],\n", "    cases=[(i, i + 1) for i in range(8)],\n", "    var_names=[\"sum\", \"diff\"],\n", "    parallel=True,\n", ")" ] }, { "cell_type": "markdown", "id": "4551ccca-8563-4a8c-9903-eea510301bf9", "metadata": { "raw_mimetype": "text/restructuredtext", "tags": [] }, "source": [ "## Batched / Distributed generation - ``Crop``\n", "\n", "Running combos using the disk as a persistence mechanism requires one more object - the {class}`~xyzpy.Crop`. These can be instantiated directly or, generally more conveniently, from a parent ``Runner`` or ``Harvester``:\n", "{meth}`xyzpy.Runner.Crop` and {meth}`xyzpy.Harvester.Crop`. Using a ``Crop`` requires a number of steps:\n", "\n", "1. Creation with:\n", "\n", "    * a unique ``name`` to call this set of runs, defaulting to the function name\n", "    * a ``fn``, if not creating from a ``Harvester`` or ``Runner``\n", "    * other optional settings such as ``batchsize``, controlling how many runs to group into one.\n", "\n", ":::{hint}\n", "You can automatically load all crops in the current directory (or a specific one) into a dictionary by calling the function {func}`xyzpy.load_crops`.\n", ":::\n", "\n", "2. **'Sow'**. Use {meth}`xyzpy.Crop.sow_combos` to write ``combos`` into batches on disk.\n", "\n", "3. **'Grow'**. Grow each batch. This can be done in a number of ways:\n", "\n", "* The script `xyzpy-grow` can be called from the command line to process all batches. 
It has signature:" ] }, { "cell_type": "code", "execution_count": 2, "id": "488e6578", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;34musage: \u001b[0m\u001b[1;35mxyzpy-grow\u001b[0m [\u001b[32m-h\u001b[0m] [\u001b[36m--parent-dir \u001b[33mPARENT_DIR\u001b[0m] [\u001b[36m--batch-ids \u001b[33mBATCH_IDS\u001b[0m]\n", " [\u001b[36m--raise-errors \u001b[33m[RAISE_ERRORS]\u001b[0m] [\u001b[36m--num-threads \u001b[33mNUM_THREADS\u001b[0m]\n", " [\u001b[36m--num-workers \u001b[33mNUM_WORKERS\u001b[0m] [\u001b[36m--subprocess \u001b[33m[SUBPROCESS]\u001b[0m]\n", " [\u001b[36m--log \u001b[33m[LOG]\u001b[0m] [\u001b[36m--gpus \u001b[33mGPUS\u001b[0m] [\u001b[36m--affinities \u001b[33mAFFINITIES\u001b[0m]\n", " [\u001b[36m--ray\u001b[0m] [\u001b[36m--gpus-per-task \u001b[33mGPUS_PER_TASK\u001b[0m]\n", " [\u001b[36m--verbosity \u001b[33mVERBOSITY\u001b[0m] [\u001b[36m--verbosity-grow \u001b[33mVERBOSITY_GROW\u001b[0m]\n", " \u001b[32mcrop_name\u001b[0m\n", "\n", "Grow crops using xyzpy-gen-cropping.\n", "\n", "\u001b[1;34mpositional arguments:\u001b[0m\n", " \u001b[1;32mcrop_name\u001b[0m The name of the crop to grow.\n", "\n", "\u001b[1;34moptions:\u001b[0m\n", " \u001b[1;32m-h\u001b[0m, \u001b[1;36m--help\u001b[0m show this help message and exit\n", " \u001b[1;36m--parent-dir\u001b[0m \u001b[1;33mPARENT_DIR\u001b[0m\n", " The parent directory of the crop.\n", " \u001b[1;36m--batch-ids\u001b[0m \u001b[1;33mBATCH_IDS\u001b[0m\n", " Comma separated list of which batches to grow, by\n", " default all missing results.\n", " \u001b[1;36m--raise-errors\u001b[0m \u001b[1;33m[RAISE_ERRORS]\u001b[0m\n", " Raise batch errors.\n", " \u001b[1;36m--num-threads\u001b[0m \u001b[1;33mNUM_THREADS\u001b[0m\n", " The number of threads per worker (OMP_NUM_THREADS\n", " etc.)\n", " \u001b[1;36m--num-workers\u001b[0m \u001b[1;33mNUM_WORKERS\u001b[0m\n", " The number of worker processes to use.\n", " \u001b[1;36m--subprocess\u001b[0m 
\u001b[1;33m[SUBPROCESS]\u001b[0m\n", " Run each batch in its own fresh subprocess. This is\n", " most robust in terms of memory, at the cost of the\n", " process startup overhead. Optional value: true/false.\n", " \u001b[1;36m--log\u001b[0m \u001b[1;33m[LOG]\u001b[0m Save subprocess stdout and stderr to log files in the\n", " crop directory under logs/batch-{batch_id}.log. Only\n", " used when --subprocess is enabled.\n", " \u001b[1;36m--gpus\u001b[0m \u001b[1;33mGPUS\u001b[0m If subprocess is enabled, this is an optional comma\n", " separated list of GPU device IDs to assign to\n", " subprocesses via CUDA_VISIBLE_DEVICES. Each subprocess\n", " gets a single GPU from this pool; the pool also limits\n", " concurrency. You can oversubscribe GPUs by repeating\n", " device IDs, e.g. `0,0,1,1` to allow 2 subprocesses to\n", " share each GPU.\n", " \u001b[1;36m--affinities\u001b[0m \u001b[1;33mAFFINITIES\u001b[0m\n", " If subprocess is enabled, this is an optional comma\n", " separated list of affinities to use, one for each\n", " process. This ensures a single cpu core is used for\n", " each batch, regardless of other environment variables.\n", " \u001b[1;36m--ray\u001b[0m Use a ray executor, either connecting to an existing\n", " cluster, or starting a new one with num_workers\n", " \u001b[1;36m--gpus-per-task\u001b[0m \u001b[1;33mGPUS_PER_TASK\u001b[0m\n", " The number of gpus to request per task, if using a ray\n", " executor. 
The overall GPUs available is set by\n", " CUDA_VISIBLE_DEVICES, which ray follows.\n", " \u001b[1;36m--verbosity\u001b[0m \u001b[1;33mVERBOSITY\u001b[0m\n", " The verbosity level.\n", " \u001b[1;36m--verbosity-grow\u001b[0m \u001b[1;33mVERBOSITY_GROW\u001b[0m\n", " The verbosity level.\n" ] } ], "source": [ "!xyzpy-grow --help" ] }, { "cell_type": "markdown", "id": "27714fd0", "metadata": {}, "source": [ "* In another Python process, navigate to the same directory and run, for example, ``python -c \"import xyzpy; c = xyzpy.Crop(name=...); xyzpy.grow(i, crop=c)\"`` to grow the ``i``-th batch of the crop with the specified name. See {func}`~xyzpy.grow` for other options. This could manually be put in a script to run on a batch system.\n", "\n", "* Use [`Crop.grow_cluster`](xyzpy.gen.cropping.grow_cluster) - experimental! This automatically generates and submits a script using SGE, PBS or SLURM. See its options and [`Crop.gen_cluster_script`](xyzpy.gen.cropping.gen_cluster_script) for the template scripts.\n", "\n", "* Use {meth}`xyzpy.Crop.grow` or {meth}`xyzpy.Crop.grow_missing` to complete some or all of the batches in the local process. This can be useful to (a) finish up a few missing or errored runs, or (b) run all the combos with persistent progress, so that one can restart the runs at a completely different time, with updated functions, etc.\n", "\n", "4. Watch the progress. ``Crop.__str__`` shows how many of the sown batches have been completed.\n", "\n", "5. **'Reap'**. Once all the batches have completed, run {meth}`xyzpy.Crop.reap` to collect the results and remove the batches' temporary directory. If the crop originated from a ``Runner`` or ``Harvester``, the data will be labelled, merged and saved accordingly.\n", "\n", ":::{note}\n", "You can reap an unfinished ``Crop``, as long as there is at least one result, by passing the ``allow_incomplete=True`` option to {meth}`~xyzpy.Crop.reap`. 
Note that missing results will be represented by ``numpy.nan``, which might affect the eventual ``dtype`` of harvested results. To avoid this, consider also setting ``sync=False`` so that nothing is written to disk until the full ``Crop`` is finished.\n", ":::\n", "\n", "See the full demonstrations in {ref}`Examples`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3" } }, "nbformat": 4, "nbformat_minor": 5 }