Computing Results¶
Parallel Generation - num_workers and pool¶
Running a function for many different parameters theoretically allows perfect parallelization since each run is independent. xyzpy can automatically handle this in a number of different ways:
Supply
parallel=Truewhen callingRunner.run_combos(...)orHarvester.harvest_combos(...)etc. This spawns aProcessExecutorPoolwith the same number of workers as logical cores.Supply
num_workers=...instead to explicitly control how any workers are used. Since for many numeric codes threading is controlled by the environement variable$OMP_NUM_THREADSyou generally want the product of this andnum_workersto be equal to the number of cores.Supply
executor=...to use any custom parallel pool-executor like object (e.g. adask.distributedclient ormpi4pypool) which has asubmit/apply_asyncmethod, and yields futures with aresult/getmethod. More specifically, this covers pools with an API matching eitherconcurrent.futuresor anipyparallelview. Pools frommultiprocessing.poolare also explicitly handled.Use a
Cropto write combos to disk, which can then be ‘grown’ persistently by any computers with access to the filesystem, such as distributed cluster - see below.
The first three options can be used on any of the various functions derived from combo_runner():
import xyzpy as xyz
def slow(a, b):
import time
time.sleep(1)
return a + b, a - b
xyz.case_runner_to_df(
slow,
fn_args=["a", "b"],
cases=[(i, i + 1) for i in range(8)],
var_names=["sum", "diff"],
parallel=True,
)
100%|##########| 8/8 [00:01<00:00, 6.53it/s]
| a | b | sum | diff | |
|---|---|---|---|---|
| 0 | 0 | 1 | 1 | -1 |
| 1 | 1 | 2 | 3 | -1 |
| 2 | 2 | 3 | 5 | -1 |
| 3 | 3 | 4 | 7 | -1 |
| 4 | 4 | 5 | 9 | -1 |
| 5 | 5 | 6 | 11 | -1 |
| 6 | 6 | 7 | 13 | -1 |
| 7 | 7 | 8 | 15 | -1 |
Batched / Distributed generation - Crop¶
Running combos using the disk as a persistence mechanism requires one more object - the Crop. These can be instantiated directly or, generally more convenient, from a parent Runner or Harvester:
xyzpy.Runner.Crop() and xyzpy.Harvester.Crop(). Using a Crop requires a number of steps:
Creation with:
a unique
nameto call this set of runs, defaulting the function namea
fnif not creating from anHarvesterorRunnerother optional settings such as
batchsizecontrolling how many runs to group into one.
Hint
You can automatically load all crops in the current directory (or a specific one) to a dictionary by calling the function xyzpy.load_crops().
‘Sow’. Use
xyzpy.Crop.sow_combos()to writecombosinto batches on disk.‘Grow’. Grow each batch. This can be done a number of ways:
The script
xyzpy-growcan be called from the command line to process all batches. It has signature:
!xyzpy-grow --help
usage: xyzpy-grow [-h] [--parent-dir PARENT_DIR] [--batch-ids BATCH_IDS]
[--raise-errors [RAISE_ERRORS]] [--num-threads NUM_THREADS]
[--num-workers NUM_WORKERS] [--subprocess [SUBPROCESS]]
[--log [LOG]] [--gpus GPUS] [--affinities AFFINITIES]
[--ray] [--gpus-per-task GPUS_PER_TASK]
[--verbosity VERBOSITY] [--verbosity-grow VERBOSITY_GROW]
crop_name
Grow crops using xyzpy-gen-cropping.
positional arguments:
crop_name The name of the crop to grow.
options:
-h, --help show this help message and exit
--parent-dir PARENT_DIR
The parent directory of the crop.
--batch-ids BATCH_IDS
Comma separated list of which batches to grow, by
default all missing results.
--raise-errors [RAISE_ERRORS]
Raise batch errors.
--num-threads NUM_THREADS
The number of threads per worker (OMP_NUM_THREADS
etc.)
--num-workers NUM_WORKERS
The number of worker processes to use.
--subprocess [SUBPROCESS]
Run each batch in its own fresh subprocess. This is
most robust in terms of memory, at the cost of the
process startup overhead. Optional value: true/false.
--log [LOG] Save subprocess stdout and stderr to log files in the
crop directory under logs/batch-{batch_id}.log. Only
used when --subprocess is enabled.
--gpus GPUS If subprocess is enabled, this is an optional comma
separated list of GPU device IDs to assign to
subprocesses via CUDA_VISIBLE_DEVICES. Each subprocess
gets a single GPU from this pool; the pool also limits
concurrency. You can oversubscribe GPUs by repeating
device IDs, e.g. `0,0,1,1` to allow 2 subprocesses to
share each GPU.
--affinities AFFINITIES
If subprocess is enabled, this is an optional comma
separated list of affinities to use, one for each
process. This ensures a single cpu core is used for
each batch, regardless of other environment variables.
--ray Use a ray executor, either connecting to an existing
cluster, or starting a new one with num_workers
--gpus-per-task GPUS_PER_TASK
The number of gpus to request per task, if using a ray
executor. The overall GPUs available is set by
CUDA_VISIBLE_DEVICES, which ray follows.
--verbosity VERBOSITY
The verbosity level.
--verbosity-grow VERBOSITY_GROW
The verbosity level.
In another python process, navigate to the same directory and run for example
python -c "import xyzpy; c = xyzpy.Crop(name=...); xyzpy.grow(i, crop=c)"to grow the ith batch of crop with specified name. Seegrow()for other options. This could manually be put in a script to run on a batch system.Use
Crop.grow_cluster- experimental! This automatically generates and submits a script using SGE, PBS or SLURM. See its options andCrop.gen_cluster_scriptfor the template scripts.Use
xyzpy.Crop.grow()orxyzpy.Crop.grow_missing()to complete some or all of the batches in the local process. This can be useful to A. finish up a few missing/errored runs B. run all the combos with persistent progress, so that one can restart the runs at a completely different time/ with updated functions etc.
Watch the progress.
Crop.__str__will show how many batches have been completed of the total sown.‘Reap’. Once all the batches have completed, run
xyzpy.Crop.reap()to collect the results and remove the batches’ temporary directory. If the crop originated from aRunnerorHarvester, the data will be labelled, merged and saved accordingly.
Note
You can reap an unfinished Crop as long as there is at least one result by passing the allow_incomplete=True option to reap(). Note that missing results will be represented by numpy.nan which might effect the eventual dtype of harvested results. To avoid this, consider also setting sync=False to avoid writing anything to disk until the full Crop is finished.
See the full demonstrations in Examples.