xyzpy.gen.cropping¶
Attributes¶
Classes¶
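- _ResourcePool – A pool of reusable resource IDs (CPUs, GPUs, etc.) that can be acquired and released once per batch subprocess.
- Crop – Encapsulates all the details describing a single 'crop', that is, its location, name, and batch size/number.
- Sower – Class for sowing a 'crop' of batched combos to then 'grow' (on any number of workers sharing the filesystem) and then reap.
- Reaper – Class that acts as a stateful function to retrieve already sown and grown results.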
Functions¶
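- parse_crop_details – Work out how to structure the sowed data.
- calc_clean_up_default_res – Logic for choosing whether to automatically clean up a crop, and what, if any, the default all-nan result should be.
- _parse_resource_ids – Normalize an int, list, tuple, range, or comma-separated string into a list of integer resource IDs.
- _acquire_affinity – Prepend taskset -c <cpu> to pin to a CPU core.
- load_crops – Automatically load all the crops found in the current directory.
- grow – Automatically process a batch of cases into results. Should be run in an ".xyz-{fn_name}" folder, or crop should be specified.
- gen_cluster_script – Generate a cluster script to grow a Crop.
- grow_cluster – Automagically submit SGE, PBS, or slurm jobs to grow all missing results.
- gen_qsub_script – Generate a qsub script to grow a Crop. Deprecated in favor of gen_cluster_script and will be removed in the future.
- Automagically submit SGE or PBS jobs to grow all missing results.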
Module Contents¶
- xyzpy.gen.cropping.BTCH_NM = 'xyz-batch-{}.jbdmp'¶
- xyzpy.gen.cropping.RSLT_NM = 'xyz-result-{}.jbdmp'¶
- xyzpy.gen.cropping.FNCT_NM = 'xyz-function.clpkl'¶
- xyzpy.gen.cropping.INFO_NM = 'xyz-settings.jbdmp'¶
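The batch and result file-name constants above each contain a single `{}` placeholder. A minimal sketch of how such templates can be filled, assuming the placeholder takes the batch number via `str.format` (the helper name `batch_and_result_names` is hypothetical and not part of xyzpy):

```python
# Template strings copied from the module constants listed above.
BTCH_NM = "xyz-batch-{}.jbdmp"
RSLT_NM = "xyz-result-{}.jbdmp"


def batch_and_result_names(batch_number):
    """Hypothetical helper: file names for one batch and its result.

    Assumes the ``{}`` placeholder is filled with the batch number.
    """
    return BTCH_NM.format(batch_number), RSLT_NM.format(batch_number)


batch_file, result_file = batch_and_result_names(3)
print(batch_file)   # xyz-batch-3.jbdmp
print(result_file)  # xyz-result-3.jbdmp
```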
- xyzpy.gen.cropping.parse_crop_details(fn, crop_name, crop_parent)[source]¶
Work out how to structure the sowed data.
- Parameters:
- Returns:
crop_location (str) – Full path to the crop-folder.
crop_name (str) – Name of the crop.
crop_parent (str) – Parent folder of the crop.
- xyzpy.gen.cropping.calc_clean_up_default_res(crop, clean_up, allow_incomplete)[source]¶
Logic for choosing whether to automatically clean up a crop, and what, if any, the default all-nan result should be.
- xyzpy.gen.cropping._parse_resource_ids(raw)[source]¶
Normalize an int, list, tuple, range, or comma-separated string into a list of integer resource IDs.
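The normalization described here might look roughly like the following. This is an illustrative re-implementation, not the library's code; in particular, treating a bare int as a single ID (rather than a count) is an assumption.

```python
def parse_resource_ids(raw):
    """Normalize ``raw`` into a list of integer resource IDs.

    Illustrative sketch only. Assumption: a bare int means one ID.
    """
    if raw is None:
        return []
    if isinstance(raw, int):
        return [raw]
    if isinstance(raw, str):
        # comma-separated string such as "0,0,1,1"
        return [int(part) for part in raw.split(",") if part.strip()]
    # list, tuple, range or any other iterable of int-like values
    return [int(x) for x in raw]


print(parse_resource_ids("0,0,1,1"))  # [0, 0, 1, 1]
print(parse_resource_ids(range(3)))   # [0, 1, 2]
```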
- xyzpy.gen.cropping._acquire_affinity(rid, pargs, env)[source]¶
Prepend taskset -c <cpu> to pin to a CPU core.
- class xyzpy.gen.cropping._ResourcePool(ids, acquire_fn)[source]¶
A pool of reusable resource IDs (CPUs, GPUs, etc.) that can be acquired and released once per batch subprocess.
- Parameters:
- free¶
- used¶
- acquire_fn¶
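To make the acquire/release idea concrete, here is a toy pool using the attribute names listed above (free, used, acquire_fn). The method names `acquire` and `release`, the use of a key per batch, and the callback's return value are assumptions for illustration, not the documented interface.

```python
class ResourcePoolSketch:
    """Toy pool: hands out IDs from ``free`` and tracks them in ``used``."""

    def __init__(self, ids, acquire_fn):
        self.free = list(ids)
        self.used = {}  # key (e.g. a batch ID) -> resource ID
        self.acquire_fn = acquire_fn

    def acquire(self, key, pargs, env):
        rid = self.free.pop(0)
        self.used[key] = rid
        # let the callback adjust the command line and/or environment
        return self.acquire_fn(rid, pargs, env)

    def release(self, key):
        # return the resource so a later batch can reuse it
        self.free.append(self.used.pop(key))


def acquire_affinity(rid, pargs, env):
    """Prepend ``taskset -c <cpu>`` to the command; env is left untouched."""
    return ["taskset", "-c", str(rid)] + list(pargs), env


pool = ResourcePoolSketch([0, 1], acquire_affinity)
pargs, env = pool.acquire("batch-1", ["python", "-c", "pass"], {})
print(pargs)      # ['taskset', '-c', '0', 'python', '-c', 'pass']
pool.release("batch-1")
print(pool.free)  # [1, 0]
```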
- class xyzpy.gen.cropping.Crop(*, fn=None, name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None, shuffle=False, farmer=None, autoload=True)[source]¶
Bases: object
Encapsulates all the details describing a single ‘crop’, that is, its location, name, and batch size/number. Also allows tracking of the crop’s progress and, experimentally, automatic submission of workers to a grid engine to complete un-grown cases. Can also be instantiated directly from a Runner, Harvester, or Crop instance.
- Parameters:
fn (callable, optional) – Target function - Crop name will be inferred from this if not given explicitly. If given, Sower will also default to saving a version of fn to disk for cropping.grow to use.
name (str, optional) – Custom name for this set of runs - must be given if fn is not.
parent_dir (str, optional) – If given, alternative directory to put the “.xyz-{name}/” folder in with all the cases and results.
save_fn (bool, optional) – Whether to save the function to disk for cropping.grow to use. Will default to True if fn is given.
batchsize (int, optional) – How many cases to group into a single batch per worker. By default, batchsize=1. Cannot be specified if num_batches is.
num_batches (int, optional) – How many total batches to aim for, cannot be specified if batchsize is.
farmer ({xyzpy.Runner, xyzpy.Harvester, xyzpy.Sampler}, optional) – A Runner, Harvester or Sampler instance, from which the fn can be inferred and which can also allow the Crop to reap itself straight to a dataset or dataframe.
autoload (bool, optional) – If True, check for the existence of a Crop written to disk with the same location, and if found, load it.
See also
Runner.Crop, Harvester.Crop, Sampler.Crop
- name = None¶
- parent_dir = None¶
- save_fn = None¶
- batchsize = None¶
- num_batches = None¶
- shuffle = False¶
- _batch_remainder = None¶
- _all_nan_result = None¶
- _num_sown_batches = -1¶
- _num_results = -1¶
- property runner¶
- choose_batch_settings(*, combos=None, cases=None)[source]¶
Work out how to divide all cases into batches, i.e. ensure that batchsize * num_batches >= num_cases.
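The constraint batchsize * num_batches >= num_cases can be satisfied with a ceiling division. The sketch below is one plausible way to do it under the documented rules (default batchsize of 1, only one of the two options may be given); it is not the library's implementation, and the standalone function signature is hypothetical.

```python
import math


def choose_batch_settings(num_cases, batchsize=None, num_batches=None):
    """Illustrative only: pick (batchsize, num_batches) covering all cases."""
    if batchsize is not None and num_batches is not None:
        raise ValueError("Specify only one of batchsize and num_batches.")
    if num_batches is not None:
        # aim for the requested number of batches
        batchsize = math.ceil(num_cases / num_batches)
    else:
        batchsize = 1 if batchsize is None else batchsize
        num_batches = math.ceil(num_cases / batchsize)
    return batchsize, num_batches


print(choose_batch_settings(10, batchsize=3))    # (3, 4)
print(choose_batch_settings(10, num_batches=4))  # (3, 4)
```

In both calls the product is 12, which covers the 10 cases, so the last batch is simply smaller than the others.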
- load_function()[source]¶
Load the saved function from disk, and try to re-insert it back into Harvester or Runner if present.
- prepare(combos=None, cases=None, fn_args=None)[source]¶
Write information about this crop and the supplied combos to disk. Typically done at start of sow, not when Crop instantiated.
- completed_results() -> tuple[int, ...][source]¶
Return tuple of batches which have been grown already.
- missing_results() -> tuple[int, ...][source]¶
Return tuple of batches which haven’t been grown yet.
- delete_all()[source]¶
Delete the crop directory and all its contents, and reset any loaded information on this Crop object.
- handle_existing(action='ask', msg=None, e=None, overwrite=False)[source]¶
Handle an already prepared crop.
- Parameters:
action ({'ask', 'reap', 'delete', 'skip', 'raise'}) – What to do with the existing crop. If 'ask' (default), interactively prompt the user. Otherwise, execute the specified action directly.
msg (str, optional) – Message to display when prompting.
e (Exception, optional) – Exception to re-raise if action is 'raise'.
overwrite (bool, optional) – Whether to overwrite existing data when reaping.
- property all_nan_result¶
Get a stand-in result for cases which are missing still.
- sow_combos(combos, cases=None, constants=None, shuffle=False, verbosity=1, desc='Sow', batchsize=None, num_batches=None)[source]¶
Sow combos to disk to be later grown, potentially in batches. Note that if you have already sown this Crop, then as long as the number of batches hasn’t changed (e.g. you have just tweaked the function or a constant argument), you can safely resow: only the batches will be overwritten, and the existing results will remain.
- Parameters:
combos (dict_like[str, iterable]) – The combinations to sow for all or some function arguments.
cases (iterable or mappings, optional) – Optionally provide a sequence of individual cases to sow for some or all function arguments.
constants (mapping, optional) – Provide additional constant function values to use when sowing.
shuffle (bool or int, optional) – If given, sow the combos in a random order (using random.seed and random.shuffle), which can be helpful for distributing resources when not all cases are computationally equal.
verbosity (int, optional) – How much information to show when sowing. 0: no output, 1: progress bar, 2: progress bar with each setting being sown.
desc (str, optional) – Description to show in the progress bar when sowing.
batchsize (int, optional) – If specified, set a new batchsize for the crop.
num_batches (int, optional) – If specified, set a new num_batches for the crop.
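As a rough mental model of sowing, the combos can be read as a Cartesian product of argument values that is then cut into batches. The sketch below illustrates that reading in plain Python; the helper names are hypothetical, and the real method also writes the batches to disk, which is omitted here.

```python
import itertools


def expand_combos(combos):
    """Expand a mapping of argument -> values into one dict per case."""
    names = list(combos)
    return [
        dict(zip(names, values))
        for values in itertools.product(*combos.values())
    ]


def split_into_batches(cases, batchsize):
    """Group consecutive cases into lists of at most ``batchsize``."""
    return [cases[i:i + batchsize] for i in range(0, len(cases), batchsize)]


cases = expand_combos({"a": [1, 2], "b": [10, 20, 30]})
batches = split_into_batches(cases, batchsize=4)
print(len(cases))                # 6
print([len(b) for b in batches]) # [4, 2]
```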
- sow_cases(fn_args, cases, combos=None, constants=None, verbosity=1, batchsize=None, num_batches=None)[source]¶
Sow cases to disk to be later grown, potentially in batches.
- Parameters:
fn_args (iterable[str] or str) – The names and order of the function arguments, can be None if each case is supplied as a dict.
cases (iterable or mappings, optional) – Sequence of individual cases to sow for all or some function arguments.
combos (dict_like[str, iterable]) – Combinations to sow for some or all function arguments.
constants (mapping, optional) – Provide additional constant function values to use when sowing.
verbosity (int, optional) – How much information to show when sowing. 0: no output, 1: progress bar, 2: progress bar with each setting being sown.
batchsize (int, optional) – If specified, set a new batchsize for the crop.
num_batches (int, optional) – If specified, set a new num_batches for the crop.
- grow_subprocess(batch_ids=None, num_workers=None, num_threads=None, gpus=None, affinities=None, raise_errors=False, log=False, min_wait=1e-06, max_wait=0.1, verbosity=1, verbosity_grow=0, desc='Grow')[source]¶
Grow particular or missing batches using a single fresh subprocess per batch. This has a higher overhead for starting each process, but is more robust memory-wise, and allows controlling the number of threads used, CPU affinity and GPU assignment.
- Parameters:
batch_ids (int or sequence of int, optional) – Which batch numbers to grow, defaults to all missing.
num_workers (int, optional) – The maximum number of concurrent subprocesses (default 1).
num_threads (int, optional) – The number of threads per subprocess (default 1).
gpus (int, str, or sequence of int, optional) – GPU device IDs to assign to subprocesses via CUDA_VISIBLE_DEVICES. Each subprocess gets a single GPU from this pool; the pool also limits concurrency to the number of GPUs provided. You can oversubscribe GPUs by repeating device IDs, e.g. 0,0,1,1 to allow 2 subprocesses to share each GPU.
affinities (int, str, or sequence of int, optional) – CPU core IDs to pin subprocesses to via taskset. Also limits concurrency to the number of affinities.
raise_errors (bool, optional) – Whether to raise errors encountered during growing.
log (bool, optional) – Whether to save subprocess stdout and stderr to log files in the crop directory under logs/batch-{batch_id}.log. Default is False, which discards stdout and only prints stderr on error.
min_wait (float, optional) – Minimum polling interval in seconds.
max_wait (float, optional) – Maximum polling interval in seconds.
verbosity (int, optional) – How much information to show when growing. 0: no output, 1: progress bar, 2: progress bar with each setting being grown.
verbosity_grow (int, optional) – Verbosity within each batch grow.
desc (str, optional) – Description to show in the progress bar when growing.
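The per-subprocess controls described above come down to an environment and a command line prepared before the child process starts. The following sketch builds both without launching anything; the function name, the placeholder child command, and the exact set of thread variables are assumptions for illustration.

```python
import os
import sys


def build_subprocess_call(batch_id, num_threads=1, gpu=None, affinity=None):
    """Illustrative only: return (pargs, env) for growing one batch."""
    env = dict(os.environ)
    # thread limits must be in place before numerical libraries are imported
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[var] = str(num_threads)
    if gpu is not None:
        # expose a single GPU to the child process
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    # placeholder child command; the real one would grow the batch
    pargs = [sys.executable, "-c", f"print('growing batch {batch_id}')"]
    if affinity is not None:
        # pin the child to one CPU core
        pargs = ["taskset", "-c", str(affinity)] + pargs
    return pargs, env


pargs, env = build_subprocess_call(7, num_threads=4, gpu=1, affinity=2)
print(pargs[:3])                    # ['taskset', '-c', '2']
print(env["CUDA_VISIBLE_DEVICES"])  # 1
```

The pair could then be handed to `subprocess.Popen(pargs, env=env)`; that step is left out so the sketch runs on machines without taskset.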
- grow(batch_ids=None, subprocess='auto', num_workers=None, num_threads=None, gpus=None, affinities=None, raise_errors=False, debugging=False, verbosity=1, verbosity_grow=0, log=False, desc='Grow', **combo_runner_opts)[source]¶
Grow specific batch numbers using this process.
- Parameters:
batch_ids (int or sequence of ints, optional) – Which batch numbers to grow, by default all missing results.
subprocess ("auto" or bool, optional) – Whether to grow each batch in a fresh subprocess. This adds about 1 second of overhead per batch, but allows the number of threads, CPU affinity and GPU assignment to be controlled. If “auto” (default) then subprocesses will be used if num_threads, gpus or affinities are specified. See Crop.grow_subprocess() for details.
num_workers (int, optional) – Maximum number of batches to run concurrently. In subprocess mode this is the cap on simultaneous subprocesses (defaults to 1 if not given). In in-process mode this is the size of the joblib loky process pool used by combo_runner_core (None = serial).
num_threads (int, optional) – Number of threads each worker is allowed to use, applied via the standard env vars (OMP_NUM_THREADS, MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, …). Only meaningful in subprocess mode (the env vars must be set before numerical libraries are imported); setting it implies subprocess=True when subprocess="auto". Passing this with subprocess=False raises ValueError.
gpus (int, str, or sequence of int, optional) – GPU device IDs to assign to subprocesses via CUDA_VISIBLE_DEVICES. Each subprocess gets a single GPU from this pool; the pool also caps concurrency to its size. Repeat IDs to oversubscribe (e.g. "0,0,1,1" shares each GPU between two workers). Subprocess-mode only: implies subprocess=True when subprocess="auto"; raises ValueError with subprocess=False.
affinities (int, str, or sequence of int, optional) – CPU core IDs to pin subprocesses to via taskset. Each subprocess gets one affinity from the pool, which also caps concurrency. Subprocess-mode only: implies subprocess=True when subprocess="auto"; raises ValueError with subprocess=False.
raise_errors (bool, optional) – Whether to raise errors if they occur during growing.
debugging (bool, optional) – Whether to set the logging level to debug.
verbosity (int, optional) – How much information to show when growing. 0: no output, 1: progress bar, 2: progress bar with each setting being grown.
verbosity_grow (int, optional) – How much information to show when growing each batch.
log (bool, optional) – Whether to save subprocess output to log files. Only used when subprocess=True.
desc (str, optional) – Description to show in the progress bar when growing.
**combo_runner_opts – Additional options forwarded to either Crop.grow_subprocess() (min_wait, max_wait, …) when subprocess is True, or to combo_runner_core (executor, parallel, …) when subprocess is False.
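The rules for subprocess="auto" stated in the parameter list can be summarized as a small decision function. This is a sketch of the documented behavior, not the library's code; the function name and the error message are made up for illustration.

```python
def resolve_subprocess(subprocess="auto", num_threads=None, gpus=None,
                       affinities=None):
    """Illustrative only: decide whether batches run in fresh subprocesses."""
    needs_subprocess = any(
        opt is not None for opt in (num_threads, gpus, affinities)
    )
    if subprocess == "auto":
        # use subprocesses only if a subprocess-only option was given
        return needs_subprocess
    if not subprocess and needs_subprocess:
        raise ValueError(
            "num_threads, gpus and affinities require subprocess mode"
        )
    return bool(subprocess)


print(resolve_subprocess())              # False
print(resolve_subprocess(gpus="0,1"))    # True
print(resolve_subprocess(True))          # True
```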
- reap_combos(wait=False, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap')[source]¶
Reap already sown and grown results from this crop.
- Parameters:
wait (bool, optional) – Whether to wait for results to appear. If false (default) all results need to be in place before the reap.
clean_up (bool, optional) – Whether to delete all the batch files once the results have been gathered. If left as None this will be automatically set to not allow_incomplete.
allow_incomplete (bool, optional) – Allow only partially completed crop results to be reaped; incomplete results will all be filled in as nan.
verbosity (int, optional) – How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped.
desc (str, optional) – Description to show in the progress bar when reaping.
- Returns:
results – ‘N-dimensional’ tuple containing the results.
- Return type:
nested tuple
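The interplay of clean_up and allow_incomplete can be illustrated with an in-memory stand-in for the batch results. In this sketch the 1-based batch numbering, the dict input, the flat tuple output and the error raised for missing batches are all assumptions; only the clean_up default and the nan filling are taken from the text above.

```python
import math


def reap_sketch(batch_results, num_batches, clean_up=None,
                allow_incomplete=False):
    """Illustrative only: gather results, filling missing ones with nan.

    ``batch_results`` maps batch number -> result (hypothetical input).
    """
    if clean_up is None:
        # documented default: only clean up after a complete reap
        clean_up = not allow_incomplete
    numbers = range(1, num_batches + 1)  # assumption: batches count from 1
    missing = [i for i in numbers if i not in batch_results]
    if missing and not allow_incomplete:
        raise ValueError(f"Missing results for batches {missing}")
    results = tuple(batch_results.get(i, math.nan) for i in numbers)
    return results, clean_up


results, clean_up = reap_sketch({1: 0.5, 3: 1.5}, 3, allow_incomplete=True)
print(results)   # (0.5, nan, 1.5)
print(clean_up)  # False
```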
- reap_combos_to_ds(var_names=None, var_dims=None, var_coords=None, constants=None, attrs=None, parse=True, wait=False, clean_up=None, allow_incomplete=False, to_df=False, verbosity=1, desc='Reap')[source]¶
Reap a function over sowed combinations and output to a Dataset.
- Parameters:
var_names (str, sequence of strings, or None) – Variable name(s) of the output(s) of fn, set to None if fn outputs data already labeled in a Dataset or DataArray.
var_dims (sequence of either strings or string sequences, optional) – ‘Internal’ names of dimensions for each variable, the values for each dimension should be contained as a mapping in either var_coords (not needed by fn) or constants (needed by fn).
var_coords (mapping, optional) – Mapping of extra coords the output variables may depend on.
constants (mapping, optional) – Arguments to fn which are not iterated over, these will be recorded either as attributes or coordinates if they are named in var_dims.
resources (mapping, optional) – Like constants but they will not be recorded.
attrs (mapping, optional) – Any extra attributes to store.
wait (bool, optional) – Whether to wait for results to appear. If false (default) all results need to be in place before the reap.
clean_up (bool, optional) – Whether to delete all the batch files once the results have been gathered. If left as None this will be automatically set to not allow_incomplete.
allow_incomplete (bool, optional) – Allow only partially completed crop results to be reaped; incomplete results will all be filled in as nan.
to_df (bool, optional) – Whether to reap to a pandas.DataFrame rather than an xarray.Dataset.
verbosity (int, optional) – How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped.
desc (str, optional) – Description to show in the progress bar when reaping.
- Returns:
Multidimensional labeled dataset containing all the results.
- Return type:
xarray.Dataset or pandas.DataFrame
- reap_runner(runner, wait=False, clean_up=None, allow_incomplete=False, to_df=False, verbosity=1, desc='Reap', **kwargs)[source]¶
Reap a Crop over sowed combos and save to a dataset defined by a Runner.
- reap_harvest(harvester, wait=False, sync=True, overwrite=None, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap', **kwargs)[source]¶
Reap a Crop over sowed combos and merge with the dataset defined by a Harvester.
- reap_samples(sampler, wait=False, sync=True, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap', **kwargs)[source]¶
Reap a Crop over sowed combos and merge with the dataframe defined by a Sampler.
- reap(wait=False, sync=True, overwrite=None, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap')[source]¶
Reap sown and grown combos from disk. Return a dataset if a runner or harvester is set, otherwise, the raw nested tuple.
- Parameters:
wait (bool, optional) – Whether to wait for results to appear. If false (default) all results need to be in place before the reap.
sync (bool, optional) – Immediately sync the new dataset with the on-disk full dataset or dataframe if a harvester or sampler is used.
overwrite (bool, optional) – How to compare data when syncing to on-disk dataset. If None (default), merge as long as there are no conflicts. If True, overwrite with the new data. If False, discard any new conflicting data.
clean_up (bool, optional) – Whether to delete all the batch files once the results have been gathered. If left as None this will be automatically set to not allow_incomplete.
allow_incomplete (bool, optional) – Allow only partially completed crop results to be reaped; incomplete results will all be filled in as nan.
verbosity (int, optional) – How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped.
desc (str, optional) – Description to show in the progress bar when reaping.
- Return type:
nested tuple or xarray.Dataset
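The three overwrite settings can be pictured with plain dictionaries standing in for the on-disk and newly reaped data. This is only an analogy: the real sync operates on datasets or dataframes, and raising an error on conflict when overwrite is None is an assumption about what "merge as long as no conflicts" implies.

```python
def sync_results(on_disk, new, overwrite=None):
    """Illustrative only: merge ``new`` into ``on_disk`` (both dicts)."""
    conflicts = {k for k in new if k in on_disk and on_disk[k] != new[k]}
    if overwrite is None and conflicts:
        # assumption: a conflicting merge is an error by default
        raise ValueError(f"Conflicting values for keys: {sorted(conflicts)}")
    merged = dict(on_disk)
    for key, value in new.items():
        if key in conflicts and overwrite is False:
            continue  # keep the existing value, discard the new one
        merged[key] = value
    return merged


old = {"a": 1, "b": 2}
new = {"b": 3, "c": 4}
print(sync_results(old, new, overwrite=True))   # {'a': 1, 'b': 3, 'c': 4}
print(sync_results(old, new, overwrite=False))  # {'a': 1, 'b': 2, 'c': 4}
```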
- check_bad(delete_bad=True)[source]¶
Check that the result dumps are not bad -> sometimes length does not match the batch. Optionally delete these so that they can be re-grown.
- fn¶
- property num_sown_batches¶
Total number of batches to be run/grown.
- property num_results¶
- xyzpy.gen.cropping.load_crops(directory='.')[source]¶
Automatically load all the crops found in the current directory.
- class xyzpy.gen.cropping.Sower(crop)[source]¶
Bases: object
Class for sowing a ‘crop’ of batched combos to then ‘grow’ (on any number of workers sharing the filesystem) and then reap.
- crop¶
- _batch_cases = []¶
- _counter = 0¶
- _batch_counter = 0¶
- xyzpy.gen.cropping.grow(batch_number, crop=None, fn=None, num_workers=None, check_mpi=True, verbosity=2, debugging=False, raise_errors=True)[source]¶
Automatically process a batch of cases into results. Should be run in an “.xyz-{fn_name}” folder, or crop should be specified.
- Parameters:
batch_number (int) – Which batch to ‘grow’ into a set of results.
crop (xyzpy.Crop) – Description of where and how to store the cases and results.
fn (callable, optional) – If specified, the function used to generate the results, otherwise the function will be loaded from disk.
num_workers (int, optional) – If specified, grow using a pool of this many workers. This uses joblib.externals.loky to spawn processes.
check_mpi (bool, optional) – Whether to check if the process is rank 0 and only save results if so, which allows MPI functions to be used simply. Defaults to True; this should only be turned off if e.g. a pool of workers is being used to run different grow instances.
verbosity ({0, 1, 2}, optional) – How much information to show.
debugging (bool, optional) – Set logging level to DEBUG.
raise_errors (bool, optional) – Whether to raise errors that occur during the computation. If growing many batches in parallel, it can be useful to set this to False so a single error doesn’t crash the whole process.
- class xyzpy.gen.cropping.Reaper(crop, num_batches, wait=False, default_result=None)[source]¶
Bases: object
Class that acts as a stateful function to retrieve already sown and grown results.
- crop¶
- results¶
- xyzpy.gen.cropping._SGE_HEADER = Multiline-String¶
"""#!/bin/bash -l
#$ -S /bin/bash
#$ -N {name}
#$ -l h_rt={hours}:{minutes}:{seconds},mem={gigabytes}G
#$ -l tmpfs={temp_gigabytes}G
mkdir -p {output_directory}
#$ -wd {output_directory}
#$ -pe {pe} {num_procs}
{header_options}
"""
- xyzpy.gen.cropping._SGE_ARRAY_HEADER = Multiline-String¶
"""#$ -t {run_start}-{run_stop}
"""
- xyzpy.gen.cropping._PBS_HEADER = Multiline-String¶
"""#!/bin/bash -l
#PBS -N {name}
#PBS -lselect={num_nodes}:ncpus={num_procs}:mem={gigabytes}gb
#PBS -lwalltime={hours:02}:{minutes:02}:{seconds:02}
{header_options}
"""
- xyzpy.gen.cropping._PBS_ARRAY_HEADER = Multiline-String¶
"""#PBS -J {run_start}-{run_stop}
"""
- xyzpy.gen.cropping._SLURM_HEADER = Multiline-String¶
"""#!/bin/bash -l
#SBATCH --job-name={name}
#SBATCH --time={hours:02}:{minutes:02}:{seconds:02}
{header_options}
"""
- xyzpy.gen.cropping._SLURM_ARRAY_HEADER = Multiline-String¶
"""#SBATCH --array={run_start}-{run_stop}
"""
- xyzpy.gen.cropping._BASE = Multiline-String¶
"""echo 'XYZPY script starting...'
cd {working_directory}
export OMP_NUM_THREADS={num_threads}
export MKL_NUM_THREADS={num_threads}
export OPENBLAS_NUM_THREADS={num_threads}
export NUMBA_NUM_THREADS={num_threads}
{shell_setup}
read -r -d '' SCRIPT << EOM
{setup}
from xyzpy.gen.cropping import grow, Crop
if __name__ == '__main__':
    crop = Crop(name='{name}', parent_dir='{parent_dir}')
    print('Growing:', repr(crop))
    grow_kwargs = dict(
        num_workers={num_workers},
        subprocess={subprocess},
        debugging={debugging},
        verbosity_grow=2,
    )
"""
- xyzpy.gen.cropping._CLUSTER_SGE_GROW_ALL_SCRIPT = Multiline-String¶
"""    crop.grow($SGE_TASK_ID, **grow_kwargs)
"""
- xyzpy.gen.cropping._CLUSTER_PBS_GROW_ALL_SCRIPT = Multiline-String¶
"""    crop.grow($PBS_ARRAY_INDEX, **grow_kwargs)
"""
- xyzpy.gen.cropping._CLUSTER_SLURM_GROW_ALL_SCRIPT = Multiline-String¶
"""    crop.grow($SLURM_ARRAY_TASK_ID, **grow_kwargs)
"""
- xyzpy.gen.cropping._CLUSTER_SGE_GROW_PARTIAL_SCRIPT = Multiline-String¶
"""    batch_ids = {batch_ids}]
    crop.grow(batch_ids[$SGE_TASK_ID - 1], **grow_kwargs)
"""
- xyzpy.gen.cropping._CLUSTER_PBS_GROW_PARTIAL_SCRIPT = Multiline-String¶
"""    batch_ids = {batch_ids}
    crop.grow(batch_ids[$PBS_ARRAY_INDEX - 1], **grow_kwargs)
"""
- xyzpy.gen.cropping._CLUSTER_SLURM_GROW_PARTIAL_SCRIPT = Multiline-String¶
"""    batch_ids = {batch_ids}
    crop.grow(batch_ids[$SLURM_ARRAY_TASK_ID - 1], **grow_kwargs)
"""
- xyzpy.gen.cropping._BASE_CLUSTER_GROW_SINGLE = Multiline-String¶
"""    grow_kwargs['verbosity_grow'] = 0
    batch_ids = {batch_ids}
    crop.grow(batch_ids, **grow_kwargs)
"""
- xyzpy.gen.cropping._BASE_CLUSTER_SCRIPT_END = Multiline-String¶
"""EOM
{launcher} -c "$SCRIPT"
echo 'XYZPY script finished'
"""
- xyzpy.gen.cropping.gen_cluster_script(crop, scheduler, batch_ids=None, *, mode='array', num_procs=None, num_threads=None, num_nodes=None, num_workers=None, subprocess=False, mem=None, mem_per_cpu=None, gigabytes=None, time=None, hours=None, minutes=None, seconds=None, conda_env=True, launcher=None, setup='#', shell_setup='', mpi=False, temp_gigabytes=1, output_directory=None, debugging=False, **kwargs)[source]¶
Generate a cluster script to grow a Crop.
- Parameters:
crop (Crop) – The crop to grow.
scheduler ({'sge', 'pbs', 'slurm'}) – Whether to use a SGE, PBS or slurm submission script template.
batch_ids (int or tuple[int]) – Which batch numbers to grow, defaults to all missing batches.
mode ({'array', 'single'}) – How to distribute the batches, either as an array job with a single batch per job, or as a single job processing batches in parallel.
hours (int) – How many hours to request, default=0.
minutes (int, optional) – How many minutes to request, default=20.
seconds (int, optional) – How many seconds to request, default=0.
gigabytes (int, optional) – How much memory to request, default: 2.
num_procs (int, optional) – How many processes to request (threaded cores or MPI), default: 1.
num_threads (int, optional) – How many threads to use per process. Will be computed automatically based on num_procs and num_workers if not specified.
num_workers (int, optional) – How many workers to use for parallel growing, default is sequential. If specified, then generally num_workers * num_threads == num_procs.
subprocess (bool, optional) – Whether to use a fresh subprocess for each batch, default: False.
num_nodes (int, optional) – How many nodes to request, default: 1.
conda_env (bool or str, optional) – Whether to activate a conda environment before running the script. If True, the environment will be the same as the one used to launch the script. If a string, the environment will be the one specified by the string.
launcher (str, optional) – How to launch the script, default: the current Python interpreter. But could for example be 'mpiexec python' for an MPI program.
setup (str, optional) – Python script to run before growing, for things that shouldn’t be put in the crop function itself, e.g. one-time imports with side-effects like: "import tensorflow as tf; tf.enable_eager_execution()".
shell_setup (str, optional) – Commands to be run by the shell before the python script is executed.
mpi (bool, optional) – Request MPI processes not threaded processes.
temp_gigabytes (int, optional) – How much temporary on-disk memory.
output_directory (str, optional) – What directory to write output to. Defaults to “$HOME/Scratch/output”.
debugging (bool, optional) – Set the python log level to debugging.
kwargs (dict, optional) – Extra keyword arguments are taken to be extra resources to request in the header of the submission script, e.g. {'gpu': 1} will add "#SBATCH --gpu=1" to the header if using slurm. If you supply literal True or None as the value, then the key will be treated as a flag. E.g. {'requeue': None} will add "#SBATCH --requeue" to the header.
- Return type:
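To show how the extra keyword arguments could end up in a slurm header, the sketch below fills the _SLURM_HEADER template documented earlier on this page. The helper `slurm_header_options` is hypothetical, the line breaks in the template are assumed (the page shows the value flattened), and only the slurm case is covered because the source gives examples for slurm alone.

```python
# Template text taken from the _SLURM_HEADER value documented above;
# the line breaks are an assumption.
SLURM_HEADER = (
    "#!/bin/bash -l\n"
    "#SBATCH --job-name={name}\n"
    "#SBATCH --time={hours:02}:{minutes:02}:{seconds:02}\n"
    "{header_options}\n"
)


def slurm_header_options(**kwargs):
    """Hypothetical helper: turn extra keyword arguments into #SBATCH lines."""
    lines = []
    for key, value in kwargs.items():
        if value is True or value is None:
            lines.append(f"#SBATCH --{key}")          # bare flag
        else:
            lines.append(f"#SBATCH --{key}={value}")  # key=value resource
    return "\n".join(lines)


header = SLURM_HEADER.format(
    name="my-crop",
    hours=0,
    minutes=20,
    seconds=0,
    header_options=slurm_header_options(gpu=1, requeue=None),
)
print(header)
```

The `{hours:02}` style fields zero-pad each integer to two digits, so the time line comes out as `#SBATCH --time=00:20:00`.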
- xyzpy.gen.cropping.grow_cluster(crop, scheduler, batch_ids=None, *, hours=None, minutes=None, seconds=None, gigabytes=2, num_nodes=1, num_procs=1, num_threads=None, num_workers=None, subprocess=False, conda_env=True, launcher=None, setup='#', shell_setup='', mpi=False, temp_gigabytes=1, output_directory=None, debugging=False, **kwargs)[source]¶
Automagically submit SGE, PBS, or slurm jobs to grow all missing results.
- Parameters:
crop (Crop) – The crop to grow.
scheduler ({'sge', 'pbs', 'slurm'}) – Whether to use a SGE, PBS or slurm submission script template.
batch_ids (int or tuple[int]) – Which batch numbers to grow, defaults to all missing batches.
hours (int) – How many hours to request, default=0.
minutes (int, optional) – How many minutes to request, default=20.
seconds (int, optional) – How many seconds to request, default=0.
gigabytes (int, optional) – How much memory to request, default: 2.
num_nodes (int, optional) – How many nodes to request, default: 1.
num_procs (int, optional) – How many processes to request (threaded cores or MPI), default: 1.
num_threads (int, optional) – How many threads to use per process. Will be computed automatically based on num_procs and num_workers if not specified.
num_workers (int, optional) – How many workers to use for parallel growing, default is sequential. If specified, then generally num_workers * num_threads == num_procs.
subprocess (bool, optional) – Whether to use a fresh subprocess for each batch, default: False.
conda_env (bool or str, optional) – Whether to activate a conda environment before running the script. If True, the environment will be the same as the one used to launch the script. If a string, the environment will be the one specified by the string.
launcher (str, optional) – How to launch the script, default: the current Python interpreter. But could for example be 'mpiexec python' for an MPI program.
setup (str, optional) – Python script to run before growing, for things that shouldn’t be put in the crop function itself, e.g. one-time imports with side-effects like: "import tensorflow as tf; tf.enable_eager_execution()".
shell_setup (str, optional) – Commands to be run by the shell before the python script is executed. E.g. conda activate my_env.
mpi (bool, optional) – Request MPI processes not threaded processes.
temp_gigabytes (int, optional) – How much temporary on-disk memory.
output_directory (str, optional) – What directory to write output to. Defaults to “$HOME/Scratch/output”.
debugging (bool, optional) – Set the python log level to debugging.
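The relation num_workers * num_threads == num_procs suggests how a default thread count might be derived. The function below is a guess at that arithmetic, not the library's code: using integer division, and giving all processes to one worker when num_workers is not set, are both assumptions.

```python
def default_num_threads(num_procs=1, num_workers=None):
    """Illustrative only: threads per worker so that
    num_workers * num_threads does not exceed num_procs."""
    if num_workers is None:
        # assumption: sequential growing may use every requested core
        return num_procs
    return max(1, num_procs // num_workers)


print(default_num_threads(num_procs=8, num_workers=4))  # 2
print(default_num_threads(num_procs=8))                 # 8
```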
- xyzpy.gen.cropping.gen_qsub_script(crop, batch_ids=None, *, scheduler='sge', **kwargs)[source]¶
Generate a qsub script to grow a Crop. Deprecated in favor of gen_cluster_script and will be removed in the future.