xyzpy.gen.cropping ================== .. py:module:: xyzpy.gen.cropping Attributes ---------- .. autoapisummary:: xyzpy.gen.cropping.BTCH_NM xyzpy.gen.cropping.RSLT_NM xyzpy.gen.cropping.FNCT_NM xyzpy.gen.cropping.INFO_NM xyzpy.gen.cropping._SGE_HEADER xyzpy.gen.cropping._SGE_ARRAY_HEADER xyzpy.gen.cropping._PBS_HEADER xyzpy.gen.cropping._PBS_ARRAY_HEADER xyzpy.gen.cropping._SLURM_HEADER xyzpy.gen.cropping._SLURM_ARRAY_HEADER xyzpy.gen.cropping._BASE xyzpy.gen.cropping._CLUSTER_SGE_GROW_ALL_SCRIPT xyzpy.gen.cropping._CLUSTER_PBS_GROW_ALL_SCRIPT xyzpy.gen.cropping._CLUSTER_SLURM_GROW_ALL_SCRIPT xyzpy.gen.cropping._CLUSTER_SGE_GROW_PARTIAL_SCRIPT xyzpy.gen.cropping._CLUSTER_PBS_GROW_PARTIAL_SCRIPT xyzpy.gen.cropping._CLUSTER_SLURM_GROW_PARTIAL_SCRIPT xyzpy.gen.cropping._BASE_CLUSTER_GROW_SINGLE xyzpy.gen.cropping._BASE_CLUSTER_SCRIPT_END Classes ------- .. autoapisummary:: xyzpy.gen.cropping._ResourcePool xyzpy.gen.cropping.Crop xyzpy.gen.cropping.Sower xyzpy.gen.cropping.Reaper Functions --------- .. autoapisummary:: xyzpy.gen.cropping.write_to_disk xyzpy.gen.cropping.read_from_disk xyzpy.gen.cropping.get_picklelib xyzpy.gen.cropping.to_pickle xyzpy.gen.cropping.from_pickle xyzpy.gen.cropping.parse_crop_details xyzpy.gen.cropping.parse_fn_farmer xyzpy.gen.cropping.calc_clean_up_default_res xyzpy.gen.cropping.check_ready_to_reap xyzpy.gen.cropping._parse_resource_ids xyzpy.gen.cropping._acquire_affinity xyzpy.gen.cropping._acquire_gpu xyzpy.gen.cropping.load_crops xyzpy.gen.cropping.grow xyzpy.gen.cropping.gen_cluster_script xyzpy.gen.cropping.grow_cluster xyzpy.gen.cropping.gen_qsub_script xyzpy.gen.cropping.qsub_grow xyzpy.gen.cropping.clean_slurm_outputs xyzpy.gen.cropping.manage_slurm_outputs Module Contents --------------- .. py:data:: BTCH_NM :value: 'xyz-batch-{}.jbdmp' .. py:data:: RSLT_NM :value: 'xyz-result-{}.jbdmp' .. py:data:: FNCT_NM :value: 'xyz-function.clpkl' .. py:data:: INFO_NM :value: 'xyz-settings.jbdmp' .. 
py:function:: write_to_disk(obj, fname) .. py:function:: read_from_disk(fname) .. py:function:: get_picklelib(picklelib='joblib.externals.cloudpickle') .. py:function:: to_pickle(obj, picklelib='joblib.externals.cloudpickle') .. py:function:: from_pickle(s, picklelib='joblib.externals.cloudpickle') .. py:function:: parse_crop_details(fn, crop_name, crop_parent) Work out how to structure the sowed data. :param fn: Function to infer ``crop_name`` from, if not given. :type fn: callable, optional :param crop_name: Specific name to give this set of runs. :type crop_name: str, optional :param crop_parent: Specific directory to put the ".xyz-{crop_name}/" folder in with all the cases and results. :type crop_parent: str, optional :returns: * **crop_location** (*str*) -- Full path to the crop-folder. * **crop_name** (*str*) -- Name of the crop. * **crop_parent** (*str*) -- Parent folder of the crop. .. py:function:: parse_fn_farmer(fn, farmer) .. py:function:: calc_clean_up_default_res(crop, clean_up, allow_incomplete) Logic for choosing whether to automatically clean up a crop, and what, if any, the default all-nan result should be. .. py:function:: check_ready_to_reap(crop, allow_incomplete, wait) .. py:function:: _parse_resource_ids(raw) Normalize an int, list, tuple, range, or comma-separated string into a list of integer resource IDs. .. py:function:: _acquire_affinity(rid, pargs, env) Prepend ``taskset -c <rid>`` to pin to a CPU core. .. py:function:: _acquire_gpu(rid, pargs, env) Set ``CUDA_VISIBLE_DEVICES`` to pin to a GPU. .. py:class:: _ResourcePool(ids, acquire_fn) A pool of reusable resource IDs (CPUs, GPUs, etc.) that can be acquired and released once per batch subprocess. :param ids: The resource IDs available to hand out. :type ids: list of int :param acquire_fn: A callable ``fn(rid, pargs, env)`` that mutates *pargs* (the command prefix list) and/or *env* (the environment dict) to apply *rid*. :type acquire_fn: callable .. py:attribute:: free .. py:attribute:: used ..
py:attribute:: acquire_fn .. py:method:: from_raw(raw, acquire_fn) :classmethod: Create a pool from a raw user value, or return ``None``. .. py:method:: available() Whether there is at least one free resource. .. py:method:: acquire(batch_id, pargs, env) Pop a resource, apply it, and track it against *batch_id*. .. py:method:: release(batch_id) Return the resource used by *batch_id* to the free pool. .. py:class:: Crop(*, fn=None, name=None, parent_dir=None, save_fn=None, batchsize=None, num_batches=None, shuffle=False, farmer=None, autoload=True) Bases: :py:obj:`object` Encapsulates all the details describing a single 'crop', that is, its location, name, and batch size/number. Also allows tracking of crop's progress, and experimentally, automatic submission of workers to grid engine to complete un-grown cases. Can also be instantiated directly from a :class:`~xyzpy.Runner` or :class:`~xyzpy.Harvester` or :class:`~Sampler.Crop` instance. :param fn: Target function - Crop `name` will be inferred from this if not given explicitly. If given, `Sower` will also default to saving a version of `fn` to disk for `cropping.grow` to use. :type fn: callable, optional :param name: Custom name for this set of runs - must be given if `fn` is not. :type name: str, optional :param parent_dir: If given, alternative directory to put the ".xyz-{name}/" folder in with all the cases and results. :type parent_dir: str, optional :param save_fn: Whether to save the function to disk for `cropping.grow` to use. Will default to True if `fn` is given. :type save_fn: bool, optional :param batchsize: How many cases to group into a single batch per worker. By default, batchsize=1. Cannot be specified if `num_batches` is. :type batchsize: int, optional :param num_batches: How many total batches to aim for, cannot be specified if `batchsize` is. 
:type num_batches: int, optional :param farmer: A Runner, Harvester or Sampler instance, from which the `fn` can be inferred and which can also allow the Crop to reap itself straight to a dataset or dataframe. :type farmer: {xyzpy.Runner, xyzpy.Harvester, xyzpy.Sampler}, optional :param autoload: If True, check for the existence of a Crop written to disk with the same location, and if found, load it. :type autoload: bool, optional .. seealso:: :py:obj:`Runner.Crop`, :py:obj:`Harvester.Crop`, :py:obj:`Sampler.Crop` .. py:attribute:: name :value: None .. py:attribute:: parent_dir :value: None .. py:attribute:: save_fn :value: None .. py:attribute:: batchsize :value: None .. py:attribute:: num_batches :value: None .. py:attribute:: shuffle :value: False .. py:attribute:: _batch_remainder :value: None .. py:attribute:: _all_nan_result :value: None .. py:attribute:: _num_sown_batches :value: -1 .. py:attribute:: _num_results :value: -1 .. py:property:: runner .. py:method:: choose_batch_settings(*, combos=None, cases=None) Work out how to divide all cases into batches, i.e. ensure that ``batchsize * num_batches >= num_cases``. .. py:method:: ensure_dirs_exists() Make sure the directory structure for this crop exists. .. py:method:: save_info(combos=None, cases=None, fn_args=None) Save information about the sowed cases. .. py:method:: load_info() Load the full settings from disk. .. py:method:: load_batch(batch_number) Load a specific batch from disk. .. py:method:: load_result(batch_number) Load a specific result from disk. .. py:method:: save_result(batch_number, result) Save a specific result to disk. .. py:method:: _sync_info_from_disk(only_missing=True) Load information about the saved cases. .. py:method:: save_function_to_disk() Save the base function to disk using cloudpickle. .. py:method:: load_function() Load the saved function from disk, and try to re-insert it back into Harvester or Runner if present. ..
py:method:: prepare(combos=None, cases=None, fn_args=None) Write information about this crop and the supplied combos to disk. Typically done at the start of sowing, not when the Crop is instantiated. .. py:method:: is_prepared() Check whether this crop has been written to disk. .. py:method:: calc_progress() Calculate how much progress has been made in growing the batches. .. py:method:: is_ready_to_reap() Have all batches been grown? .. py:method:: completed_results() -> tuple[int, Ellipsis] Return tuple of batches which have been grown already. .. py:method:: missing_results() -> tuple[int, Ellipsis] Return tuple of batches which haven't been grown yet. .. py:method:: delete_all() Delete the crop directory and all its contents, and reset any loaded information on this Crop object. .. py:method:: handle_existing(action='ask', msg=None, e=None, overwrite=False) Handle an already prepared crop. :param action: What to do with the existing crop. If ``'ask'`` (default), interactively prompt the user. Otherwise, execute the specified action directly. :type action: {'ask', 'reap', 'delete', 'skip', 'raise'} :param msg: Message to display when prompting. :type msg: str, optional :param e: Exception to re-raise if action is ``'raise'``. :type e: Exception, optional :param overwrite: Whether to overwrite existing data when reaping. :type overwrite: bool, optional .. py:property:: all_nan_result Get a stand-in result for cases which are still missing. .. py:method:: __str__() .. py:method:: __repr__() .. py:method:: parse_constants(constants=None) .. py:method:: sow_combos(combos, cases=None, constants=None, shuffle=False, verbosity=1, desc='Sow', batchsize=None, num_batches=None) Sow combos to disk to be later grown, potentially in batches. Note that if you have already sown this `Crop`, as long as the number of batches hasn't changed (e.g. you have just tweaked the function or a constant argument), you can safely resow and only the batches will be overwritten, i.e. the results will remain.
:param combos: The combinations to sow for all or some function arguments. :type combos: dict_like[str, iterable] :param cases: Optionally provide a sequence of individual cases to sow for some or all function arguments. :type cases: iterable or mappings, optional :param constants: Provide additional constant function values to use when sowing. :type constants: mapping, optional :param shuffle: If given, sow the combos in a random order (using ``random.seed`` and ``random.shuffle``), which can be helpful for distributing resources when not all cases are computationally equal. :type shuffle: bool or int, optional :param verbosity: How much information to show when sowing. 0: no output, 1: progress bar, 2: progress bar with each setting being sown. :type verbosity: int, optional :param desc: Description to show in the progress bar when sowing. :type desc: str, optional :param batchsize: If specified, set a new batchsize for the crop. :type batchsize: int, optional :param num_batches: If specified, set a new num_batches for the crop. :type num_batches: int, optional .. py:method:: sow_cases(fn_args, cases, combos=None, constants=None, verbosity=1, batchsize=None, num_batches=None) Sow cases to disk to be later grown, potentially in batches. :param fn_args: The names and order of the function arguments, can be ``None`` if each case is supplied as a ``dict``. :type fn_args: iterable[str] or str :param cases: Sequence of individual cases to sow for all or some function arguments. :type cases: iterable or mappings, optional :param combos: Combinations to sow for some or all function arguments. :type combos: dict_like[str, iterable] :param constants: Provide additional constant function values to use when sowing. :type constants: mapping, optional :param verbosity: How much information to show when sowing. 0: no output, 1: progress bar, 2: progress bar with each setting being sown. 
:type verbosity: int, optional :param batchsize: If specified, set a new batchsize for the crop. :type batchsize: int, optional :param num_batches: If specified, set a new num_batches for the crop. :type num_batches: int, optional .. py:method:: sow_samples(n, combos=None, constants=None, verbosity=1) Sow ``n`` samples to disk. .. py:method:: grow_subprocess(batch_ids=None, num_workers=None, num_threads=None, gpus=None, affinities=None, raise_errors=False, log=False, min_wait=1e-06, max_wait=0.1, verbosity=1, verbosity_grow=0, desc='Grow') Grow particular or missing batches using a single fresh subprocess per batch. This has a higher overhead for starting each process, but is more robust memory wise, and allows controlling the number of threads used, CPU affinity and GPU assignment. :param batch_ids: Which batch numbers to grow, defaults to all missing. :type batch_ids: int or sequence of int, optional :param num_workers: The maximum number of concurrent subprocesses (default 1). :type num_workers: int, optional :param num_threads: The number of threads per subprocess (default 1). :type num_threads: int, optional :param gpus: GPU device IDs to assign to subprocesses via ``CUDA_VISIBLE_DEVICES``. Each subprocess gets a single GPU from this pool; the pool also limits concurrency to the number of GPUs provided. You can oversubscribe GPUs by repeating device IDs, e.g. ``0,0,1,1`` to allow 2 subprocesses to share each GPU. :type gpus: int, str, or sequence of int, optional :param affinities: CPU core IDs to pin subprocesses to via ``taskset``. Also limits concurrency to the number of affinities. :type affinities: int, str, or sequence of int, optional :param raise_errors: Whether to raise errors encountered during growing. :type raise_errors: bool, optional :param log: Whether to save subprocess stdout and stderr to log files in the crop directory under ``logs/batch-{batch_id}.log``. Default is False, which discards stdout and only prints stderr on error. 
:type log: bool, optional :param min_wait: Minimum polling interval in seconds. :type min_wait: float, optional :param max_wait: Maximum polling interval in seconds. :type max_wait: float, optional :param verbosity: How much information to show when growing. 0: no output, 1: progress bar, 2: progress bar with each setting being grown. :type verbosity: int, optional :param verbosity_grow: Verbosity within each batch grow. :type verbosity_grow: int, optional :param desc: Description to show in the progress bar when growing. :type desc: str, optional .. py:method:: grow(batch_ids=None, subprocess='auto', num_workers=None, num_threads=None, gpus=None, affinities=None, raise_errors=False, debugging=False, verbosity=1, verbosity_grow=0, log=False, desc='Grow', **combo_runner_opts) Grow specific batch numbers using this process. :param batch_ids: Which batch numbers to grow, by default all missing results. :type batch_ids: int or sequence of ints, optional :param subprocess: Whether to grow each batch in a fresh subprocess. This adds about 1 second of overhead per batch, but allows the number of threads, cpu affinity and gpu assignment to be controlled. If "auto" (default) then subprocesses will be used if ``num_threads``, ``gpus`` or ``affinities`` are specified. See :meth:`Crop.grow_subprocess` for details. :type subprocess: "auto" or bool, optional :param num_workers: Maximum number of batches to run concurrently. In subprocess mode this is the cap on simultaneous subprocesses (defaults to 1 if not given). In in-process mode this is the size of the joblib loky process pool used by ``combo_runner_core`` (``None`` = serial). :type num_workers: int, optional :param num_threads: Number of threads each worker is allowed to use, applied via the standard env vars (``OMP_NUM_THREADS``, ``MKL_NUM_THREADS``, ``OPENBLAS_NUM_THREADS``, ...).
Only meaningful in subprocess mode (the env vars must be set before numerical libraries are imported); setting it implies ``subprocess=True`` when ``subprocess="auto"``. Passing this with ``subprocess=False`` raises ``ValueError``. :type num_threads: int, optional :param gpus: GPU device IDs to assign to subprocesses via ``CUDA_VISIBLE_DEVICES``. Each subprocess gets a single GPU from this pool; the pool also caps concurrency to its size. Repeat IDs to oversubscribe (e.g. ``"0,0,1,1"`` shares each GPU between two workers). Subprocess-mode only — implies ``subprocess=True`` when ``subprocess="auto"``; raises ``ValueError`` with ``subprocess=False``. :type gpus: int, str, or sequence of int, optional :param affinities: CPU core IDs to pin subprocesses to via ``taskset``. Each subprocess gets one affinity from the pool, which also caps concurrency. Subprocess-mode only — implies ``subprocess=True`` when ``subprocess="auto"``; raises ``ValueError`` with ``subprocess=False``. :type affinities: int, str, or sequence of int, optional :param raise_errors: Whether to raise errors if they occur during growing. :type raise_errors: bool, optional :param debugging: Whether to set the logging level to debug. :type debugging: bool, optional :param verbosity: How much information to show when growing. 0: no output, 1: progress bar, 2: progress bar with each setting being grown. :type verbosity: int, optional :param verbosity_grow: How much information to show when growing each batch. :type verbosity_grow: int, optional :param log: Whether to save subprocess output to log files. Only used when ``subprocess=True``. :type log: bool, optional :param desc: Description to show in the progress bar when growing. :type desc: str, optional :param \*\*combo_runner_opts: Additional options forwarded to either :meth:`Crop.grow_subprocess` (``min_wait``, ``max_wait``, ...) when ``subprocess`` is True, or to ``combo_runner_core`` (``executor``, ``parallel``, ...) when ``subprocess`` is False. .. 
py:method:: grow_missing(**combo_runner_opts) Grow any missing results using this process. .. py:method:: reap_combos(wait=False, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap') Reap already sown and grown results from this crop. :param wait: Whether to wait for results to appear. If false (default) all results need to be in place before the reap. :type wait: bool, optional :param clean_up: Whether to delete all the batch files once the results have been gathered. If left as ``None`` this will be automatically set to ``not allow_incomplete``. :type clean_up: bool, optional :param allow_incomplete: Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan. :type allow_incomplete: bool, optional :param verbosity: How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped. :type verbosity: int, optional :param desc: Description to show in the progress bar when reaping. :type desc: str, optional :returns: **results** -- 'N-dimensional' tuple containing the results. :rtype: nested tuple .. py:method:: reap_combos_to_ds(var_names=None, var_dims=None, var_coords=None, constants=None, attrs=None, parse=True, wait=False, clean_up=None, allow_incomplete=False, to_df=False, verbosity=1, desc='Reap') Reap a function over sowed combinations and output to a Dataset. :param var_names: Variable name(s) of the output(s) of `fn`, set to None if fn outputs data already labeled in a Dataset or DataArray. :type var_names: str, sequence of strings, or None :param var_dims: 'Internal' names of dimensions for each variable, the values for each dimension should be contained as a mapping in either `var_coords` (not needed by `fn`) or `constants` (needed by `fn`). :type var_dims: sequence of either strings or string sequences, optional :param var_coords: Mapping of extra coords the output variables may depend on. 
:type var_coords: mapping, optional :param constants: Arguments to `fn` which are not iterated over, these will be recorded either as attributes or coordinates if they are named in `var_dims`. :type constants: mapping, optional :param resources: Like `constants` but they will not be recorded. :type resources: mapping, optional :param attrs: Any extra attributes to store. :type attrs: mapping, optional :param wait: Whether to wait for results to appear. If false (default) all results need to be in place before the reap. :type wait: bool, optional :param clean_up: Whether to delete all the batch files once the results have been gathered. If left as ``None`` this will be automatically set to ``not allow_incomplete``. :type clean_up: bool, optional :param allow_incomplete: Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan. :type allow_incomplete: bool, optional :param to_df: Whether to reap to a ``xarray.Dataset`` or a ``pandas.DataFrame``. :type to_df: bool, optional :param verbosity: How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped. :type verbosity: int, optional :param desc: Description to show in the progress bar when reaping. :type desc: str, optional :returns: Multidimensional labeled dataset containing all the results. :rtype: xarray.Dataset or pandas.DataFrame .. py:method:: reap_runner(runner, wait=False, clean_up=None, allow_incomplete=False, to_df=False, verbosity=1, desc='Reap', **kwargs) Reap a Crop over sowed combos and save to a dataset defined by a :class:`~xyzpy.Runner`. .. py:method:: reap_harvest(harvester, wait=False, sync=True, overwrite=None, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap', **kwargs) Reap a Crop over sowed combos and merge with the dataset defined by a :class:`~xyzpy.Harvester`. ..
py:method:: reap_samples(sampler, wait=False, sync=True, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap', **kwargs) Reap a Crop over sowed combos and merge with the dataframe defined by a :class:`~xyzpy.Sampler`. .. py:method:: reap(wait=False, sync=True, overwrite=None, clean_up=None, allow_incomplete=False, verbosity=1, desc='Reap') Reap sown and grown combos from disk. Return a dataset if a runner or harvester is set, otherwise, the raw nested tuple. :param wait: Whether to wait for results to appear. If false (default) all results need to be in place before the reap. :type wait: bool, optional :param sync: Immediately sync the new dataset with the on-disk full dataset or dataframe if a harvester or sampler is used. :type sync: bool, optional :param overwrite: How to compare data when syncing to on-disk dataset. If ``None`` (default), merge as long as there are no conflicts. ``True``: overwrite with the new data. ``False``: discard any new conflicting data. :type overwrite: bool, optional :param clean_up: Whether to delete all the batch files once the results have been gathered. If left as ``None`` this will be automatically set to ``not allow_incomplete``. :type clean_up: bool, optional :param allow_incomplete: Allow only partially completed crop results to be reaped, incomplete results will all be filled-in as nan. :type allow_incomplete: bool, optional :param verbosity: How much information to show when reaping. 0: no output, 1: progress bar, 2: progress bar with each setting being reaped. :type verbosity: int, optional :param desc: Description to show in the progress bar when reaping. :type desc: str, optional :rtype: nested tuple or xarray.Dataset .. py:method:: check_bad(delete_bad=True) Check that the result dumps are not bad, since sometimes the length of a result does not match its batch. Optionally delete these so that they can be re-grown. :param delete_bad: Delete bad results as they are encountered.
:type delete_bad: bool :returns: **bad_ids** -- The bad batch numbers. :rtype: tuple .. py:method:: _get_fn() .. py:method:: _set_fn(fn) .. py:method:: _del_fn() .. py:attribute:: fn .. py:property:: num_sown_batches Total number of batches to be run/grown. .. py:property:: num_results .. py:function:: load_crops(directory='.') Automatically load all the crops found in the current directory. :param directory: Which directory to load the crops from, defaults to '.' - the current. :type directory: str, optional :returns: Mapping of the crop name to the Crop. :rtype: dict[str, Crop] .. py:class:: Sower(crop) Bases: :py:obj:`object` Class for sowing a 'crop' of batched combos to then 'grow' (on any number of workers sharing the filesystem) and then reap. .. py:attribute:: crop .. py:attribute:: _batch_cases :value: [] .. py:attribute:: _counter :value: 0 .. py:attribute:: _batch_counter :value: 0 .. py:method:: save_batch() Save the current batch of cases to disk and start the next batch. .. py:method:: __enter__() .. py:method:: __call__(**kwargs) .. py:method:: __exit__(exception_type, exception_value, traceback) .. py:function:: grow(batch_number, crop=None, fn=None, num_workers=None, check_mpi=True, verbosity=2, debugging=False, raise_errors=True) Automatically process a batch of cases into results. Should be run in an ".xyz-{fn_name}" folder, or `crop` should be specified. :param batch_number: Which batch to 'grow' into a set of results. :type batch_number: int :param crop: Description of where and how to store the cases and results. :type crop: xyzpy.Crop :param fn: If specified, the function used to generate the results, otherwise the function will be loaded from disk. :type fn: callable, optional :param num_workers: If specified, grow using a pool of this many workers. This uses ``joblib.externals.loky`` to spawn processes. 
:type num_workers: int, optional :param check_mpi: Whether to check if the process is rank 0 and only save results if so - allows mpi functions to be simply used. Defaults to true, this should only be turned off if e.g. a pool of workers is being used to run different ``grow`` instances. :type check_mpi: bool, optional :param verbosity: How much information to show. :type verbosity: {0, 1, 2}, optional :param debugging: Set logging level to DEBUG. :type debugging: bool, optional :param raise_errors: Whether to raise errors that occur during the computation. If growing many batches in parallel, it can be useful to set this to False so a single error doesn't crash the whole process. :type raise_errors: bool, optional .. py:class:: Reaper(crop, num_batches, wait=False, default_result=None) Bases: :py:obj:`object` Class that acts as a stateful function to retrieve already sown and grown results. .. py:attribute:: crop .. py:attribute:: results .. py:method:: __call__(**kwargs) .. py:method:: check_finished() .. py:data:: _SGE_HEADER :value: Multiline-String .. raw:: html
.. code-block:: python

   """#!/bin/bash -l
   #$ -S /bin/bash
   #$ -N {name}
   #$ -l h_rt={hours}:{minutes}:{seconds},mem={gigabytes}G
   #$ -l tmpfs={temp_gigabytes}G
   mkdir -p {output_directory}
   #$ -wd {output_directory}
   #$ -pe {pe} {num_procs}
   {header_options}
   """

.. raw:: html
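The placeholders in this header are ordinary ``str.format`` fields. A minimal sketch of filling them, assuming illustrative values (the job name, the ``smp`` parallel environment and the output directory are assumptions, and the template is a local copy of the value above rather than an import from xyzpy):

.. code-block:: python

   # Local copy of the ``_SGE_HEADER`` value shown above (not imported).
   SGE_HEADER = """#!/bin/bash -l
   #$ -S /bin/bash
   #$ -N {name}
   #$ -l h_rt={hours}:{minutes}:{seconds},mem={gigabytes}G
   #$ -l tmpfs={temp_gigabytes}G
   mkdir -p {output_directory}
   #$ -wd {output_directory}
   #$ -pe {pe} {num_procs}
   {header_options}
   """

   # Illustrative values only; 'smp' and the directory are assumptions.
   header = SGE_HEADER.format(
       name="my-crop",
       hours=0,
       minutes=20,
       seconds=0,
       gigabytes=2,
       temp_gigabytes=1,
       output_directory="$HOME/Scratch/output",
       pe="smp",
       num_procs=1,
       header_options="",
   )
   print(header)

Note that, unlike the PBS and slurm headers, this template does not zero-pad the time fields.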
.. py:data:: _SGE_ARRAY_HEADER :value: Multiline-String .. raw:: html
.. code-block:: python

   """#$ -t {run_start}-{run_stop}
   """

.. raw:: html
.. py:data:: _PBS_HEADER :value: Multiline-String .. raw:: html
.. code-block:: python

   """#!/bin/bash -l
   #PBS -N {name}
   #PBS -lselect={num_nodes}:ncpus={num_procs}:mem={gigabytes}gb
   #PBS -lwalltime={hours:02}:{minutes:02}:{seconds:02}
   {header_options}
   """

.. raw:: html
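The ``:02`` format specifiers in the walltime line zero-pad each field to two digits. A small sketch of that behaviour, where ``split_walltime`` is a hypothetical helper introduced only for illustration and is not part of xyzpy:

.. code-block:: python

   def split_walltime(total_seconds):
       """Split a duration in seconds into (hours, minutes, seconds)."""
       minutes, seconds = divmod(total_seconds, 60)
       hours, minutes = divmod(minutes, 60)
       return hours, minutes, seconds


   # 20 minutes, matching the documented default request.
   hours, minutes, seconds = split_walltime(20 * 60)
   line = "#PBS -lwalltime={hours:02}:{minutes:02}:{seconds:02}".format(
       hours=hours, minutes=minutes, seconds=seconds
   )
   print(line)  # #PBS -lwalltime=00:20:00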
.. py:data:: _PBS_ARRAY_HEADER :value: Multiline-String .. raw:: html
.. code-block:: python

   """#PBS -J {run_start}-{run_stop}
   """

.. raw:: html
.. py:data:: _SLURM_HEADER :value: Multiline-String .. raw:: html
.. code-block:: python

   """#!/bin/bash -l
   #SBATCH --job-name={name}
   #SBATCH --time={hours:02}:{minutes:02}:{seconds:02}
   {header_options}
   """

.. raw:: html
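The ``{header_options}`` field is where extra resource requests end up. The ``gen_cluster_script`` documentation states that ``{'gpu': 1}`` becomes ``#SBATCH --gpu=1`` and that a value of ``True`` or ``None`` turns the key into a bare flag. A sketch of that rule, assuming a hypothetical helper ``format_header_options`` (the real implementation may differ):

.. code-block:: python

   def format_header_options(prefix="#SBATCH", **kwargs):
       """Turn extra keyword arguments into scheduler header lines.

       Hypothetical helper: mirrors the documented rule only.
       """
       lines = []
       for key, value in kwargs.items():
           if value is None or value is True:
               # Literal True/None: treat the key as a flag.
               lines.append(f"{prefix} --{key}")
           else:
               lines.append(f"{prefix} --{key}={value}")
       return "\n".join(lines)


   header_options = format_header_options(gpu=1, requeue=None)
   print(header_options)

The identity checks (``is None``, ``is True``) are deliberate here, so that an integer value such as ``1`` is still written as ``--gpu=1`` rather than being mistaken for a flag.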
.. py:data:: _SLURM_ARRAY_HEADER :value: Multiline-String .. raw:: html
.. code-block:: python

   """#SBATCH --array={run_start}-{run_stop}
   """

.. raw:: html
.. py:data:: _BASE :value: Multiline-String .. raw:: html
.. code-block:: python

   """echo 'XYZPY script starting...'
   cd {working_directory}
   export OMP_NUM_THREADS={num_threads}
   export MKL_NUM_THREADS={num_threads}
   export OPENBLAS_NUM_THREADS={num_threads}
   export NUMBA_NUM_THREADS={num_threads}
   {shell_setup}
   read -r -d '' SCRIPT << EOM
   {setup}
   from xyzpy.gen.cropping import grow, Crop
   if __name__ == '__main__':
       crop = Crop(name='{name}', parent_dir='{parent_dir}')
       print('Growing:', repr(crop))
       grow_kwargs = dict(
           num_workers={num_workers},
           subprocess={subprocess},
           debugging={debugging},
           verbosity_grow=2,
       )
   """

.. raw:: html
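The four ``export`` lines all receive the same ``{num_threads}`` value. The ``gen_cluster_script`` documentation says this is computed from ``num_procs`` and ``num_workers`` when not given, with ``num_workers * num_threads == num_procs`` in general. A sketch of one way that could work, assuming integer division (both helper names are hypothetical, and the actual rule in xyzpy may differ):

.. code-block:: python

   THREAD_VARS = (
       "OMP_NUM_THREADS",
       "MKL_NUM_THREADS",
       "OPENBLAS_NUM_THREADS",
       "NUMBA_NUM_THREADS",
   )


   def derive_num_threads(num_procs, num_workers=None):
       """Assumed rule: share the requested cores evenly between workers."""
       if num_workers is None:
           return num_procs
       return max(1, num_procs // num_workers)


   def thread_exports(num_threads):
       """Render the export lines used in the ``_BASE`` template."""
       return "\n".join(f"export {name}={num_threads}" for name in THREAD_VARS)


   num_threads = derive_num_threads(num_procs=8, num_workers=2)
   print(thread_exports(num_threads))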
.. py:data:: _CLUSTER_SGE_GROW_ALL_SCRIPT :value: Multiline-String .. raw:: html
.. code-block:: python

   """    crop.grow($SGE_TASK_ID, **grow_kwargs)
   """

.. raw:: html
.. py:data:: _CLUSTER_PBS_GROW_ALL_SCRIPT :value: Multiline-String .. raw:: html
.. code-block:: python

   """    crop.grow($PBS_ARRAY_INDEX, **grow_kwargs)
   """

.. raw:: html
.. py:data:: _CLUSTER_SLURM_GROW_ALL_SCRIPT :value: Multiline-String .. raw:: html
.. code-block:: python

   """    crop.grow($SLURM_ARRAY_TASK_ID, **grow_kwargs)
   """

.. raw:: html
.. py:data:: _CLUSTER_SGE_GROW_PARTIAL_SCRIPT :value: Multiline-String .. raw:: html
.. code-block:: python

   """    batch_ids = {batch_ids}]
       crop.grow(batch_ids[$SGE_TASK_ID - 1], **grow_kwargs)
   """

.. raw:: html
.. py:data:: _CLUSTER_PBS_GROW_PARTIAL_SCRIPT :value: Multiline-String .. raw:: html
.. code-block:: python

   """    batch_ids = {batch_ids}
       crop.grow(batch_ids[$PBS_ARRAY_INDEX - 1], **grow_kwargs)
   """

.. raw:: html
.. py:data:: _CLUSTER_SLURM_GROW_PARTIAL_SCRIPT :value: Multiline-String .. raw:: html
.. code-block:: python

   """    batch_ids = {batch_ids}
       crop.grow(batch_ids[$SLURM_ARRAY_TASK_ID - 1], **grow_kwargs)
   """

.. raw:: html
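The three partial-grow templates index ``batch_ids`` with the scheduler's array task ID minus one, which maps a 1-based task index onto an arbitrary set of batch numbers. A sketch of that mapping, assuming the array job runs from 1 to ``len(batch_ids)`` (the batch numbers below are made up):

.. code-block:: python

   # Hypothetical set of batches that are still missing.
   batch_ids = (3, 7, 12)

   # Assumed array range: task IDs 1..len(batch_ids), as a scheduler would
   # supply via e.g. $SLURM_ARRAY_TASK_ID.
   task_to_batch = {
       task_id: batch_ids[task_id - 1]
       for task_id in range(1, len(batch_ids) + 1)
   }

   for task_id, batch in task_to_batch.items():
       print(f"array task {task_id} grows batch {batch}")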
.. py:data:: _BASE_CLUSTER_GROW_SINGLE :value: Multiline-String .. raw:: html
.. code-block:: python

   """    grow_kwargs['verbosity_grow'] = 0
       batch_ids = {batch_ids}
       crop.grow(batch_ids, **grow_kwargs)
   """

.. raw:: html
.. py:data:: _BASE_CLUSTER_SCRIPT_END :value: Multiline-String .. raw:: html
.. code-block:: python

   """EOM
   {launcher} -c "$SCRIPT"
   echo 'XYZPY script finished'
   """

.. raw:: html
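Together with ``_BASE``, this ending reads the generated Python source into the shell variable ``SCRIPT`` through a here-document and then passes it to ``{launcher} -c``. Because the ``EOM`` delimiter is unquoted, the shell expands variables such as ``$SLURM_ARRAY_TASK_ID`` before Python sees the text. A rough Python analogue of the final launch step, assuming the launcher is the current interpreter (the documented default) and using a stand-in script:

.. code-block:: python

   import subprocess
   import sys

   # Stand-in for the generated script; the real one builds a Crop and
   # calls ``crop.grow``.
   script = "print('Growing batch', 3)"

   # Equivalent of ``{launcher} -c "$SCRIPT"`` with the current interpreter.
   result = subprocess.run(
       [sys.executable, "-c", script],
       capture_output=True,
       text=True,
       check=True,
   )
   output = result.stdout.strip()
   print(output)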
.. py:function:: gen_cluster_script(crop, scheduler, batch_ids=None, *, mode='array', num_procs=None, num_threads=None, num_nodes=None, num_workers=None, subprocess=False, mem=None, mem_per_cpu=None, gigabytes=None, time=None, hours=None, minutes=None, seconds=None, conda_env=True, launcher=None, setup='#', shell_setup='', mpi=False, temp_gigabytes=1, output_directory=None, debugging=False, **kwargs) Generate a cluster script to grow a Crop. :param crop: The crop to grow. :type crop: Crop :param scheduler: Whether to use a SGE, PBS or slurm submission script template. :type scheduler: {'sge', 'pbs', 'slurm'} :param batch_ids: Which batch numbers to grow, defaults to all missing batches. :type batch_ids: int or tuple[int] :param mode: How to distribute the batches, either as an array job with a single batch per job, or as a single job processing batches in parallel. :type mode: {'array', 'single'} :param hours: How many hours to request, default=0. :type hours: int :param minutes: How many minutes to request, default=20. :type minutes: int, optional :param seconds: How many seconds to request, default=0. :type seconds: int, optional :param gigabytes: How much memory to request, default: 2. :type gigabytes: int, optional :param num_procs: How many processes to request (threaded cores or MPI), default: 1. :type num_procs: int, optional :param num_threads: How many threads to use per process. Will be computed automatically based on ``num_procs`` and ``num_workers`` if not specified. :type num_threads: int, optional :param num_workers: How many workers to use for parallel growing, default is sequential. If specified, then generally ``num_workers * num_threads == num_procs``. :type num_workers: int, optional :param subprocess: Whether to use a fresh subprocess for each batch, default: False. :type subprocess: bool, optional :param num_nodes: How many nodes to request, default: 1. 
:type num_nodes: int, optional :param conda_env: Whether to activate a conda environment before running the script. If ``True``, the environment will be the same as the one used to launch the script. If a string, the environment will be the one specified by the string. :type conda_env: bool or str, optional :param launcher: How to launch the script, default: the current Python interpreter. But could for example be ``'mpiexec python'`` for an MPI program. :type launcher: str, optional :param setup: Python script to run before growing, for things that shouldn't be put in the crop function itself, e.g. one-time imports with side-effects like ``"import tensorflow as tf; tf.enable_eager_execution()"``. :type setup: str, optional :param shell_setup: Commands to be run by the shell before the python script is executed. :type shell_setup: str, optional :param mpi: Request MPI processes, not threaded processes. :type mpi: bool, optional :param temp_gigabytes: How much temporary on-disk memory. :type temp_gigabytes: int, optional :param output_directory: What directory to write output to. Defaults to "$HOME/Scratch/output". :type output_directory: str, optional :param debugging: Set the python log level to debugging. :type debugging: bool, optional :param kwargs: Extra keyword arguments are taken to be extra resources to request in the header of the submission script, e.g. ``{'gpu': 1}`` will add ``"#SBATCH --gpu=1"`` to the header if using slurm. If you supply literal ``True`` or ``None`` as the value, then the key will be treated as a flag. E.g. ``{'requeue': None}`` will add ``"#SBATCH --requeue"`` to the header. :type kwargs: dict, optional :rtype: str ..
py:function:: grow_cluster(crop, scheduler, batch_ids=None, *, hours=None, minutes=None, seconds=None, gigabytes=2, num_nodes=1, num_procs=1, num_threads=None, num_workers=None, subprocess=False, conda_env=True, launcher=None, setup='#', shell_setup='', mpi=False, temp_gigabytes=1, output_directory=None, debugging=False, **kwargs) Automagically submit SGE, PBS, or slurm jobs to grow all missing results. :param crop: The crop to grow. :type crop: Crop :param scheduler: Whether to use an SGE, PBS or slurm submission script template. :type scheduler: {'sge', 'pbs', 'slurm'} :param batch_ids: Which batch numbers to grow, defaults to all missing batches. :type batch_ids: int or tuple[int] :param hours: How many hours to request, default: 0. :type hours: int, optional :param minutes: How many minutes to request, default: 20. :type minutes: int, optional :param seconds: How many seconds to request, default: 0. :type seconds: int, optional :param gigabytes: How much memory to request, default: 2. :type gigabytes: int, optional :param num_nodes: How many nodes to request, default: 1. :type num_nodes: int, optional :param num_procs: How many processes to request (threaded cores or MPI), default: 1. :type num_procs: int, optional :param num_threads: How many threads to use per process. Will be computed automatically based on ``num_procs`` and ``num_workers`` if not specified. :type num_threads: int, optional :param num_workers: How many workers to use for parallel growing, default is sequential. If specified, then generally ``num_workers * num_threads == num_procs``. :type num_workers: int, optional :param subprocess: Whether to use a fresh subprocess for each batch, default: False. :type subprocess: bool, optional :param conda_env: Whether to activate a conda environment before running the script. If ``True``, the environment will be the same as the one used to launch the script. If a string, the environment will be the one specified by the string.
:type conda_env: bool or str, optional :param launcher: How to launch the script, default: the current Python interpreter. But could for example be ``'mpiexec python'`` for an MPI program. :type launcher: str, optional :param setup: Python script to run before growing, for things that shouldn't be put in the crop function itself, e.g. one-time imports with side-effects like: ``"import tensorflow as tf; tf.enable_eager_execution()"``. :type setup: str, optional :param shell_setup: Commands to be run by the shell before the python script is executed. E.g. ``conda activate my_env``. :type shell_setup: str, optional :param mpi: Request MPI processes rather than threaded processes. :type mpi: bool, optional :param temp_gigabytes: How much temporary on-disk storage to request, in gigabytes. :type temp_gigabytes: int, optional :param output_directory: What directory to write output to. Defaults to "$HOME/Scratch/output". :type output_directory: str, optional :param debugging: Set the python log level to debugging. :type debugging: bool, optional .. py:function:: gen_qsub_script(crop, batch_ids=None, *, scheduler='sge', **kwargs) Generate a qsub script to grow a Crop. Deprecated in favor of `gen_cluster_script` and will be removed in the future. :param crop: The crop to grow. :type crop: Crop :param batch_ids: Which batch numbers to grow, defaults to all missing batches. :type batch_ids: int or tuple[int] :param scheduler: Whether to use an SGE or PBS submission script template. :type scheduler: {'sge', 'pbs'}, optional :param kwargs: See `gen_cluster_script` for all other parameters. .. py:function:: qsub_grow(crop, batch_ids=None, *, scheduler='sge', **kwargs) Automagically submit SGE or PBS jobs to grow all missing results. Deprecated in favor of `grow_cluster` and will be removed in the future. :param crop: The crop to grow. :type crop: Crop :param batch_ids: Which batch numbers to grow, defaults to all missing batches.
:type batch_ids: int or tuple[int] :param scheduler: Whether to use an SGE or PBS submission script template. :type scheduler: {'sge', 'pbs'}, optional :param kwargs: See `grow_cluster` for all other parameters. .. py:function:: clean_slurm_outputs(job, directory='.', cancel_if_finished=True) .. py:function:: manage_slurm_outputs(crop, job, wait_time=60)
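The ``kwargs`` rule documented for ``gen_cluster_script`` (a key with a value becomes ``--key=value``, a key with literal ``True`` or ``None`` becomes a bare flag) and the ``num_workers * num_threads == num_procs`` relation can be illustrated with a minimal, standalone sketch. The helper names ``render_slurm_extras`` and ``infer_num_threads`` are hypothetical and are not part of xyzpy. The even-split rule for threads is an assumption made for illustration, and the real implementation may differ.

```python
# Illustrative sketch only: these helpers are hypothetical and NOT xyzpy's API.
# They mirror the behaviour described in the ``gen_cluster_script`` docstring.


def render_slurm_extras(**kwargs):
    """Turn extra keyword arguments into slurm header lines.

    A literal ``True`` or ``None`` value marks the key as a bare flag;
    any other value is rendered as ``--key=value``.
    """
    lines = []
    for key, value in kwargs.items():
        if value is True or value is None:
            lines.append(f"#SBATCH --{key}")
        else:
            lines.append(f"#SBATCH --{key}={value}")
    return lines


def infer_num_threads(num_procs, num_workers=None):
    """Assumed rule: split the requested processes evenly across workers,
    so that ``num_workers * num_threads == num_procs`` when it divides."""
    if num_workers is None:
        # Sequential growing: one worker may use every requested process.
        return num_procs
    return max(1, num_procs // num_workers)


if __name__ == "__main__":
    for line in render_slurm_extras(gpu=1, requeue=None):
        print(line)
    print(infer_num_threads(num_procs=8, num_workers=4))
```

Note that the flag check uses identity (``value is True``), so an integer such as ``1`` is still rendered as ``--gpu=1`` rather than being mistaken for a flag.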