ParallelRunner
kedro.runner.ParallelRunner ¶
ParallelRunner(max_workers=None, is_async=False)
Bases: AbstractRunner
ParallelRunner is an AbstractRunner implementation. It can
be used to run the Pipeline in parallel groups formed by toposort.
Please note that this runner implementation validates dataset using the
_validate_catalog method, which checks if any of the datasets are
single process only using the _SINGLE_PROCESS dataset attribute.
Warning
This runner will not execute node and dataset hooks. Use
SequentialRunner or ThreadRunner if your project relies on these
hooks.
Parameters:
-
max_workers(int | None, default:None) –Number of worker processes to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count. On windows machines, the max_workers value cannot be larger than 61 and will be set to min(61, max_workers).
-
is_async(bool, default:False) –If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False.
Raises: ValueError: bad parameters passed
Source code in kedro/runner/parallel_runner.py
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 | |
__del__ ¶
__del__()
Source code in kedro/runner/parallel_runner.py
76 77 | |
_get_executor ¶
_get_executor(max_workers)
Source code in kedro/runner/parallel_runner.py
124 125 126 127 128 129 | |
_get_required_workers_count ¶
_get_required_workers_count(pipeline)
Calculate the max number of processes required for the pipeline, limit to the number of CPU cores.
Source code in kedro/runner/parallel_runner.py
110 111 112 113 114 115 116 117 118 119 120 121 122 | |
_run ¶
_run(pipeline, catalog, hook_manager=None, run_id=None)
The method implementing parallel pipeline running.
Parameters:
-
pipeline(Pipeline) –The
Pipelineto run. -
catalog(SharedMemoryCatalogProtocol) –An implemented instance of
SharedMemoryCatalogProtocolfrom which to fetch data. -
hook_manager(PluginManager | None, default:None) –The
PluginManagerto activate hooks. -
run_id(str | None, default:None) –The id of the run.
Raises:
-
AttributeError–When the provided pipeline is not suitable for parallel execution.
-
RuntimeError–If the runner is unable to schedule the execution of all pipeline nodes.
-
Exception–In case of any downstream node failure.
Source code in kedro/runner/parallel_runner.py
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 | |
_set_manager_datasets ¶
_set_manager_datasets(catalog)
Source code in kedro/runner/parallel_runner.py
107 108 | |
_validate_catalog
classmethod
¶
_validate_catalog(catalog)
Ensure that all datasets are serialisable and that we do not have any non proxied memory datasets being used as outputs as their content will not be synchronized across threads.
Source code in kedro/runner/parallel_runner.py
99 100 101 102 103 104 105 | |
_validate_nodes
classmethod
¶
_validate_nodes(nodes)
Ensure all tasks are serialisable.
Source code in kedro/runner/parallel_runner.py
79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 | |