kedro.runner.ThreadRunner
- class kedro.runner.ThreadRunner(max_workers=None, is_async=False)
ThreadRunner is an AbstractRunner implementation. It can be used to run the Pipeline in parallel groups formed by toposort using threads.
Methods
create_default_data_set(ds_name) – Factory method for creating the default dataset for the runner.
run(pipeline, catalog[, hook_manager, ...]) – Run the Pipeline using the datasets provided by catalog and save results back to the same objects.
run_only_missing(pipeline, catalog, hook_manager) – Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.
- __init__(max_workers=None, is_async=False)
Instantiates the runner.
- Parameters:
max_workers (Optional[int]) – Number of worker threads to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count.
is_async (bool) – If True, the value is reset to False, because ThreadRunner doesn’t support loading and saving the node inputs and outputs asynchronously with threads. Defaults to False.
- Raises:
ValueError – Raised when invalid parameters are passed, for example a non-positive max_workers.
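The idea of running a pipeline "in parallel groups formed by toposort using threads" can be illustrated with a small standard-library sketch. This is not Kedro's implementation: the function run_in_thread_groups, the task names, and the max_workers check are hypothetical and only mirror the behaviour described above (a bounded thread pool, and a ValueError for bad parameters).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_in_thread_groups(tasks, dependencies, max_workers=None):
    """Run tasks group by group; tasks within one group do not depend on each other."""
    # Assumption: reject non-positive worker counts, mirroring the ValueError above.
    if max_workers is not None and max_workers <= 0:
        raise ValueError("max_workers should be positive")

    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    results = {}
    groups = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task whose dependencies are finished forms one parallel group.
            ready = sorted(sorter.get_ready())
            groups.append(ready)
            # Each task receives a snapshot of the results computed so far.
            futures = {name: pool.submit(tasks[name], dict(results)) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()  # blocks until the task finishes
                sorter.done(name)
    return results, groups


# Hypothetical tasks: "double" and "square" both depend only on "load".
tasks = {
    "load": lambda done: 3,
    "double": lambda done: done["load"] * 2,
    "square": lambda done: done["load"] ** 2,
    "total": lambda done: done["double"] + done["square"],
}
dependencies = {
    "double": {"load"},
    "square": {"load"},
    "total": {"double", "square"},
}

results, groups = run_in_thread_groups(tasks, dependencies, max_workers=2)
print(groups)
print(results)
```

Here "double" and "square" land in the same group and may execute concurrently, while "total" waits for both.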
- create_default_data_set(ds_name)
Factory method for creating the default dataset for the runner.
- Parameters:
ds_name (str) – Name of the missing dataset.
- Return type:
- Returns:
An instance of MemoryDataset to be used for all unregistered datasets.
- run(pipeline, catalog, hook_manager=None, session_id=None)
Run the Pipeline using the datasets provided by catalog and save results back to the same objects.
- Parameters:
pipeline – The Pipeline to run.
catalog – The DataCatalog from which to fetch data.
hook_manager – The PluginManager to activate hooks.
session_id – The id of the session.
- Raises:
ValueError – Raised when Pipeline inputs cannot be satisfied.
- Returns:
Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.
- run_only_missing(pipeline, catalog, hook_manager)
Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.
- Parameters:
pipeline – The Pipeline to run.
catalog – The DataCatalog from which to fetch data.
hook_manager – The PluginManager to activate hooks.
- Raises:
ValueError – Raised when Pipeline inputs cannot be satisfied.
- Returns:
Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.