In this section, we introduce the concept of a node, for which the relevant API documentation is kedro.pipeline.node.

Nodes are the building blocks of pipelines, and represent tasks. Pipelines are used to combine nodes to build workflows, which range from simple machine learning workflows to end-to-end (E2E) production workflows.

You must first import libraries from Kedro and other standard tools to run the code snippets demonstrated below.

from kedro.pipeline import *
from kedro.io import *
from kedro.runner import *

import pickle
import os

How to create a node

A node is created by specifying a function, input variable names and output variable names. Let’s consider a simple function that adds two numbers:

def add(x, y):
    return x + y

The function has two inputs (x and y) and a single output (the sum of the inputs).

Here is how a node is created with this function:

adder_node = node(func=add, inputs=["a", "b"], outputs="sum")

Here is the output:

Out[1]: Node(add, ['a', 'b'], 'sum', None)

You can also add labels to nodes, which will be used to describe them in logs:

adder_node = node(func=add, inputs=["a", "b"], outputs="sum")
print(str(adder_node))

adder_node = node(func=add, inputs=["a", "b"], outputs="sum", name="adding_a_and_b")
print(str(adder_node))

This gives the following output:

add([a,b]) -> [sum]
adding_a_and_b: add([a,b]) -> [sum]

Let’s break down the node definition:

  • add is the Python function that will execute when the node runs

  • ['a', 'b'] specifies the input variable names

  • sum specifies the return variable name. The value returned by add will be bound to this variable

  • name is an optional label for the node, which can be used to describe the business logic it implements

Node definition syntax

Kedro uses a small syntax to describe function inputs and outputs. This syntax allows different Python functions to be reused in nodes, and supports dependency resolution in pipelines.

Syntax for input variables

| Input syntax | Meaning | Example function parameters | How function is called when node runs |
|---|---|---|---|
| None | No input | def f() | f() |
| 'a' | Single input | def f(arg1) | f(a) |
| ['a', 'b'] | Multiple inputs | def f(arg1, arg2) | f(a, b) |
| dict(arg1='x', arg2='y') | Keyword inputs | def f(arg1, arg2) | f(arg1=x, arg2=y) |
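To make the input forms concrete, here is a minimal pure-Python sketch of how each form could be turned into a function call. It is an illustration only and not Kedro's implementation: call_with_inputs, subtract and the data dictionary are hypothetical names introduced for this example.

```python
def call_with_inputs(func, inputs, data):
    """Call func with values looked up in data, following the input syntax."""
    if inputs is None:  # no input: f()
        return func()
    if isinstance(inputs, str):  # single input: f(a)
        return func(data[inputs])
    if isinstance(inputs, list):  # multiple inputs: f(a, b)
        return func(*[data[name] for name in inputs])
    if isinstance(inputs, dict):  # keyword inputs: f(arg1=x, arg2=y)
        return func(**{arg: data[name] for arg, name in inputs.items()})
    raise TypeError(f"unsupported inputs specification: {inputs!r}")


def subtract(arg1, arg2):
    return arg1 - arg2


# Hypothetical in-memory data, keyed by variable name.
data = {"x": 10, "y": 3}

# List form: values are passed positionally, in list order.
positional = call_with_inputs(subtract, ["x", "y"], data)

# Dict form: keys are parameter names, values are variable names.
keyword = call_with_inputs(subtract, dict(arg1="y", arg2="x"), data)

print(positional, keyword)
```

The dictionary form is the only one that decouples argument order from variable names, which is why it helps when reusing one function across differently named datasets.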

Syntax for output variables

| Output syntax | Meaning | Example return statement |
|---|---|---|
| None | No output | Does not return |
| 'a' | Single output | return a |
| ['a', 'b'] | List output | return [a, b] |
| dict(key1='a', key2='b') | Dictionary output | return dict(key1=a, key2=b) |

Any combination of the above is possible, except nodes of the form node(f, None, None) (at least a single input or output must be provided).
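The output forms can be sketched in the same spirit. The helper below shows one way a function's return value could be bound to output variable names. It is an assumption-laden illustration, not Kedro's implementation, and bind_outputs is a hypothetical name.

```python
def bind_outputs(outputs, result):
    """Map a function's return value onto output variable names."""
    if outputs is None:  # no output: the return value is ignored
        return {}
    if isinstance(outputs, str):  # single output: return a
        return {outputs: result}
    if isinstance(outputs, list):  # list output: return [a, b]
        if len(outputs) != len(result):
            raise ValueError("number of returned values does not match outputs")
        return dict(zip(outputs, result))
    if isinstance(outputs, dict):  # dictionary output: return dict(key1=a, key2=b)
        return {name: result[key] for key, name in outputs.items()}
    raise TypeError(f"unsupported outputs specification: {outputs!r}")


single = bind_outputs("sum", 5)
as_list = bind_outputs(["a", "b"], [1, 2])
as_dict = bind_outputs(dict(key1="a", key2="b"), dict(key1=1, key2=2))

print(single, as_list, as_dict)
```

In the dictionary form, the keys match the keys of the returned dictionary and the values are the variable names the results are bound to.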

**kwargs-only node functions

Sometimes, when creating reporting nodes for instance, you need to know the names of the datasets that your node receives, but you might not have this information in advance. This can be solved by defining a **kwargs-only function:

def reporting(**kwargs):
    result = []
    for name, data in kwargs.items():
        # Build one report per dataset and collect it
        res = example_report(name, data)
        result.append(res)
    return combined_report(result)
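The snippet above leaves example_report and combined_report undefined. The following sketch fills them in with hypothetical placeholder implementations (a row count per dataset, joined line by line) so that the **kwargs pattern can be run outside of a pipeline. The real helpers would depend on your reporting needs.

```python
def example_report(name, data):
    # Hypothetical per-dataset report: the dataset name and its length.
    return f"{name}: {len(data)} rows"


def combined_report(reports):
    # Hypothetical combiner: one line per dataset report.
    return "\n".join(reports)


def reporting(**kwargs):
    result = []
    for name, data in kwargs.items():
        # The keyword name tells the function which dataset it received.
        result.append(example_report(name, data))
    return combined_report(result)


# Plain lists stand in for datasets here.
report = reporting(uk_input1=[1, 2, 3], uk_input2=[4, 5])
print(report)
```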

Then, when it comes to constructing the Node, simply pass a dictionary to the node inputs:

from kedro.pipeline import node

uk_reporting_node = node(
    reporting,
    inputs={"uk_input1": "uk_input1", "uk_input2": "uk_input2", ...},
    outputs="uk",
)

ge_reporting_node = node(
    reporting,
    inputs={"ge_input1": "ge_input1", "ge_input2": "ge_input2", ...},
    outputs="ge",
)

Alternatively, you can also make use of a helper function that creates the mapping for you, so you can reuse it across your codebase.

 from kedro.pipeline import node

+mapping = lambda x: {k: k for k in x}
+
 uk_reporting_node = node(
     reporting,
-    inputs={"uk_input1": "uk_input1", "uk_input2": "uk_input2", ...},
+    inputs=mapping(["uk_input1", "uk_input2", ...]),
     outputs="uk",
 )

 ge_reporting_node = node(
     reporting,
-    inputs={"ge_input1": "ge_input1", "ge_input2": "ge_input2", ...},
+    inputs=mapping(["ge_input1", "ge_input2", ...]),
     outputs="ge",
 )
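As a small variation, the helper can be written as a named function, which is easier to document and import from a shared module. The name identity_mapping is hypothetical, and the dataset names are the illustrative ones used above.

```python
def identity_mapping(names):
    """Map each dataset name to a keyword argument of the same name."""
    return {name: name for name in names}


uk_inputs = identity_mapping(["uk_input1", "uk_input2"])
print(uk_inputs)
```

The resulting dictionary has the same shape as the hand-written inputs dictionary, so it can be passed to the node inputs in the same way.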

How to tag a node

Tags are useful for running part of a pipeline without changing the code. For instance, kedro run --tag=ds runs only the nodes that have a ds tag attached.

To tag a node, you can simply specify the tags argument:

node(func=add, inputs=["a", "b"], outputs="sum", name="adding_a_and_b", tags="node_tag")

Moreover, you can tag all nodes in a Pipeline. If the pipeline definition contains the tags= argument, Kedro will attach the corresponding tag to every node within that pipeline.

To run a pipeline using a tag:

kedro run --tag=pipeline_tag

This will run only the nodes found within the pipeline tagged with pipeline_tag.
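The interaction between node-level and pipeline-level tags can be pictured with a toy model. The sketch below is not Kedro code: TaggedNode, tag_all and only_with_tag are hypothetical names. It only illustrates the idea that a pipeline tag is added to every node, and that selecting by tag filters on each node's tag set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaggedNode:
    """Toy stand-in for a node: a name plus a set of tags."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def tag_all(nodes, pipeline_tag):
    """Return copies of the nodes with pipeline_tag added to each one."""
    return [TaggedNode(n.name, n.tags | {pipeline_tag}) for n in nodes]


def only_with_tag(nodes, tag):
    """Keep only the nodes that carry the given tag."""
    return [n for n in nodes if tag in n.tags]


nodes = [TaggedNode("node1"), TaggedNode("node2", frozenset({"node_tag"}))]
pipeline_nodes = tag_all(nodes, "pipeline_tag")

by_pipeline_tag = [n.name for n in only_with_tag(pipeline_nodes, "pipeline_tag")]
by_node_tag = [n.name for n in only_with_tag(pipeline_nodes, "node_tag")]

print(by_pipeline_tag, by_node_tag)
```

In this model, selecting by the pipeline tag returns every node, while selecting by the node tag returns only the node that was tagged individually.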

How to run a node

To run a node, you must instantiate its inputs. In this case, the node expects two inputs:

adder_node.run(dict(a=2, b=3))

The output is as follows:

Out[2]: {'sum': 5}


You can also call a node as a regular Python function: adder_node(dict(a=2, b=3)). This will call adder_node.run(dict(a=2, b=3)) behind the scenes.
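The relationship between calling a node and running it can be sketched with a small stand-in class. MiniNode is hypothetical and much simpler than a real Kedro node (it assumes list inputs and a single output), but it shows how a call can delegate to a run method and return the outputs as a dictionary.

```python
class MiniNode:
    """Hypothetical stand-in for a node with list inputs and one output."""

    def __init__(self, func, inputs, outputs):
        self.func = func
        self.inputs = inputs
        self.outputs = outputs

    def run(self, data):
        # Look up each input by name, call the function, and bind the result.
        args = [data[name] for name in self.inputs]
        return {self.outputs: self.func(*args)}

    def __call__(self, data):
        # Calling the object simply delegates to run().
        return self.run(data)


def add(x, y):
    return x + y


mini_adder = MiniNode(add, ["a", "b"], "sum")
via_run = mini_adder.run(dict(a=2, b=3))
via_call = mini_adder(dict(a=2, b=3))

print(via_run, via_call)
```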