Debugging

Introduction

If you run your Kedro pipeline from the CLI, or you can’t (or don’t want to) run Kedro from within your IDE’s debugger, it can be hard to debug your pipeline or nodes. This is particularly frustrating because:

  • If you have long-running nodes or pipelines, inserting print statements and re-running them multiple times quickly becomes time-consuming.

  • Debugging nodes outside of a run session isn’t very helpful, because it can be hard to access the local scope within the node. This is especially true with large or in-memory datasets, where you need to chain several nodes together, or re-run your pipeline, just to reproduce the data you want to debug.

This guide shows how to use Hooks to start a post-mortem debugging session with pdb when an uncaught error occurs during a pipeline run. Note that ipdb can be integrated in the same way.

If you are looking for guidance on how to set up debugging within an IDE, see the guides for VSCode and PyCharm.

Debugging a node

To start a debugging session when an uncaught error is raised within your node, implement the on_node_error Hook specification:

import pdb
import sys
import traceback

from kedro.framework.hooks import hook_impl


class PDBNodeDebugHook:
    """A hook class for creating a post mortem debugging with the PDB debugger
    whenever an error is triggered within a node. The local scope from when the
    exception occured is available within this debugging session.
    """

    @hook_impl
    def on_node_error(self):
        _, _, traceback_object = sys.exc_info()

        # Print the traceback information for easier debugging
        traceback.print_tb(traceback_object)

        # Drop into a post-mortem debugging session
        pdb.post_mortem(traceback_object)

You can then register this PDBNodeDebugHook in your project’s settings.py:

HOOKS = (PDBNodeDebugHook(),)
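
For this registration to work, settings.py also needs to import the Hook class. Below is a minimal sketch, assuming you saved PDBNodeDebugHook in a hypothetical hooks.py module inside your project’s package; adjust the import path to match your layout:

# settings.py
# The module path below is an assumption about where PDBNodeDebugHook lives;
# change it to match your project structure.
from your_package.hooks import PDBNodeDebugHook

HOOKS = (PDBNodeDebugHook(),)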

Debugging a pipeline

To start a debugging session when an uncaught error is raised within your pipeline, implement the on_pipeline_error Hook specification:

import pdb
import sys
import traceback

from kedro.framework.hooks import hook_impl


class PDBPipelineDebugHook:
    """A hook class for creating a post mortem debugging with the PDB debugger
    whenever an error is triggered within a pipeline. The local scope from when the
    exception occured is available within this debugging session.
    """

    @hook_impl
    def on_pipeline_error(self):
        # We don't need the exception object itself, since it is available within this stack frame
        _, _, traceback_object = sys.exc_info()

        # Print the traceback information for easier debugging
        traceback.print_tb(traceback_object)

        # Drop into a post-mortem debugging session
        pdb.post_mortem(traceback_object)

You can then register this PDBPipelineDebugHook in your project’s settings.py:

HOOKS = (PDBPipelineDebugHook(),)
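
HOOKS accepts any number of Hook instances, so both debugging Hooks can be registered side by side. Below is a minimal sketch, again assuming both classes live in a hypothetical your_package.hooks module:

# settings.py
# The import path is an assumption about where you saved both Hook classes.
from your_package.hooks import PDBNodeDebugHook, PDBPipelineDebugHook

HOOKS = (PDBNodeDebugHook(), PDBPipelineDebugHook())

With either Hook registered, the next kedro run that hits an uncaught exception prints the traceback and drops you into a (Pdb) prompt, where standard pdb commands such as p, ll, u and d let you inspect the local variables and frames of the failing code.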