kedro.io.AbstractDataset

class kedro.io.AbstractDataset

AbstractDataset is the base class for all dataset implementations. Every dataset implementation should extend this abstract class and implement the methods marked as abstract. If a user-defined dataset cannot be used with the ParallelRunner, it should set the attribute _SINGLE_PROCESS = True. Example:

from pathlib import Path, PurePosixPath
import pandas as pd
from kedro.io import AbstractDataset


# The two type parameters are the type accepted by save() and the type returned by load().
class MyOwnDataset(AbstractDataset[pd.DataFrame, pd.DataFrame]):
    def __init__(self, filepath, param1, param2=True):
        self._filepath = PurePosixPath(filepath)
        self._param1 = param1
        self._param2 = param2

    def _load(self) -> pd.DataFrame:
        # Called by the public load() method.
        return pd.read_csv(self._filepath)

    def _save(self, df: pd.DataFrame) -> None:
        # Called by the public save() method.
        df.to_csv(str(self._filepath))

    def _exists(self) -> bool:
        # Called by the public exists() method.
        return Path(self._filepath.as_posix()).exists()

    def _describe(self) -> dict:
        # Returns the attributes that describe this dataset instance.
        return dict(param1=self._param1, param2=self._param2)

Example catalog.yml specification:

my_dataset:
    type: <path-to-my-own-dataset>.MyOwnDataset
    filepath: data/01_raw/my_data.csv
    param1: <param1-value> # param1 is a required argument
    # param2 will be True by default
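
To make the link between the catalog entry and the constructor concrete, here is a minimal sketch, assuming the parsed entry behaves like a plain dictionary whose keys other than type are passed to __init__ as keyword arguments. MyOwnDatasetStandIn is a hypothetical stand-in that only copies the constructor of MyOwnDataset so the snippet runs without kedro or pandas, and "example-value" is a placeholder for <param1-value>.

```python
from pathlib import PurePosixPath


class MyOwnDatasetStandIn:
    """Hypothetical stand-in with the same constructor signature as MyOwnDataset."""

    def __init__(self, filepath, param1, param2=True):
        self._filepath = PurePosixPath(filepath)
        self._param1 = param1
        self._param2 = param2


# Roughly what the catalog entry looks like once the YAML has been parsed.
# "example-value" is a placeholder for <param1-value>.
entry = {
    "type": "<path-to-my-own-dataset>.MyOwnDataset",
    "filepath": "data/01_raw/my_data.csv",
    "param1": "example-value",
}

# Assumption: every key except `type` is forwarded as a keyword argument.
kwargs = {key: value for key, value in entry.items() if key != "type"}
dataset = MyOwnDatasetStandIn(**kwargs)

# param2 was not listed in the entry, so the constructor default applies.
print(dataset._param2)
```

Leaving param1 out of the entry would fail at construction time with a TypeError, which is why the catalog comment marks it as required.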

Methods

`exists`()	Checks whether a data set's output already exists by calling the provided _exists() method.
`from_config`(name, config[, load_version, ...])	Create a data set instance using the configuration provided.
`load`()	Loads data by delegation to the provided load method.
`release`()	Release any cached data.
`save`(data)	Saves data by delegation to the provided save method.

exists()

Checks whether a data set’s output already exists by calling the provided _exists() method.

Return type: bool
Returns: Flag indicating whether the output already exists.
Raises: DatasetError – when the underlying _exists() method raises an error.
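
The delegation described above can be sketched as follows. This is a simplified illustration of the pattern, not kedro's actual implementation: DatasetStandIn and DatasetErrorStandIn are hypothetical stand-ins for AbstractDataset and DatasetError, and the error message wording is an assumption.

```python
class DatasetErrorStandIn(Exception):
    """Hypothetical stand-in for kedro's DatasetError."""


class DatasetStandIn:
    """Hypothetical stand-in for the base class; subclasses supply _exists()."""

    def exists(self) -> bool:
        # The public method delegates to the subclass hook and converts
        # any failure into a single, dataset-specific error type.
        try:
            return self._exists()
        except Exception as exc:
            message = f"Failed during exists check for {type(self).__name__}: {exc}"
            raise DatasetErrorStandIn(message) from exc


class AlwaysThere(DatasetStandIn):
    def _exists(self) -> bool:
        return True


class BrokenBackend(DatasetStandIn):
    def _exists(self) -> bool:
        # Simulates a storage backend that cannot be reached.
        raise ConnectionError("storage unreachable")


print(AlwaysThere().exists())

try:
    BrokenBackend().exists()
except DatasetErrorStandIn as err:
    # The original exception is preserved as the cause.
    print(type(err.__cause__).__name__)
```

Callers then only need to handle one exception type, whatever the underlying storage raises.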

classmethod from_config(name, config, load_version=None, save_version=None)

Create a data set instance using the configuration provided.

Parameters:

name (str) – Data set name.
config (dict[str, Any]) – Data set config dictionary.
load_version (str | None) – Version string to be used for load operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
save_version (str | None) – Version string to be used for save operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
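
The note that load_version and save_version have no effect unless versioning is enabled can be illustrated with a small sketch. It is not kedro's code: resolve_kwargs and VersionStandIn are hypothetical names, and the versioned config key and version keyword argument are assumptions about how versioning is switched on, since this section does not spell that out.

```python
from __future__ import annotations

from typing import Any, NamedTuple, Optional


class VersionStandIn(NamedTuple):
    """Hypothetical stand-in pairing the load and save version strings."""

    load: Optional[str]
    save: Optional[str]


def resolve_kwargs(
    config: dict[str, Any],
    load_version: Optional[str] = None,
    save_version: Optional[str] = None,
) -> dict[str, Any]:
    """Turn a dataset config into constructor keyword arguments (sketch)."""
    kwargs = dict(config)  # copy so the caller's config is left untouched
    kwargs.pop("type", None)
    # Assumption: versioning is enabled by a truthy `versioned` key.
    if kwargs.pop("versioned", False):
        kwargs["version"] = VersionStandIn(load_version, save_version)
    return kwargs


config = {
    "type": "<path-to-my-own-dataset>.MyOwnDataset",
    "filepath": "data/01_raw/my_data.csv",
    "param1": "example-value",  # placeholder value
}

# Without versioning, the version strings are ignored.
plain = resolve_kwargs(config, load_version="v1")
print("version" in plain)

# With versioning, they are carried through to the constructor arguments.
versioned = resolve_kwargs({**config, "versioned": True}, load_version="v1")
print(versioned["version"])
```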

Return type: