kedro.io.AbstractVersionedDataset

class kedro.io.AbstractVersionedDataset(filepath, version, exists_function=None, glob_function=None)

AbstractVersionedDataset is the base class for all versioned data set implementations. All data sets that implement versioning should extend this abstract class and implement the methods marked as abstract.

Example:

from pathlib import Path, PurePosixPath
import pandas as pd
from kedro.io import AbstractVersionedDataset


class MyOwnDataset(AbstractVersionedDataset):
    def __init__(self, filepath, version, param1, param2=True):
        # The base class expects the file path as a PurePosixPath.
        super().__init__(PurePosixPath(filepath), version)
        self._param1 = param1
        self._param2 = param2

    def _load(self) -> pd.DataFrame:
        # Resolve the versioned path to read from.
        load_path = self._get_load_path()
        return pd.read_csv(load_path)

    def _save(self, df: pd.DataFrame) -> None:
        # Resolve the versioned path to write to.
        save_path = self._get_save_path()
        df.to_csv(str(save_path))

    def _exists(self) -> bool:
        path = self._get_load_path()
        return Path(path.as_posix()).exists()

    def _describe(self):
        # Attributes reported when the data set is described.
        return dict(version=self._version, param1=self._param1, param2=self._param2)

Example catalog.yml specification:

my_dataset:
    type: <path-to-my-own-dataset>.MyOwnDataset
    filepath: data/01_raw/my_data.csv
    versioned: true
    param1: <param1-value> # param1 is a required argument
    # param2 will be True by default
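
To make the effect of `versioned: true` concrete, the sketch below builds the path a versioned file might be stored under. It assumes the layout `<filepath>/<version>/<filename>` and a timestamp-style version string; the helper name `versioned_path` is hypothetical and not part of Kedro.

```python
from pathlib import PurePosixPath


def versioned_path(filepath: PurePosixPath, version: str) -> PurePosixPath:
    """Build <filepath>/<version>/<filename> (assumed layout, not Kedro's code)."""
    return filepath / version / filepath.name


# The version string here is an illustrative value.
example_path = versioned_path(
    PurePosixPath("data/01_raw/my_data.csv"), "2024-06-15T12.30.45.123Z"
)
print(example_path)
```

Under this assumption, each save produces a new directory named after the version, so earlier versions are never overwritten.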

Methods

`exists`()	Checks whether a data set's output already exists by calling the provided _exists() method.
`from_config`(name, config[, load_version, ...])	Create a data set instance using the configuration provided.
`load`()	Loads data by delegation to the provided load method.
`release`()	Release any cached data.
`resolve_load_version`()	Compute the version the dataset should be loaded with.
`resolve_save_version`()	Compute the version the dataset should be saved with.
`save`(data)	Saves data by delegation to the provided save method.

__init__(filepath, version, exists_function=None, glob_function=None)

Creates a new instance of AbstractVersionedDataset.

Parameters:

filepath (PurePosixPath) – Filepath in POSIX format to a file.
version (Version | None) – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated.
exists_function (Callable[[str], bool] | None) – Function used to determine whether a path exists in a filesystem.
glob_function (Callable[[str], list[str]] | None) – Function used to find all paths in a filesystem that match a given pattern.
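
The two callables let a subclass supply filesystem-specific lookups. Below is a minimal sketch of functions with the expected signatures, using only the standard library against the local filesystem. The names `local_exists` and `local_glob` are hypothetical and not part of Kedro.

```python
import glob
import os
import tempfile


def local_exists(path: str) -> bool:
    """Return True if `path` exists on the local filesystem."""
    return os.path.exists(path)


def local_glob(pattern: str) -> list[str]:
    """Return every local path matching `pattern`, sorted for determinism."""
    return sorted(glob.glob(pattern))


# Exercise both callables against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_data.csv")
    open(target, "w").close()
    found = local_glob(os.path.join(tmp, "*.csv"))
    exists_result = local_exists(target)
    missing_result = local_exists(os.path.join(tmp, "absent.csv"))
```

Callables of this shape could then be passed as `exists_function` and `glob_function` from a subclass's `__init__`; a remote filesystem would substitute its own equivalents.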

exists()

Checks whether a data set’s output already exists by calling the provided _exists() method.

Return type: bool
Returns: Flag indicating whether the output already exists.
Raises: DatasetError – When the underlying exists method raises an error.

classmethod from_config(name, config, load_version=None, save_version=None)

Create a data set instance using the configuration provided.

Parameters:

name (str) – Data set name.
config (dict[str, Any]) – Data set config dictionary.
load_version (str | None) – Version string to use for the load operation if the data set is versioned. Has no effect if versioning is not enabled.
save_version (str | None) – Version string to use for the save operation if the data set is versioned. Has no effect if versioning is not enabled.

Return type:

AbstractDataset

Returns:

An instance of an AbstractDataset subclass.

Raises:

DatasetError – When the function fails to create the data set from its config.

load()

Loads data by delegation to the provided load method.

Return type: TypeVar(_DO)
Returns: Data returned by the provided load method.
Raises: DatasetError – When the underlying load method raises an error.

release()

Release any cached data.

Raises: DatasetError – When the underlying release method raises an error.
Return type: None

resolve_load_version()

Compute the version the dataset should be loaded with.

Return type: str | None
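
When no explicit load version is given, the latest available version is loaded. One way to picture that lookup is sketched below. It is not Kedro's implementation: it assumes the layout `<filepath>/<version>/<filename>` and version names that sort chronologically, and `find_latest_version` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_latest_version(filepath: Path) -> Optional[str]:
    """Return the newest version directory name, or None if none exists.

    Assumes the layout <filepath>/<version>/<filename>.
    """
    if not filepath.is_dir():
        return None
    candidates = [p for p in filepath.glob(f"*/{filepath.name}") if p.is_file()]
    if not candidates:
        return None
    # Timestamp-style names sort lexicographically in chronological order.
    return max(p.parent.name for p in candidates)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "my_data.csv"
    empty_result = find_latest_version(base)  # nothing saved yet
    for version in ("2023-01-01T00.00.00.000Z", "2024-06-15T12.30.00.000Z"):
        version_dir = base / version
        version_dir.mkdir(parents=True)
        (version_dir / base.name).write_text("a,b\n1,2\n")
    latest = find_latest_version(base)
```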

resolve_save_version()

Compute the version the dataset should be saved with.

Return type: str | None
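
When no explicit save version is given, one is autogenerated. The sketch below shows how a timestamp-based version string could be produced. The format constant and the helper `make_version_string` are illustrative assumptions, not Kedro's API; the format uses dots rather than colons so the result is a valid directory name on every platform.

```python
from datetime import datetime, timezone

# Assumed format: millisecond precision, no colons.
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def make_version_string(moment: datetime) -> str:
    """Format `moment` as a millisecond-precision, filesystem-safe string."""
    stamp = moment.strftime(VERSION_FORMAT)
    # %f yields six digits (microseconds); drop the last three, keep the "Z".
    return stamp[:-4] + stamp[-1:]


fixed = datetime(2024, 6, 15, 12, 30, 45, 123456, tzinfo=timezone.utc)
example = make_version_string(fixed)
print(example)  # 2024-06-15T12.30.45.123Z
```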

save(data)

Saves data by delegation to the provided save method.

Parameters:

data (TypeVar(_DI)) – The value to be saved by the provided save method.

Raises:

DatasetError – When the underlying save method raises an error.
FileNotFoundError – When the save method is given a file instead of a directory, on Windows.
NotADirectoryError – When the save method is given a file instead of a directory, on Unix.

Return type:

None