PickleDataset

PickleDataset serialises Python objects with the pickle module (or a compatible backend) and loads/saves them from/to a Redis database.

kedro_datasets.redis.PickleDataset

PickleDataset(
    *,
    key,
    backend="pickle",
    load_args=None,
    save_args=None,
    credentials=None,
    redis_args=None,
    metadata=None
)

Bases: AbstractDataset[Any, Any]

PickleDataset loads/saves data from/to a Redis database. The underlying functionality is provided by the redis library, so it supports all options that the library accepts when creating the client with from_url and when setting a value with set.

Examples:

Using the YAML API:

my_python_object: # simple example
  type: redis.PickleDataset
  key: my_object
  redis_args:
    from_url_args:
      url: redis://127.0.0.1:6379

final_python_object: # example with set args (key expires after 10 seconds)
  type: redis.PickleDataset
  key: my_final_object
  redis_args:
    from_url_args:
      url: redis://127.0.0.1:6379
      db: 1
    set_args:
      ex: 10

Using the Python API:

>>> import pandas as pd
>>> from kedro_datasets.redis import PickleDataset
>>>
>>> data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
>>>
>>> my_data = PickleDataset(key="my_data")
>>> my_data.save(data)
>>> reloaded = my_data.load()
>>> assert data.equals(reloaded)

Data is deserialised on load and serialised on save. Custom backends can be used to serialise/deserialise objects.

Example backends that are compatible - non-exhaustive
  • pickle
  • dill
  • compress_pickle
  • cloudpickle
Example backends that are incompatible
  • torch
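
The compatibility rule above can be checked directly: a backend qualifies when its module exposes both loads and dumps. The following is a minimal sketch of that check using only the standard library. The helper name satisfies_pickle_interface is hypothetical and not part of kedro_datasets.

```python
import importlib


def satisfies_pickle_interface(module_path: str) -> bool:
    """Return True if the module exposes both ``loads`` and ``dumps``.

    Hypothetical helper for illustration only. Raises ImportError if the
    module cannot be imported.
    """
    module = importlib.import_module(module_path)
    return hasattr(module, "loads") and hasattr(module, "dumps")


# The standard-library pickle module qualifies.
pickle_ok = satisfies_pickle_interface("pickle")
# math has neither function, so it would be rejected as a backend.
math_ok = satisfies_pickle_interface("math")
print(pickle_ok, math_ok)
```

A module such as torch, which offers load and save rather than loads and dumps, would presumably fail this check for the same reason math does.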

Parameters:

  • key (str) –

    The key to use for saving/loading object to Redis.

  • backend (str, default: 'pickle' ) –

    Backend to use, must be an import path to a module which satisfies the pickle interface. That is, contains a loads and dumps function. Defaults to 'pickle'.

  • load_args (dict[str, Any] | None, default: None ) –

    Options for deserialising the object on load. You can pass in any arguments that the loads function of the selected backend accepts, e.g.:
      • pickle.loads: https://docs.python.org/3/library/pickle.html#pickle.loads
      • dill.loads: https://dill.readthedocs.io/en/latest/index.html#dill.loads
      • compress_pickle.loads: https://lucianopaz.github.io/compress_pickle/html/api/compress_pickle.html#compress_pickle.compress_pickle.loads
      • cloudpickle.loads: https://github.com/cloudpipe/cloudpickle/blob/master/tests/cloudpickle_test.py
    All defaults are preserved.

  • save_args (dict[str, Any] | None, default: None ) –

    Options for serialising the object on save. You can pass in any arguments that the dumps function of the selected backend accepts, e.g.:
      • pickle.dumps: https://docs.python.org/3/library/pickle.html#pickle.dumps
      • dill.dumps: https://dill.readthedocs.io/en/latest/index.html#dill.dumps
      • compress_pickle.dumps: https://lucianopaz.github.io/compress_pickle/html/api/compress_pickle.html#compress_pickle.compress_pickle.dumps
      • cloudpickle.dumps: https://github.com/cloudpipe/cloudpickle/blob/master/tests/cloudpickle_test.py
    All defaults are preserved.

  • credentials (dict[str, Any] | None, default: None ) –

    Credentials required to access the Redis server, e.g. {"password": None}. They are passed as keyword arguments to the Redis client constructor together with from_url_args.

  • redis_args (dict[str, Any] | None, default: None ) –

    Extra arguments for the Redis client, supplied through two nested keys: from_url_args is passed to the client constructor redis.StrictRedis.from_url (e.g. {"from_url_args": {"socket_timeout": 10}}), and set_args is passed to redis.StrictRedis.set. All available arguments for from_url are listed here: https://redis-py.readthedocs.io/en/stable/connections.html?highlight=from_url#redis.Redis.from_url All defaults are preserved, except url, which is set to redis://127.0.0.1:6379. You can also specify the URL through the environment variable REDIS_URL.

  • metadata (dict[str, Any] | None, default: None ) –

    Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
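
To make the nesting of redis_args concrete, here is a minimal sketch of how the constructor appears to split it, based on the source listing on this page. The helper split_redis_args and the constant DEFAULT_URL are hypothetical names used only for illustration. No Redis server is needed to run it.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any

DEFAULT_URL = "redis://127.0.0.1:6379"  # stand-in for DEFAULT_REDIS_URL


def split_redis_args(
    redis_args: dict[str, Any] | None,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split ``redis_args`` into (from_url_args, set_args).

    Hypothetical helper mirroring the constructor logic shown on this page.
    """
    args = deepcopy(redis_args) or {}
    from_url_args = args.pop("from_url_args", {})
    # The URL is only filled in when the caller did not provide one.
    from_url_args.setdefault("url", DEFAULT_URL)
    set_args = args.pop("set_args", {})
    return from_url_args, set_args


from_url_args, set_args = split_redis_args(
    {"from_url_args": {"socket_timeout": 10}, "set_args": {"ex": 10}}
)
print(from_url_args)  # client options plus the default url
print(set_args)       # options for each set call

# Keys other than the two nested ones are not picked up by this logic.
top_level_only = split_redis_args({"socket_timeout": 10})
print(top_level_only)
```

Under this reading, an option placed at the top level of redis_args (rather than under from_url_args or set_args) has no effect, so it is worth double-checking the nesting in catalog entries.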

Raises:

  • ValueError

    If backend does not satisfy the pickle interface.

  • ImportError

    If the backend module could not be imported.

Source code in kedro_datasets/redis/redis_dataset.py
def __init__(  # noqa: PLR0913
    self,
    *,
    key: str,
    backend: str = "pickle",
    load_args: dict[str, Any] | None = None,
    save_args: dict[str, Any] | None = None,
    credentials: dict[str, Any] | None = None,
    redis_args: dict[str, Any] | None = None,
    metadata: dict[str, Any] | None = None,
) -> None:
    """Creates a new instance of ``PickleDataset``. This loads/saves data from/to
    a Redis database while deserialising/serialising. Supports custom backends to
    serialise/deserialise objects.

    Example backends that are compatible - non-exhaustive:
        * `pickle`
        * `dill`
        * `compress_pickle`
        * `cloudpickle`

    Example backends that are incompatible:
        * `torch`

    Args:
        key: The key to use for saving/loading object to Redis.
        backend: Backend to use, must be an import path to a module which satisfies the
            ``pickle`` interface. That is, contains a `loads` and `dumps` function.
            Defaults to 'pickle'.
        load_args: Pickle options for loading pickle files.
            You can pass in arguments that the backend load function specified accepts, e.g:
            pickle.loads: https://docs.python.org/3/library/pickle.html#pickle.loads
            dill.loads: https://dill.readthedocs.io/en/latest/index.html#dill.loads
            compress_pickle.loads:
            https://lucianopaz.github.io/compress_pickle/html/api/compress_pickle.html#compress_pickle.compress_pickle.loads
            cloudpickle.loads:
            https://github.com/cloudpipe/cloudpickle/blob/master/tests/cloudpickle_test.py
            All defaults are preserved.
        save_args: Pickle options for saving pickle files.
            You can pass in arguments that the backend dump function specified accepts, e.g:
            pickle.dumps: https://docs.python.org/3/library/pickle.html#pickle.dump
            dill.dumps: https://dill.readthedocs.io/en/latest/index.html#dill.dumps
            compress_pickle.dumps:
            https://lucianopaz.github.io/compress_pickle/html/api/compress_pickle.html#compress_pickle.compress_pickle.dumps
            cloudpickle.dumps:
            https://github.com/cloudpipe/cloudpickle/blob/master/tests/cloudpickle_test.py
            All defaults are preserved.
        credentials: Credentials required to get access to the redis server.
            E.g. `{"password": None}`.
        redis_args: Extra arguments to pass into the redis client constructor
            ``redis.StrictRedis.from_url``. (e.g. `{"socket_timeout": 10}`), as well as to pass
            to the ``redis.StrictRedis.set`` through nested keys `from_url_args` and `set_args`.
            Here you can find all available arguments for `from_url`:
            https://redis-py.readthedocs.io/en/stable/connections.html?highlight=from_url#redis.Redis.from_url
            All defaults are preserved, except `url`, which is set to `redis://127.0.0.1:6379`.
            You could also specify the url through the env variable ``REDIS_URL``.
        metadata: Any arbitrary metadata.
            This is ignored by Kedro, but may be consumed by users or external plugins.

    Raises:
        ValueError: If ``backend`` does not satisfy the `pickle` interface.
        ImportError: If the ``backend`` module could not be imported.
    """
    try:
        imported_backend = importlib.import_module(backend)
    except ImportError as exc:
        raise ImportError(
            f"Selected backend '{backend}' could not be imported. "
            "Make sure it is installed and importable."
        ) from exc

    if not (
        hasattr(imported_backend, "loads") and hasattr(imported_backend, "dumps")
    ):
        raise ValueError(
            f"Selected backend '{backend}' should satisfy the pickle interface. "
            "Missing one of 'loads' and 'dumps' on the backend."
        )

    self._backend = backend

    self._key = key

    self.metadata = metadata

    _redis_args = deepcopy(redis_args) or {}
    self._redis_from_url_args = _redis_args.pop("from_url_args", {})
    self._redis_from_url_args.setdefault("url", self.DEFAULT_REDIS_URL)
    self._redis_set_args = _redis_args.pop("set_args", {})
    _credentials = deepcopy(credentials) or {}

    self._load_args = deepcopy(self.DEFAULT_LOAD_ARGS)
    if load_args is not None:
        self._load_args.update(load_args)
    self._save_args = deepcopy(self.DEFAULT_SAVE_ARGS)
    if save_args is not None:
        self._save_args.update(save_args)

    self._redis_db = redis.Redis.from_url(
        **self._redis_from_url_args, **_credentials
    )

DEFAULT_LOAD_ARGS class-attribute instance-attribute

DEFAULT_LOAD_ARGS = {}

DEFAULT_REDIS_URL class-attribute instance-attribute

DEFAULT_REDIS_URL = getenv(
    "REDIS_URL", "redis://127.0.0.1:6379"
)
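
The default URL lookup can be sketched with the standard library alone. The function resolve_default_url and the host redis.example below are hypothetical and serve only to show the fallback behaviour of an environment lookup with a default.

```python
import os
from typing import Mapping

FALLBACK_URL = "redis://127.0.0.1:6379"


def resolve_default_url(environ: Mapping[str, str] = os.environ) -> str:
    """Return REDIS_URL from the given environment mapping, else the fallback."""
    return environ.get("REDIS_URL", FALLBACK_URL)


# No REDIS_URL set: the local default is used.
unset = resolve_default_url({})
# REDIS_URL set: its value wins over the fallback.
custom = resolve_default_url({"REDIS_URL": "redis://redis.example:6380/1"})
print(unset)
print(custom)
```

Because DEFAULT_REDIS_URL is a class attribute, the environment is presumably read once, when the class is defined at import time. Setting REDIS_URL after importing the module would then not change the default, so an explicit url in from_url_args is the safer choice in that situation.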

DEFAULT_SAVE_ARGS class-attribute instance-attribute

DEFAULT_SAVE_ARGS = {}

_backend instance-attribute

_backend = backend

_key instance-attribute

_key = key

_load_args instance-attribute

_load_args = deepcopy(DEFAULT_LOAD_ARGS)

_redis_db instance-attribute

_redis_db = from_url(
    **(_redis_from_url_args), **_credentials
)

_redis_from_url_args instance-attribute

_redis_from_url_args = pop('from_url_args', {})

_redis_set_args instance-attribute

_redis_set_args = pop('set_args', {})

_save_args instance-attribute

_save_args = deepcopy(DEFAULT_SAVE_ARGS)

metadata instance-attribute

metadata = metadata

_describe

_describe()
Source code in kedro_datasets/redis/redis_dataset.py
def _describe(self) -> dict[str, Any]:
    return {"key": self._key, **self._redis_from_url_args}

_exists

_exists()
Source code in kedro_datasets/redis/redis_dataset.py
def _exists(self) -> bool:
    try:
        return bool(self._redis_db.exists(self._key))
    except Exception as exc:
        raise DatasetError(
            f"The existence of key {self._key} could not be established due to: {exc}"
        ) from exc

load

load()
Source code in kedro_datasets/redis/redis_dataset.py
def load(self) -> Any:
    if not self.exists():
        raise DatasetError(f"The provided key {self._key} does not exists.")
    imported_backend = importlib.import_module(self._backend)
    return imported_backend.loads(  # type: ignore
        self._redis_db.get(self._key), **self._load_args
    )  # type: ignore

save

save(data)
Source code in kedro_datasets/redis/redis_dataset.py
def save(self, data: Any) -> None:
    try:
        imported_backend = importlib.import_module(self._backend)
        self._redis_db.set(
            self._key,
            imported_backend.dumps(data, **self._save_args),  # type: ignore
            **self._redis_set_args,
        )
    except Exception as exc:
        raise DatasetError(
            f"{data.__class__} was not serialised due to: {exc}"
        ) from exc
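
The load and save flow can be tried without a Redis server. The sketch below replaces the Redis client with a hypothetical InMemoryStore class that keeps the serialised bytes in a dict. The free functions save and load are simplified stand-ins for the methods above, not the library's implementation, and they raise KeyError instead of DatasetError.

```python
from __future__ import annotations

import importlib
from typing import Any


class InMemoryStore:
    """Hypothetical stand-in for a Redis client; keeps bytes in a dict."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)

    def exists(self, key: str) -> int:
        return int(key in self._data)


def save(store: InMemoryStore, key: str, data: Any,
         backend: str = "pickle", save_args: dict[str, Any] | None = None) -> None:
    # Serialise with the backend's dumps, then store the bytes under the key.
    module = importlib.import_module(backend)
    store.set(key, module.dumps(data, **(save_args or {})))


def load(store: InMemoryStore, key: str,
         backend: str = "pickle", load_args: dict[str, Any] | None = None) -> Any:
    # Refuse to load a missing key, then deserialise with the backend's loads.
    if not store.exists(key):
        raise KeyError(f"The provided key {key} does not exist.")
    module = importlib.import_module(backend)
    return module.loads(store.get(key), **(load_args or {}))


store = InMemoryStore()
original = {"col1": [1, 2], "col2": [4, 5]}
save(store, "my_data", original, save_args={"protocol": 4})
reloaded = load(store, "my_data")
print(reloaded == original)
```

Note that save_args reaches the backend's dumps function (here protocol=4 for pickle.dumps), while options for the Redis set call, such as an expiry, belong under set_args in redis_args.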