kedro_datasets.redis.PickleDataset
- class kedro_datasets.redis.PickleDataset(*, key, backend='pickle', load_args=None, save_args=None, credentials=None, redis_args=None, metadata=None)
PickleDataset loads/saves data from/to a Redis database. The underlying functionality is supported by the redis library, so it supports all allowed options for instantiating the redis app from_url and setting a value.

Example usage for the YAML API:

```yaml
my_python_object: # simple example
  type: redis.PickleDataset
  key: my_object
  from_url_args:
    url: redis://127.0.0.1:6379

final_python_object: # example with save args
  type: redis.PickleDataset
  key: my_final_object
  from_url_args:
    url: redis://127.0.0.1:6379
    db: 1
  save_args:
    ex: 10
```

Example usage for the Python API:

```python
from kedro_datasets.redis import PickleDataset
import pandas as pd

data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})

my_data = PickleDataset(key="my_data")
my_data.save(data)
reloaded = my_data.load()
assert data.equals(reloaded)
```
Attributes

- DEFAULT_REDIS_URL

Methods

- exists() – Checks whether a dataset's output already exists by calling the provided _exists() method.
- from_config(name, config[, load_version, ...]) – Create a dataset instance using the configuration provided.
- load() – Loads data by delegation to the provided load method.
- release() – Release any cached data.
- save(data) – Saves data by delegation to the provided save method.
- to_config() – Converts the dataset instance into a dictionary-based configuration for serialization.
- DEFAULT_REDIS_URL = 'redis://127.0.0.1:6379'
- __init__(*, key, backend='pickle', load_args=None, save_args=None, credentials=None, redis_args=None, metadata=None)
Creates a new instance of PickleDataset. This loads/saves data from/to a Redis database while deserialising/serialising. Supports custom backends to serialise/deserialise objects.

Example backends that are compatible (non-exhaustive):

- pickle
- dill
- compress_pickle
- cloudpickle

Example backends that are incompatible:

- torch
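For instance, an alternative backend is selected by its import path. A minimal sketch, assuming the dill package is installed and a Redis server is reachable at the default URL:

```python
from kedro_datasets.redis import PickleDataset

# "dill" is an import path whose module provides loads/dumps,
# so it satisfies the pickle interface required by `backend`.
my_fn = PickleDataset(key="my_fn", backend="dill")
my_fn.save(lambda x: x + 1)  # dill can serialise lambdas, plain pickle cannot
assert my_fn.load()(1) == 2
```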
- Parameters:
  - key (str) – The key to use for saving/loading the object to Redis.
  - backend (str) – Backend to use; must be an import path to a module which satisfies the pickle interface, i.e. contains loads and dumps functions. Defaults to 'pickle'.
  - load_args (Optional[dict[str, Any]]) – Pickle options for loading pickle files. You can pass in arguments that the specified backend load function accepts, e.g.:
    pickle.loads: https://docs.python.org/3/library/pickle.html#pickle.loads
    dill.loads: https://dill.readthedocs.io/en/latest/index.html#dill.loads
    compress_pickle.loads: https://lucianopaz.github.io/compress_pickle/html/api/compress_pickle.html#compress_pickle.compress_pickle.loads
    cloudpickle.loads: https://github.com/cloudpipe/cloudpickle/blob/master/tests/cloudpickle_test.py
    All defaults are preserved.
  - save_args (Optional[dict[str, Any]]) – Pickle options for saving pickle files. You can pass in arguments that the specified backend dump function accepts, e.g.:
    pickle.dumps: https://docs.python.org/3/library/pickle.html#pickle.dump
    dill.dumps: https://dill.readthedocs.io/en/latest/index.html#dill.dumps
    compress_pickle.dumps: https://lucianopaz.github.io/compress_pickle/html/api/compress_pickle.html#compress_pickle.compress_pickle.dumps
    cloudpickle.dumps: https://github.com/cloudpipe/cloudpickle/blob/master/tests/cloudpickle_test.py
    All defaults are preserved.
  - credentials (Optional[dict[str, Any]]) – Credentials required to get access to the redis server, e.g. {"password": None}.
  - redis_args (Optional[dict[str, Any]]) – Extra arguments to pass to the redis client constructor redis.StrictRedis.from_url (e.g. {"socket_timeout": 10}) and to redis.StrictRedis.set, via the nested keys from_url_args and set_args. Here you can find all available arguments for from_url: https://redis-py.readthedocs.io/en/stable/connections.html?highlight=from_url#redis.Redis.from_url All defaults are preserved, except url, which is set to redis://127.0.0.1:6379. You can also specify the url through the env variable REDIS_URL. A combined construction sketch follows the Raises list below.
  - metadata (Optional[dict[str, Any]]) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
- Raises:
  - ValueError – If backend does not satisfy the pickle interface.
  - ImportError – If the backend module could not be imported.
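Putting the constructor arguments together, a hedged sketch of a fully configured instance; the timeout and expiry values are illustrative, and a Redis server is assumed to be reachable at the given URL:

```python
from kedro_datasets.redis import PickleDataset

dataset = PickleDataset(
    key="my_object",
    backend="pickle",                         # the default backend
    credentials={"password": None},           # as documented above
    redis_args={
        "from_url_args": {                    # forwarded to redis.StrictRedis.from_url
            "url": "redis://127.0.0.1:6379",  # the default; also settable via REDIS_URL
            "socket_timeout": 10,             # illustrative value
        },
        "set_args": {"ex": 60},               # forwarded to redis.StrictRedis.set (60 s expiry)
    },
)
dataset.save({"answer": 42})
assert dataset.load() == {"answer": 42}
```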
- exists()
Checks whether a dataset’s output already exists by calling the provided _exists() method.
- Return type:
  bool
- Returns:
Flag indicating whether the output already exists.
- Raises:
DatasetError – When the underlying exists method raises an error.
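For example, exists() can guard a load, as in this sketch (hypothetical key, reachable Redis server assumed):

```python
from kedro_datasets.redis import PickleDataset

cache = PickleDataset(key="expensive_result")  # hypothetical key
if cache.exists():
    result = cache.load()
else:
    result = sum(range(10**6))  # stand-in for an expensive computation
    cache.save(result)
```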
- classmethod from_config(name, config, load_version=None, save_version=None)
Create a dataset instance using the configuration provided.
- Parameters:
  - name (str) – Dataset name.
  - config (dict[str, Any]) – Dataset config dictionary.
  - load_version (Optional[str]) – Version string to be used for the load operation if the dataset is versioned. Has no effect on the dataset if versioning was not enabled.
  - save_version (Optional[str]) – Version string to be used for the save operation if the dataset is versioned. Has no effect on the dataset if versioning was not enabled.
- Return type:
  AbstractDataset
- Returns:
  An instance of an AbstractDataset subclass.
- Raises:
  DatasetError – When the function fails to create the dataset from its config.
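As a sketch, a dataset can be built from a plain config dictionary; the entry name and values below are illustrative:

```python
from kedro.io import AbstractDataset

config = {
    "type": "kedro_datasets.redis.PickleDataset",
    "key": "my_object",
    "redis_args": {"from_url_args": {"url": "redis://127.0.0.1:6379"}},
}
dataset = AbstractDataset.from_config("my_python_object", config)
```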
- load()
Loads data by delegation to the provided load method.
- Return type:
  Any
- Returns:
Data returned by the provided load method.
- Raises:
DatasetError – When the underlying load method raises an error.
- release()
Release any cached data.
- Raises:
DatasetError – When the underlying release method raises an error.
- Return type:
  None
- save(data)
Saves data by delegation to the provided save method.
- Parameters:
  data (Any) – The value to be saved by the provided save method.
- Raises:
DatasetError – When the underlying save method raises an error.
FileNotFoundError – When the save method got a file instead of a dir, on Windows.
NotADirectoryError – When the save method got a file instead of a dir, on Unix.
- Return type:
  None
- to_config()
Converts the dataset instance into a dictionary-based configuration for serialization. Ensures that any subclass-specific details are handled, with additional logic for versioning and caching implemented for CachedDataset.
Adds a key for the dataset’s type using its module and class name and includes the initialization arguments.
For CachedDataset it extracts the underlying dataset’s configuration, handles the versioned flag and removes unnecessary metadata. It also ensures the embedded dataset’s configuration is appropriately flattened or transformed.
If the dataset has a version key, it sets the versioned flag in the configuration.
Removes the metadata key from the configuration if present.
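A minimal round-trip sketch, assuming the instance was created in code and that the returned dictionary can be fed back to from_config:

```python
from kedro_datasets.redis import PickleDataset

dataset = PickleDataset(key="my_object")
config = dataset.to_config()
# config holds the type path plus the init arguments, e.g.
# {"type": "kedro_datasets.redis.PickleDataset", "key": "my_object", ...}
clone = PickleDataset.from_config("my_object_copy", config)
```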