kedro_datasets.pandas.SQLTableDataset

class kedro_datasets.pandas.SQLTableDataset(*, table_name, credentials, load_args=None, save_args=None, metadata=None)[source]

SQLTableDataset loads data from a SQL table and saves a pandas DataFrame to a table. It uses pandas.DataFrame internally, so it supports all options allowed by the read_sql_table and to_sql methods. Since pandas uses SQLAlchemy behind the scenes, when instantiating SQLTableDataset you need to pass a compatible connection string either in credentials (see the example code snippet below) or in load_args and save_args. Connection string formats supported by SQLAlchemy are documented at https://docs.sqlalchemy.org/core/engines.html#database-urls

SQLTableDataset modifies the save parameters and stores the data with no index. This is designed to make the load and save methods symmetric.
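The symmetry guarantee above can be sketched with plain pandas over an in-memory SQLite database (a minimal illustration of the behaviour only; SQLTableDataset itself goes through SQLAlchemy and read_sql_table):

```python
import sqlite3

import pandas as pd

# Saving with index=False (the SQLTableDataset default) means the
# reloaded frame has exactly the original columns, with no extra
# index column leaking into the table.
con = sqlite3.connect(":memory:")
data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5]})

data.to_sql("table_a", con, index=False)           # what save() does
reloaded = pd.read_sql_query("SELECT * FROM table_a", con)

assert list(reloaded.columns) == ["col1", "col2"]  # no index column
assert data.equals(reloaded)                       # round-trip is lossless
```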

Example usage for the YAML API:

shuttles_table_dataset:
  type: pandas.SQLTableDataset
  credentials: db_credentials
  table_name: shuttles
  load_args:
    schema: dwschema
  save_args:
    schema: dwschema
    if_exists: replace

Sample database credentials entry in credentials.yml:

db_credentials:
  con: postgresql://scott:tiger@localhost/test
  pool_size: 10 # additional parameters

Example usage for the Python API:

from kedro_datasets.pandas import SQLTableDataset
import pandas as pd
import tempfile

data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
table_name = "table_a"
# tmp_path in the original doctest is a pytest fixture; a temporary
# directory makes the example runnable on its own.
tmp_path = tempfile.mkdtemp()
credentials = {"con": f"sqlite:///{tmp_path}/test.db"}
dataset = SQLTableDataset(table_name=table_name, credentials=credentials)

dataset.save(data)
reloaded = dataset.load()

assert data.equals(reloaded)

Attributes

DEFAULT_LOAD_ARGS

DEFAULT_SAVE_ARGS

engine

The Engine object for the dataset's connection string.

engines

Methods

create_connection(connection_str[, ...])

Given a connection string, create a singleton connection to be used across all instances of SQLTableDataset that need to connect to the same source.

exists()

Checks whether a dataset's output already exists by calling the provided _exists() method.

from_config(name, config[, load_version, ...])

Create a dataset instance using the configuration provided.

load()

Loads data by delegation to the provided load method.

preview([nrows])

Generate a preview of the dataset with a specified number of rows.

release()

Release any cached data.

save(data)

Saves data by delegation to the provided save method.

to_config()

Converts the dataset instance into a dictionary-based configuration for serialization.

DEFAULT_LOAD_ARGS: dict[str, Any] = {}
DEFAULT_SAVE_ARGS: dict[str, Any] = {'index': False}
__init__(*, table_name, credentials, load_args=None, save_args=None, metadata=None)[source]

Creates a new SQLTableDataset.

Parameters:
  • table_name (str) – The table name to load or save data to.

  • credentials (dict[str, Any]) – A dictionary holding the SQLAlchemy-compatible connection string under the con key (see the credentials example above).

  • load_args (Optional[dict[str, Any]]) – Options passed to the underlying pandas read_sql_table call.

  • save_args (Optional[dict[str, Any]]) – Options passed to the underlying pandas to_sql call. index is set to False by default so that load and save stay symmetric.

  • metadata (Optional[dict[str, Any]]) – Any arbitrary metadata attached to the dataset.

Raises:

DatasetError – When either table_name or con is empty.

classmethod create_connection(connection_str, connection_args=None)[source]

Given a connection string, create a singleton connection to be used across all instances of SQLTableDataset that need to connect to the same source.

Return type:

None
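The singleton behaviour can be sketched as a class-level cache keyed by connection string (an illustrative pattern only, not the actual kedro implementation, which stores SQLAlchemy Engine objects in the engines dict):

```python
class ConnectionCache:
    # Illustrative stand-in for SQLTableDataset.engines: one shared
    # connection object per connection string, reused by every instance.
    engines: dict[str, object] = {}

    @classmethod
    def create_connection(cls, connection_str: str) -> None:
        if connection_str not in cls.engines:
            # A real dataset would call sqlalchemy.create_engine(...) here.
            cls.engines[connection_str] = object()


ConnectionCache.create_connection("sqlite:///test.db")
first = ConnectionCache.engines["sqlite:///test.db"]
ConnectionCache.create_connection("sqlite:///test.db")

# The second call is a no-op: the cached connection is reused.
assert ConnectionCache.engines["sqlite:///test.db"] is first
assert len(ConnectionCache.engines) == 1
```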

property engine

The Engine object for the dataset’s connection string.

engines: dict[str, Any] = {}
exists()[source]

Checks whether a dataset’s output already exists by calling the provided _exists() method.

Return type:

bool

Returns:

Flag indicating whether the output already exists.

Raises:

DatasetError – When the underlying _exists method raises an error.
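For a SQL table dataset, existence boils down to whether the table is present in the database. A plain-sqlite sketch of that check (not kedro's actual implementation, which inspects the database via SQLAlchemy):

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")


def table_exists(con: sqlite3.Connection, name: str) -> bool:
    # Query sqlite's catalogue table, roughly what an _exists()
    # implementation does through SQLAlchemy's inspection API.
    row = con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (name,),
    ).fetchone()
    return row is not None


assert table_exists(con, "table_a") is False
pd.DataFrame({"col1": [1]}).to_sql("table_a", con, index=False)
assert table_exists(con, "table_a") is True
```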

classmethod from_config(name, config, load_version=None, save_version=None)[source]

Create a dataset instance using the configuration provided.

Parameters:
  • name (str) – Dataset name.

  • config (dict[str, Any]) – Dataset config dictionary.

  • load_version (Optional[str]) – Version string to be used for load operation if the dataset is versioned. Has no effect on the dataset if versioning was not enabled.

  • save_version (Optional[str]) – Version string to be used for save operation if the dataset is versioned. Has no effect on the dataset if versioning was not enabled.

Return type:

AbstractDataset

Returns:

An instance of an AbstractDataset subclass.

Raises:

DatasetError – When the function fails to create the dataset from its config.

load()[source]

Loads data by delegation to the provided load method.

Return type:

DataFrame

Returns:

Data returned by the provided load method.

Raises:

DatasetError – When the underlying load method raises an error.

preview(nrows=5)[source]

Generate a preview of the dataset with a specified number of rows.

Parameters:

nrows (int) – The number of rows to include in the preview. Defaults to 5.

Returns:

A dictionary containing the data in a split format.

Return type:

dict
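The "split format" mentioned above is pandas' orient="split" dictionary layout. A plain-pandas sketch of what a 2-row preview looks like (assuming preview() mirrors DataFrame.to_dict(orient="split") on the first nrows rows):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Split format: separate "index", "columns" and "data" entries.
preview = df.head(2).to_dict(orient="split")

assert preview["columns"] == ["col1", "col2"]
assert preview["data"] == [[1, 4], [2, 5]]
```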

release()[source]

Release any cached data.

Raises:

DatasetError – When the underlying release method raises an error.

Return type:

None

save(data)[source]

Saves data by delegation to the provided save method.

Parameters:

data (DataFrame) – The value to be saved by the provided save method.

Raises:
  • DatasetError – When the underlying save method raises an error.

  • FileNotFoundError – when save method got file instead of dir, on Windows.

  • NotADirectoryError – when save method got file instead of dir, on Unix.

Return type:

None

to_config()[source]

Converts the dataset instance into a dictionary-based configuration for serialization. Ensures that any subclass-specific details are handled, with additional logic for versioning and caching implemented for CachedDataset.

Adds a key for the dataset’s type using its module and class name and includes the initialization arguments.

For CachedDataset it extracts the underlying dataset’s configuration, handles the versioned flag and removes unnecessary metadata. It also ensures the embedded dataset’s configuration is appropriately flattened or transformed.

If the dataset has a version key, it sets the versioned flag in the configuration.

Removes the metadata key from the configuration if present.

Return type:

dict[str, Any]

Returns:

A dictionary containing the dataset’s type and initialization arguments.