kedro_datasets.pandas.SQLTableDataset

class kedro_datasets.pandas.SQLTableDataset(*, table_name, credentials, load_args=None, save_args=None, metadata=None)[source]

SQLTableDataset loads data from a SQL table and saves a pandas dataframe to a table. It uses pandas.DataFrame internally, so it supports all allowed pandas options on read_sql_table and to_sql methods. Since Pandas uses SQLAlchemy behind the scenes, when instantiating SQLTableDataset one needs to pass a compatible connection string either in credentials (see the example code snippet below) or in load_args and save_args. Connection string formats supported by SQLAlchemy can be found here: https://docs.sqlalchemy.org/core/engines.html#database-urls

SQLTableDataset modifies the save parameters and stores the data with no index. This is designed to make load and save methods symmetric.

Example usage for the YAML API:

shuttles_table_dataset:
  type: pandas.SQLTableDataset
  credentials: db_credentials
  table_name: shuttles
  load_args:
    schema: dwschema
  save_args:
    schema: dwschema
    if_exists: replace

Sample database credentials entry in credentials.yml:

db_credentials:
  con: postgresql://scott:tiger@localhost/test
  pool_size: 10 # additional parameters

Example usage for the Python API:

from kedro_datasets.pandas import SQLTableDataset
import pandas as pd

data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
table_name = "table_a"
credentials = {"con": f"sqlite:///{tmp_path / 'test.db'}"}
dataset = SQLTableDataset(table_name=table_name, credentials=credentials)

dataset.save(data)
reloaded = dataset.load()

assert data.equals(reloaded)

Attributes

DEFAULT_LOAD_ARGS

DEFAULT_SAVE_ARGS

engine

The Engine object for the dataset's connection string.

engines

Methods

create_connection(connection_str[, ...])

Given a connection string, create singleton connection to be used across all instances of SQLTableDataset that need to connect to the same source.

exists()

Checks whether a data set's output already exists by calling the provided _exists() method.

from_config(name, config[, load_version, ...])

Create a data set instance using the configuration provided.

load()

Loads data by delegation to the provided load method.

preview([nrows])

Generate a preview of the dataset with a specified number of rows.

release()

Release any cached data.

save(data)

Saves data by delegation to the provided save method.

DEFAULT_LOAD_ARGS: dict[str, Any] = {}
DEFAULT_SAVE_ARGS: dict[str, Any] = {'index': False}
__init__(*, table_name, credentials, load_args=None, save_args=None, metadata=None)[source]

Creates a new SQLTableDataset.

Parameters:
Raises:

DatasetError – When either table_name or con is empty.

classmethod create_connection(connection_str, connection_args=None)[source]

Given a connection string, create singleton connection to be used across all instances of SQLTableDataset that need to connect to the same source.

Return type:

None

property engine

The Engine object for the dataset’s connection string.

engines: dict[str, Any] = {}
exists()

Checks whether a data set’s output already exists by calling the provided _exists() method.

Return type:

bool

Returns:

Flag indicating whether the output already exists.

Raises:

DatasetError – when underlying exists method raises error.

classmethod from_config(name, config, load_version=None, save_version=None)

Create a data set instance using the configuration provided.

Parameters:
  • name (str) – Data set name.

  • config (dict[str, Any]) – Data set config dictionary.

  • load_version (str | None) – Version string to be used for load operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.

  • save_version (str | None) – Version string to be used for save operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.

Return type:

AbstractDataset

Returns:

An instance of an AbstractDataset subclass.

Raises:

DatasetError – When the function fails to create the data set from its config.

load()

Loads data by delegation to the provided load method.

Return type:

TypeVar(_DO)

Returns:

Data returned by the provided load method.

Raises:

DatasetError – When underlying load method raises error.

preview(nrows=5)[source]

Generate a preview of the dataset with a specified number of rows.

Parameters:

nrows (int) – The number of rows to include in the preview. Defaults to 5.

Returns:

A dictionary containing the data in a split format.

Return type:

dict

release()

Release any cached data.

Raises:

DatasetError – when underlying release method raises error.

Return type:

None

save(data)

Saves data by delegation to the provided save method.

Parameters:

data (TypeVar(_DI)) – the value to be saved by provided save method.

Raises:
  • DatasetError – when underlying save method raises error.

  • FileNotFoundError – when save method got file instead of dir, on Windows.

  • NotADirectoryError – when save method got file instead of dir, on Unix.

Return type:

None