kedro_datasets.pandas.GBQTableDataset

class kedro_datasets.pandas.GBQTableDataset(*, dataset, table_name, project=None, credentials=None, load_args=None, save_args=None, metadata=None)

GBQTableDataset loads data from and saves data to a Google BigQuery table. It uses pandas-gbq to read from and write to the table.
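The dataset and table_name arguments together identify one BigQuery table, optionally qualified by project. The sketch below assumes the standard BigQuery `project.dataset.table` spelling; `table_reference` is a hypothetical helper for illustration, not part of kedro_datasets.

```python
from typing import Optional


def table_reference(dataset: str, table_name: str, project: Optional[str] = None) -> str:
    """Join BigQuery identifiers into a dotted table reference (hypothetical helper).

    With a project the result is "project.dataset.table"; without one it is
    "dataset.table" and the project is left for the environment to supply.
    """
    if not dataset or not table_name:
        raise ValueError("dataset and table_name must both be non-empty")
    parts = [dataset, table_name] if project is None else [project, dataset, table_name]
    return ".".join(parts)


print(table_reference("big_query_dataset", "big_query_table", "my-project"))
# my-project.big_query_dataset.big_query_table
print(table_reference("big_query_dataset", "big_query_table"))
# big_query_dataset.big_query_table
```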

Example usage for the YAML API:

vehicles:
  type: pandas.GBQTableDataset
  dataset: big_query_dataset
  table_name: big_query_table
  project: my-project
  credentials: gbq-creds
  load_args:
    reauth: True
  save_args:
    chunksize: 100
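In the YAML entry, `credentials: gbq-creds` is the name of an entry in the project's credentials configuration (typically credentials.yml), which Kedro substitutes before the dataset is constructed. A minimal sketch of that substitution, using a hypothetical `resolve_credentials` helper and placeholder credential values; it is not Kedro's actual catalog code.

```python
from typing import Any


def resolve_credentials(entry: dict[str, Any], credentials: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a catalog entry whose credentials name is replaced by the
    matching mapping from the credentials config (hypothetical helper)."""
    resolved = dict(entry)  # shallow copy; the original entry is left untouched
    name = resolved.get("credentials")
    if isinstance(name, str):
        if name not in credentials:
            raise KeyError(f"Unable to find credentials '{name}'")
        resolved["credentials"] = credentials[name]
    return resolved


# Connection-related keys of the YAML entry above, as a Python dict.
vehicles = {
    "type": "pandas.GBQTableDataset",
    "dataset": "big_query_dataset",
    "table_name": "big_query_table",
    "project": "my-project",
    "credentials": "gbq-creds",
}
# Placeholder credentials config; "token" is a google.oauth2.credentials.Credentials argument.
credentials_config = {"gbq-creds": {"token": "<access-token>"}}

resolved = resolve_credentials(vehicles, credentials_config)
print(resolved["credentials"])  # {'token': '<access-token>'}
```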

Example usage for the Python API:

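from kedro_datasets.pandas import GBQTableDataset
import pandas as pd

data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})

# Requires access to the "my-project" Google Cloud project.
dataset = GBQTableDataset(
    dataset="dataset", table_name="table_name", project="my-project"
)
dataset.save(data)
reloaded = dataset.load()

# BigQuery does not guarantee row order, and integer columns may come back with
# a nullable integer dtype, so normalize the reloaded frame before comparing.
reloaded = reloaded.sort_values("col1").reset_index(drop=True).astype(data.dtypes.to_dict())
assert data.equals(reloaded)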

Attributes

`DEFAULT_LOAD_ARGS`
`DEFAULT_SAVE_ARGS`

Methods

`exists`()	Checks whether a dataset's output already exists by calling the provided _exists() method.
`from_config`(name, config[, load_version, ...])	Create a dataset instance using the configuration provided.
`load`()	Loads data by delegation to the provided load method.
`release`()	Release any cached data.
`save`(data)	Saves data by delegation to the provided save method.
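Each public method above delegates to a private hook and reports failures as DatasetError. The following is a simplified, self-contained model of that pattern, not Kedro's source: `DatasetError` is a local stand-in, and `DelegatingDataset` and `InMemoryTable` are hypothetical classes so the example runs without BigQuery.

```python
from typing import Any, Optional


class DatasetError(Exception):
    """Stand-in for Kedro's DatasetError."""


class DelegatingDataset:
    """Public methods call a private hook and re-raise unexpected failures as
    DatasetError, keeping the original exception as __cause__."""

    def load(self) -> Any:
        try:
            return self._load()
        except DatasetError:
            raise
        except Exception as exc:
            raise DatasetError(f"Failed while loading data from {self!r}") from exc

    def save(self, data: Any) -> None:
        try:
            self._save(data)
        except DatasetError:
            raise
        except Exception as exc:
            raise DatasetError(f"Failed while saving data to {self!r}") from exc

    def exists(self) -> bool:
        try:
            return self._exists()
        except Exception as exc:
            raise DatasetError(f"Failed during exists check for {self!r}") from exc

    def release(self) -> None:
        try:
            self._release()
        except Exception as exc:
            raise DatasetError(f"Failed during release for {self!r}") from exc

    # Hooks that a concrete dataset overrides.
    def _load(self) -> Any:
        raise NotImplementedError

    def _save(self, data: Any) -> None:
        raise NotImplementedError

    def _exists(self) -> bool:
        return False

    def _release(self) -> None:
        """Nothing is cached by default."""


class InMemoryTable(DelegatingDataset):
    """Hypothetical in-memory stand-in for a BigQuery table."""

    def __init__(self) -> None:
        self._rows: Optional[list] = None

    def _load(self) -> list:
        if self._rows is None:
            raise LookupError("table does not exist")
        return list(self._rows)

    def _save(self, data: list) -> None:
        self._rows = list(data)

    def _exists(self) -> bool:
        return self._rows is not None


table = InMemoryTable()
print(table.exists())  # False
table.save([{"col1": 1, "col2": 4}])
print(table.exists(), table.load())  # True [{'col1': 1, 'col2': 4}]
```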

DEFAULT_LOAD_ARGS: dict[str, Any] = {}

DEFAULT_SAVE_ARGS: dict[str, Any] = {'progress_bar': False}
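The two class attributes act as defaults for load_args and save_args. A minimal sketch of the merge, assuming user-supplied keys override the defaults key by key (so, under this assumption, passing progress_bar=True switches the progress bar back on); `merge_args` is a hypothetical helper.

```python
from typing import Any, Optional

DEFAULT_LOAD_ARGS: dict[str, Any] = {}
DEFAULT_SAVE_ARGS: dict[str, Any] = {"progress_bar": False}


def merge_args(defaults: dict[str, Any], user_args: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Overlay user arguments on the class defaults; neither input is mutated."""
    return {**defaults, **(user_args or {})}


print(merge_args(DEFAULT_SAVE_ARGS, {"chunksize": 100}))
# {'progress_bar': False, 'chunksize': 100}
print(merge_args(DEFAULT_SAVE_ARGS, {"progress_bar": True}))
# {'progress_bar': True}
print(merge_args(DEFAULT_LOAD_ARGS, None))
# {}
```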

__init__(*, dataset, table_name, project=None, credentials=None, load_args=None, save_args=None, metadata=None)

Creates a new instance of GBQTableDataset.

Parameters:

dataset (str) – Google BigQuery dataset.
table_name (str) – Google BigQuery table name.
project (Optional[str]) – Google Cloud project ID used for BigQuery. Optional when it can be inferred from the environment. See https://cloud.google.com/resource-manager/docs/creating-managing-projects
credentials (Union[dict[str, Any], Credentials, None]) – Credentials for accessing Google APIs: either a google.auth.credentials.Credentials object or a dictionary of the parameters required to instantiate google.oauth2.credentials.Credentials. All accepted arguments are listed at https://google-auth.readthedocs.io/en/latest/reference/google.oauth2.credentials.html
load_args (Optional[dict[str, Any]]) – Pandas options for loading the BigQuery table into a DataFrame. All defaults are preserved. All available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html
save_args (Optional[dict[str, Any]]) – Pandas options for saving a DataFrame to the BigQuery table. All defaults are preserved except “progress_bar”, which is set to False. All available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_gbq.html
metadata (Optional[dict[str, Any]]) – Arbitrary metadata. It is ignored by Kedro, but may be consumed by users or external plugins.
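The credentials parameter accepts either a ready-made credentials object or a dictionary of constructor arguments. A sketch of that normalization under stated assumptions: `StubCredentials` is a hypothetical stand-in for google.oauth2.credentials.Credentials (covering only a subset of its arguments) so the example runs without google-auth, and `normalize_credentials` is a hypothetical helper. None is passed through, leaving authentication to the environment.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class StubCredentials:
    """Hypothetical stand-in for google.oauth2.credentials.Credentials (subset of its arguments)."""

    token: Optional[str] = None
    refresh_token: Optional[str] = None
    token_uri: Optional[str] = None
    client_id: Optional[str] = None
    client_secret: Optional[str] = None


def normalize_credentials(
    credentials: Union[dict[str, Any], StubCredentials, None],
) -> Optional[StubCredentials]:
    """Build a credentials object from a dict; pass objects and None through unchanged."""
    if isinstance(credentials, dict):
        return StubCredentials(**credentials)
    return credentials


from_dict = normalize_credentials({"token": "<access-token>"})
print(from_dict.token)  # <access-token>
existing = StubCredentials(token="<other-token>")
print(normalize_credentials(existing) is existing)  # True
```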

Raises:

DatasetError – When load_args['location'] and save_args['location'] are different.
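The check guards against loading and saving the same table through different BigQuery locations. A minimal sketch, assuming a plain equality test on the two "location" values; `DatasetError` is a local stand-in and `validate_location` is hypothetical. In this sketch a location given on only one side also counts as different.

```python
from typing import Any, Optional


class DatasetError(Exception):
    """Stand-in for Kedro's DatasetError."""


def validate_location(
    load_args: Optional[dict[str, Any]], save_args: Optional[dict[str, Any]]
) -> None:
    """Raise DatasetError when load and save point at different BigQuery locations."""
    load_location = (load_args or {}).get("location")
    save_location = (save_args or {}).get("location")
    # A location set on only one side compares unequal to None, so it also fails.
    if load_location != save_location:
        raise DatasetError(
            f"load_args['location'] ({load_location!r}) is different from "
            f"save_args['location'] ({save_location!r})."
        )


validate_location({"location": "EU"}, {"location": "EU", "chunksize": 100})  # passes
try:
    validate_location({"location": "EU"}, {"location": "US"})
except DatasetError as err:
    print(err)  # load_args['location'] ('EU') is different from save_args['location'] ('US').
```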

exists()

Checks whether a dataset's output already exists by calling the provided _exists() method.

Return type: bool
Returns: Flag indicating whether the output already exists.
Raises: DatasetError – When the underlying exists method raises an error.
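For a BigQuery table, one plausible way to implement the existence check is to look the table up and treat a not-found error as False, for example with google.cloud.bigquery.Client.get_table, which raises google.api_core.exceptions.NotFound for a missing table. The sketch assumes that approach; `NotFound` is a local stand-in, `fake_get_table` is a hypothetical lookup, and the code runs without Google libraries.

```python
from typing import Callable


class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound."""


def table_exists(get_table: Callable[[str], object], table_id: str) -> bool:
    """Return True if the lookup succeeds, False if the table is missing.

    Any other error propagates; the public exists() then reports it as DatasetError.
    """
    try:
        get_table(table_id)
    except NotFound:
        return False
    return True


# Hypothetical in-memory set of existing tables.
known_tables = {"my-project.big_query_dataset.big_query_table"}


def fake_get_table(table_id: str) -> str:
    if table_id not in known_tables:
        raise NotFound(table_id)
    return table_id


print(table_exists(fake_get_table, "my-project.big_query_dataset.big_query_table"))  # True
print(table_exists(fake_get_table, "my-project.big_query_dataset.missing"))  # False
```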

classmethod from_config(name, config, load_version=None, save_version=None)

Create a dataset instance using the configuration provided.

Parameters:

name (str) – Dataset name.
config (dict[str, Any]) – Dataset config dictionary.
load_version (str | None) – Version string to be used for the load operation if the dataset is versioned. Has no effect if versioning is not enabled.
save_version (str | None) – Version string to be used for the save operation if the dataset is versioned. Has no effect if versioning is not enabled.

Return type:

AbstractDataset

Returns:

An instance of an AbstractDataset subclass.

Raises:

DatasetError – When the function fails to create the dataset from its config.
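from_config builds a dataset from a dictionary such as the YAML entry shown earlier. A minimal sketch, assuming a registry that maps the `type` string to a class and passes the remaining keys as keyword arguments; `REGISTRY`, `FakeGBQTable`, and this `from_config` function are hypothetical, and versioning (load_version, save_version) is left out because the sketch dataset is not versioned.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class DatasetError(Exception):
    """Stand-in for Kedro's DatasetError."""


@dataclass
class FakeGBQTable:
    """Hypothetical dataset class with the same required arguments as GBQTableDataset."""

    dataset: str
    table_name: str
    project: Optional[str] = None


REGISTRY: dict[str, Callable[..., Any]] = {"pandas.GBQTableDataset": FakeGBQTable}


def from_config(name: str, config: dict[str, Any]) -> Any:
    """Instantiate the class named by config['type'] with the remaining keys."""
    kwargs = dict(config)  # copy so the caller's config is not mutated
    type_name = kwargs.pop("type", None)
    if type_name not in REGISTRY:
        raise DatasetError(f"Cannot resolve type {type_name!r} for dataset {name!r}.")
    try:
        return REGISTRY[type_name](**kwargs)
    except TypeError as exc:  # unknown or missing constructor arguments
        raise DatasetError(f"Invalid arguments for dataset {name!r}: {exc}") from exc


vehicles = from_config(
    "vehicles",
    {
        "type": "pandas.GBQTableDataset",
        "dataset": "big_query_dataset",
        "table_name": "big_query_table",
        "project": "my-project",
    },
)
print(vehicles.table_name)  # big_query_table
```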

load()

Loads data by delegation to the provided load method.

Return type: TypeVar(_DO)
Returns: Data returned by the provided load method; for this dataset, a pandas DataFrame.
Raises: DatasetError – When the underlying load method raises an error.

release()

Release any cached data.

Raises: DatasetError – When the underlying release method raises an error.
Return type: None

save(data)

Saves data by delegation to the provided save method.

Parameters:

data (TypeVar(_DI)) – The value to be saved by the provided save method.

Raises:

DatasetError – When the underlying save method raises an error.
FileNotFoundError – When the save method receives a file instead of a directory, on Windows.
NotADirectoryError – When the save method receives a file instead of a directory, on Unix.

Return type:

None