GBQTableDataset
GBQTableDataset loads and saves data to/from Google BigQuery tables using pandas-gbq.
kedro_datasets.pandas.GBQTableDataset
GBQTableDataset(
    *,
    dataset,
    table_name,
    project=None,
    credentials=None,
    load_args=None,
    save_args=None,
    metadata=None
)
Bases: ConnectionMixin, AbstractDataset[None, DataFrame]
GBQTableDataset loads data from and saves data to Google BigQuery.
It uses pandas-gbq to read from and write to a BigQuery table.
Examples:
Using the YAML API:
vehicles:
  type: pandas.GBQTableDataset
  dataset: big_query_dataset
  table_name: big_query_table
  project: my-project
  credentials: gbq-creds
  load_args:
    reauth: True
  save_args:
    chunksize: 100
Using the Python API:
>>> import pandas as pd
>>> from kedro_datasets.pandas import GBQTableDataset
>>>
>>> data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
>>>
>>> dataset = GBQTableDataset(
...     dataset="dataset", table_name="table_name", project="my-project"
... )
>>> dataset.save(data)
>>> reloaded = dataset.load()
>>> assert data.equals(reloaded)
Parameters:
- dataset (str) – Google BigQuery dataset.
- table_name (str) – Google BigQuery table name.
- project (str | None, default: None) – Google BigQuery account project ID. Optional when available from the environment. https://cloud.google.com/resource-manager/docs/creating-managing-projects
- credentials (dict[str, Any] | str | Credentials | None, default: None) – Credentials for accessing Google APIs. One of: a credentials object derived from google.auth.credentials.Credentials, a service account JSON as a dictionary, or a path to a service account key JSON file. https://googleapis.dev/python/google-auth/latest/
- load_args (dict[str, Any] | None, default: None) – Pandas options for loading a BigQuery table into a DataFrame. All available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html. All defaults are preserved.
- save_args (dict[str, Any] | None, default: None) – Pandas options for saving a DataFrame to a BigQuery table. All available arguments are listed at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_gbq.html. All defaults are preserved except "progress_bar", which is set to False.
- metadata (dict[str, Any] | None, default: None) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
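The default handling described for save_args can be pictured as a plain dictionary merge. The sketch below is only an illustration under stated assumptions, not the library's implementation: DEFAULT_SAVE_ARGS and resolve_save_args are hypothetical names, and the merge order (user-supplied values override the default) is assumed.

```python
from __future__ import annotations

from typing import Any

# Hypothetical constant: the only default the docs say is changed from pandas.
DEFAULT_SAVE_ARGS: dict[str, Any] = {"progress_bar": False}


def resolve_save_args(save_args: dict[str, Any] | None = None) -> dict[str, Any]:
    """Merge user-supplied save_args over the documented default.

    Assumption: user values take precedence over the default.
    """
    return {**DEFAULT_SAVE_ARGS, **(save_args or {})}


# A user option is added alongside the default.
resolved = resolve_save_args({"if_exists": "replace"})
# An explicit user value replaces the default.
explicit = resolve_save_args({"progress_bar": True})
```

Under these assumptions, omitting save_args leaves only the progress_bar override in effect, and every other option falls back to the pandas default.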
Raises:
- DatasetError – When load_args['location'] and save_args['location'] are different.
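The location rule above can be sketched without any Kedro dependency. This is a minimal illustration, not the code in gbq_dataset.py: check_location and LocationMismatchError are hypothetical names (the real dataset raises DatasetError), and treating a missing "location" key as None, so that a location set on only one side also counts as a mismatch, is an assumption.

```python
from __future__ import annotations

from typing import Any


class LocationMismatchError(ValueError):
    """Hypothetical stand-in for Kedro's DatasetError."""


def check_location(
    load_args: dict[str, Any] | None,
    save_args: dict[str, Any] | None,
) -> str | None:
    """Return the shared location, or raise if the two sides disagree.

    Assumption: a missing "location" key is treated as None.
    """
    load_location = (load_args or {}).get("location")
    save_location = (save_args or {}).get("location")
    if load_location != save_location:
        raise LocationMismatchError(
            f"load_args['location'] ({load_location!r}) differs from "
            f"save_args['location'] ({save_location!r})"
        )
    return save_location


# Matching locations pass and the shared value is returned.
same = check_location({"location": "EU"}, {"location": "EU"})

# Different locations are rejected.
try:
    check_location({"location": "EU"}, {"location": "US"})
    mismatch_raised = False
except LocationMismatchError:
    mismatch_raised = True
```

Checking at construction time, as the Raises entry suggests, would surface a misconfigured catalog entry before any BigQuery call is made.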
Source code in kedro_datasets/pandas/gbq_dataset.py
_connection_config (instance-attribute)
_connection_config = {
    "project": _project_id,
    "credentials": credentials,
    "location": get("location"),
}
_connect
_connect()
_describe
_describe()
_exists
_exists()
_validate_location
_validate_location()
load
load()
save
save(data)