FileDataset
FileDataset is used to load and save data to files using the Ibis framework.
kedro_datasets.ibis.FileDataset
FileDataset(
    filepath,
    file_format="parquet",
    *,
    table_name=None,
    connection=None,
    credentials=None,
    load_args=None,
    save_args=None,
    version=None,
    metadata=None
)
Bases: ConnectionMixin, AbstractVersionedDataset[Table, Table]
FileDataset loads/saves data from/to a specified file format.
Examples:
Using the YAML API:
cars:
  type: ibis.FileDataset
  filepath: data/01_raw/company/cars.csv
  file_format: csv
  table_name: cars
  connection:
    backend: duckdb
    database: company.db
  load_args:
    sep: ","
    nullstr: "#NA"
  save_args:
    sep: ","
    nullstr: "#NA"

motorbikes:
  type: ibis.FileDataset
  filepath: s3://your_bucket/data/02_intermediate/company/motorbikes/
  file_format: delta
  table_name: motorbikes
  connection:
    backend: polars
Using the Python API:
>>> import ibis
>>> from kedro_datasets.ibis import FileDataset
>>>
>>> data = ibis.memtable({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
>>>
>>> dataset = FileDataset(
...     filepath=tmp_path / "test.csv",
...     file_format="csv",
...     table_name="test",
...     connection={"backend": "duckdb", "database": tmp_path / "file.db"},
... )
>>> dataset.save(data)
>>> reloaded = dataset.load()
>>> assert data.execute().equals(reloaded.execute())
FileDataset connects to the Ibis backend object constructed from the connection configuration. The backend key provided in the config can be any of the supported backends. The remaining dictionary entries will be passed as arguments to the underlying connect() method (e.g. ibis.duckdb.connect()).
The read method corresponding to the given file_format (e.g. read_csv()) is used to load the file with the backend. Note that only the data is loaded; no link to the underlying file exists past FileDataset.load().
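The split between the backend key and the remaining connect() arguments can be sketched with plain dictionaries. This is a minimal illustration, not the library's implementation: `split_connection_config` is a hypothetical helper introduced here, and the commented-out Ibis call only indicates roughly where the values would go.

```python
from typing import Any, Dict, Tuple


def split_connection_config(config: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Separate the ``backend`` key from the arguments meant for ``connect()``.

    Hypothetical helper, for illustration only.
    """
    connect_kwargs = dict(config)  # copy so the caller's mapping is left untouched
    backend_name = connect_kwargs.pop("backend")
    return backend_name, connect_kwargs


config = {"backend": "duckdb", "database": "company.db"}
backend_name, connect_kwargs = split_connection_config(config)

# With Ibis installed, the values would be used roughly like this (assumption):
#   backend = getattr(ibis, backend_name).connect(**connect_kwargs)
```

Copying the mapping before removing the backend key keeps the original configuration reusable, for example when the dataset is described or re-instantiated.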
Parameters:
- filepath (str) – Path to a file to register as a table. Most useful for loading data into your data warehouse (for testing). On save, the backend exports data to the specified path.
- file_format (str, default: 'parquet') – String specifying the file format for the file. Defaults to writing execution results to a Parquet file.
- table_name (str | None, default: None) – The name to use for the created table (on load).
- connection (dict[str, Any] | None, default: None) – Configuration for connecting to an Ibis backend. If not provided, connect to DuckDB in in-memory mode.
- credentials (dict[str, Any] | None, default: None) – Credentials or additional configuration used to connect (e.g. user, password, token, account). If given, these values override the base connection configuration.
- load_args (dict[str, Any] | None, default: None) – Additional arguments passed to the Ibis backend's read_{file_format} method.
- save_args (dict[str, Any] | None, default: None) – Additional arguments passed to the Ibis backend's to_{file_format} method.
- version (Version | None, default: None) – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated.
- metadata (dict[str, Any] | None, default: None) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
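How file_format, load_args and save_args select and parameterise the read_{file_format} and to_{file_format} methods can be sketched with a stand-in backend. Everything below is an assumption for illustration: `FakeBackend`, `load` and `save` are hypothetical names, the stand-in only records calls instead of touching files, and the argument order of the real Ibis methods may differ.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeBackend:
    """Stand-in for an Ibis backend; records calls instead of doing I/O."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def read_csv(self, path: str, table_name: Optional[str] = None, **kwargs: Any) -> str:
        self.calls.append(("read_csv", path, table_name, kwargs))
        return f"table:{table_name}"

    def to_csv(self, expr: Any, path: str, **kwargs: Any) -> None:
        self.calls.append(("to_csv", path, kwargs))


def load(backend: Any, filepath: str, file_format: str,
         table_name: Optional[str] = None,
         load_args: Optional[Dict[str, Any]] = None) -> Any:
    # Look up the reader by name, e.g. "read_csv" for file_format="csv".
    reader = getattr(backend, f"read_{file_format}")
    return reader(filepath, table_name, **(load_args or {}))


def save(backend: Any, data: Any, filepath: str, file_format: str,
         save_args: Optional[Dict[str, Any]] = None) -> None:
    # Look up the writer by name, e.g. "to_csv" for file_format="csv".
    writer = getattr(backend, f"to_{file_format}")
    writer(data, filepath, **(save_args or {}))


backend = FakeBackend()
table = load(backend, "cars.csv", "csv", table_name="cars", load_args={"sep": ","})
save(backend, table, "cars_out.csv", "csv", save_args={"sep": ","})
```

An unsupported file_format surfaces as an AttributeError from getattr in this sketch, since the stand-in backend has no matching method.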
Source code in kedro_datasets/ibis/file_dataset.py
DEFAULT_CONNECTION_CONFIG class-attribute
DEFAULT_CONNECTION_CONFIG = {
    "backend": "duckdb",
    "database": ":memory:",
}
_connection_config instance-attribute
_connection_config = {
    **_connection_config,
    **_credentials,
}
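The credentials parameter is documented as overriding the base connection configuration, and a missing connection falls back to in-memory DuckDB. With plain dictionaries, that resolution can be sketched as follows. `resolve_connection_config` is a hypothetical helper, and the host, user and password values are made up for the example.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

DEFAULT_CONNECTION_CONFIG = {"backend": "duckdb", "database": ":memory:"}


def resolve_connection_config(
    connection: Optional[Dict[str, Any]] = None,
    credentials: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge connection settings with credentials; credentials win on conflicts."""
    # Fall back to the default config when no connection is given.
    base = deepcopy(connection) if connection else dict(DEFAULT_CONNECTION_CONFIG)
    # Later entries in a dict display overwrite earlier ones with the same key.
    return {**base, **(credentials or {})}


default_config = resolve_connection_config()
merged = resolve_connection_config(
    connection={"backend": "postgres", "host": "db", "user": "reader"},
    credentials={"user": "admin", "password": "secret"},
)
```

Here `merged["user"]` ends up as "admin" because the credentials entry is unpacked last.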
_connect
_connect()
_describe
_describe()
_exists
_exists()
load
load()
save
save(data)