FileDataset
FileDataset is used to load data from and save data to files using the Ibis framework.
kedro_datasets.ibis.FileDataset
FileDataset(
    filepath,
    file_format="parquet",
    *,
    table_name=None,
    connection=None,
    load_args=None,
    save_args=None,
    version=None,
    metadata=None,
)
Bases: ConnectionMixin, AbstractVersionedDataset[Table, Table]
FileDataset loads/saves data from/to a specified file format.
Examples:
Using the YAML API:
cars:
  type: ibis.FileDataset
  filepath: data/01_raw/company/cars.csv
  file_format: csv
  table_name: cars
  connection:
    backend: duckdb
    database: company.db
  load_args:
    sep: ","
    nullstr: "#NA"
  save_args:
    sep: ","
    nullstr: "#NA"

motorbikes:
  type: ibis.FileDataset
  filepath: s3://your_bucket/data/02_intermediate/company/motorbikes/
  file_format: delta
  table_name: motorbikes
  connection:
    backend: polars
Using the Python API:
>>> import ibis
>>> from kedro_datasets.ibis import FileDataset
>>>
>>> data = ibis.memtable({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
>>>
>>> dataset = FileDataset(
...     filepath=tmp_path / "test.csv",
...     file_format="csv",
...     table_name="test",
...     connection={"backend": "duckdb", "database": tmp_path / "file.db"},
... )
>>> dataset.save(data)
>>> reloaded = dataset.load()
>>> assert data.execute().equals(reloaded.execute())
FileDataset connects to the Ibis backend object constructed
from the connection configuration. The backend key provided in
the config can be any of the supported backends <https://ibis-project.org/install>. The remaining dictionary entries will be
passed as arguments to the underlying connect() method (e.g.
ibis.duckdb.connect() <https://ibis-project.org/backends/duckdb#ibis.duckdb.connect>).
The read method corresponding to the given file_format (e.g.
read_csv() <https://ibis-project.org/backends/duckdb#ibis.backends.duckdb.Backend.read_csv>) is used to load
the file with the backend. Note that only the data is loaded; no
link to the underlying file exists past FileDataset.load().
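The connection and read steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: StubBackend, split_connection_config, and resolve_reader are hypothetical names, and the stub stands in for a real Ibis backend so the example runs without Ibis installed.

```python
from copy import deepcopy
from typing import Any, Callable, Optional


class StubBackend:
    """Hypothetical stand-in for an Ibis backend; not part of Ibis."""

    def read_csv(
        self, path: str, table_name: Optional[str] = None, **kwargs: Any
    ) -> dict:
        # A real backend would load the file and return a table expression.
        return {"path": path, "table_name": table_name, "options": kwargs}


def split_connection_config(config: dict) -> tuple:
    """Separate the backend name from the arguments meant for connect()."""
    remaining = deepcopy(config)  # avoid mutating the caller's config
    backend = remaining.pop("backend")
    return backend, remaining


def resolve_reader(connection: Any, file_format: str) -> Callable[..., Any]:
    """Look up the read_{file_format} method on the backend connection."""
    return getattr(connection, f"read_{file_format}")


backend_name, connect_kwargs = split_connection_config(
    {"backend": "duckdb", "database": "company.db"}
)
reader = resolve_reader(StubBackend(), "csv")
table = reader("data/01_raw/company/cars.csv", "cars", sep=",", nullstr="#NA")
```

In this sketch, a file_format with no matching read method on the connection surfaces as an AttributeError from getattr.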
Parameters:
- filepath (str) – Path to a file to register as a table. Most useful for loading data into your data warehouse (for testing). On save, the backend exports data to the specified path.
- file_format (str, default: 'parquet') – String specifying the file format for the file. Defaults to writing execution results to a Parquet file.
- table_name (str | None, default: None) – The name to use for the created table (on load).
- connection (dict[str, Any] | None, default: None) – Configuration for connecting to an Ibis backend. If not provided, connect to DuckDB in in-memory mode.
- load_args (dict[str, Any] | None, default: None) – Additional arguments passed to the Ibis backend's read_{file_format} method.
- save_args (dict[str, Any] | None, default: None) – Additional arguments passed to the Ibis backend's to_{file_format} method.
- version (Version | None, default: None) – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, the save version will be autogenerated.
- metadata (dict[str, Any] | None, default: None) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
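The load half of the version rule can be illustrated with a small stand-in. This is only a sketch: the Version class below is a local stand-in with the same two fields as kedro.io.core.Version, resolve_load_version is a hypothetical helper, and the assumption that version strings sort chronologically is made for the example rather than taken from this page.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    """Local stand-in with the same two fields as kedro.io.core.Version."""

    load: Optional[str]
    save: Optional[str]


def resolve_load_version(version: Version, available: list) -> str:
    # An explicit load version wins; otherwise fall back to the latest one.
    # Assumes version strings sort chronologically (timestamp-like names).
    if version.load is not None:
        return version.load
    return max(available)


available = ["2024-01-01T00.00.00.000Z", "2024-03-15T12.30.00.000Z"]
latest = resolve_load_version(Version(load=None, save=None), available)
pinned = resolve_load_version(
    Version(load="2024-01-01T00.00.00.000Z", save=None), available
)
```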
Source code in kedro_datasets/ibis/file_dataset.py
DEFAULT_CONNECTION_CONFIG class-attribute
DEFAULT_CONNECTION_CONFIG = {
    "backend": "duckdb",
    "database": ":memory:",
}
_connection_config instance-attribute
_connection_config = connection or DEFAULT_CONNECTION_CONFIG
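Because the assignment above uses `or`, any falsy value selects the default, so an empty dict behaves the same as None. A minimal sketch of that behaviour, where resolve_connection_config is a hypothetical helper name used only for illustration:

```python
from typing import Any, Optional

DEFAULT_CONNECTION_CONFIG = {"backend": "duckdb", "database": ":memory:"}


def resolve_connection_config(connection: Optional[dict] = None) -> Any:
    # Mirrors `connection or DEFAULT_CONNECTION_CONFIG`:
    # None and {} both fall back to the in-memory DuckDB default.
    return connection or DEFAULT_CONNECTION_CONFIG


default_cfg = resolve_connection_config()
empty_cfg = resolve_connection_config({})
custom_cfg = resolve_connection_config({"backend": "polars"})
```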
_connect
_connect()
Source code in kedro_datasets/ibis/file_dataset.py
_describe
_describe()
Source code in kedro_datasets/ibis/file_dataset.py
_exists
_exists()
Source code in kedro_datasets/ibis/file_dataset.py
load
load()
Source code in kedro_datasets/ibis/file_dataset.py
save
save(data)
Source code in kedro_datasets/ibis/file_dataset.py