PptxDataset
PptxDataset loads and saves presentations in .pptx format.
kedro_datasets.openxml.PptxDataset
PptxDataset(
    *,
    filepath,
    version=None,
    credentials=None,
    fs_args=None,
    metadata=None
)
Bases: AbstractVersionedDataset[Presentation, Presentation]
PptxDataset loads/saves data from/to a .pptx file using an underlying
filesystem (e.g. local, S3, GCS). It uses Presentation from python-pptx to handle the .pptx file.
Examples:
Using the YAML API:
presentation:
  type: openxml.PptxDataset
  filepath: slides.pptx
Using the Python API:
>>> from pptx import Presentation
>>> from kedro_datasets.openxml import PptxDataset
>>>
>>> data = Presentation()
>>> slide = data.slides.add_slide(data.slide_layouts[0])
>>> title = slide.shapes.title
>>> title.text = "Hello, World!"
>>>
>>> dataset = PptxDataset(filepath=tmp_path / "test.pptx")
>>> dataset.save(data)
>>> reloaded = dataset.load()
>>> assert reloaded.slides[0].shapes.title.text == "Hello, World!"
Parameters:
- filepath (str | PathLike) – Filepath in POSIX format to a .pptx file, prefixed with a protocol like s3://. If no prefix is provided, the file protocol (local filesystem) is used. The prefix can be any protocol supported by fsspec. Note: http(s) doesn't support versioning.
- version (Version | None, default: None) – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version is loaded. If its save attribute is None, the save version is autogenerated.
- credentials (dict[str, Any] | None, default: None) – Credentials required to access the underlying filesystem. E.g. for GCSFileSystem it should look like {"token": None}.
- fs_args (dict[str, Any] | None, default: None) – Extra arguments to pass to the underlying filesystem class constructor (e.g. {"project": "my-project"} for GCSFileSystem), as well as arguments to pass to the filesystem's open method through the nested keys open_args_load and open_args_save. See https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.open for all available arguments for open. All defaults are preserved, except mode, which is set to wb when saving.
- metadata (dict[str, Any] | None, default: None) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
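The load/save semantics of the version parameter can be sketched in plain Python. This is an illustration only, not Kedro's implementation: the Version tuple below is a stand-in for kedro.io.core.Version, and resolve_load_version, resolve_save_version, and the timestamp format are hypothetical choices made for this example.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional


class Version(NamedTuple):
    """Stand-in for kedro.io.core.Version (assumed to expose `load` and `save`)."""

    load: Optional[str]
    save: Optional[str]


def resolve_load_version(version: Version, available: List[str]) -> str:
    """Return the pinned load version, or the latest available one if unset."""
    if version.load is not None:
        return version.load
    if not available:
        raise ValueError("no saved versions to load")
    # Assumes version strings sort chronologically (e.g. timestamps).
    return max(available)


def resolve_save_version(version: Version) -> str:
    """Return the pinned save version, or autogenerate a timestamp-based one."""
    if version.save is not None:
        return version.save
    # Hypothetical format; the real autogenerated format may differ.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")


available = ["2024-01-01T00.00.00.000Z", "2024-06-01T00.00.00.000Z"]
latest = resolve_load_version(Version(load=None, save=None), available)
pinned = resolve_load_version(Version(load=available[0], save=None), available)
```

Here latest resolves to the most recent entry because load is None, while pinned returns exactly the version that was requested.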
Source code in kedro_datasets/openxml/pptx_dataset.py
DEFAULT_FS_ARGS (class-attribute, instance-attribute)
DEFAULT_FS_ARGS = {'open_args_save': {'mode': 'wb'}}
_fs_open_args_load (instance-attribute)
_fs_open_args_load = {
    **self.DEFAULT_FS_ARGS.get("open_args_load", {}),
    **(_fs_open_args_load or {}),
}
_fs_open_args_save (instance-attribute)
_fs_open_args_save = {
    **self.DEFAULT_FS_ARGS.get("open_args_save", {}),
    **(_fs_open_args_save or {}),
}
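The fs_args handling described in the parameters (nested open_args_load and open_args_save keys, with mode defaulting to wb when saving) can be sketched as a small standalone function. split_fs_args is a hypothetical helper written for this illustration and is not part of kedro_datasets; block_size is used only as an example of an open argument.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Tuple

DEFAULT_FS_ARGS: Dict[str, Any] = {"open_args_save": {"mode": "wb"}}


def split_fs_args(
    fs_args: Optional[Dict[str, Any]],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Split fs_args into (constructor args, load open args, save open args)."""
    # Copy first so the caller's dictionary is never mutated.
    remaining = deepcopy(fs_args) if fs_args else {}
    user_load = remaining.pop("open_args_load", {})
    user_save = remaining.pop("open_args_save", {})
    # Defaults come first, so user-supplied keys take precedence.
    open_args_load = {**DEFAULT_FS_ARGS.get("open_args_load", {}), **user_load}
    open_args_save = {**DEFAULT_FS_ARGS.get("open_args_save", {}), **user_save}
    return remaining, open_args_load, open_args_save


fs_kwargs, load_args, save_args = split_fs_args(
    {"project": "my-project", "open_args_save": {"block_size": 1024}}
)
```

In this sketch, whatever remains after removing the two nested keys would go to the filesystem constructor, and the save arguments keep mode set to wb unless the caller overrides it.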
_describe
_describe()
Returns a dictionary with basic dataset information.
Returns:
- dict[str, Any] – A dictionary with the following keys:
  - "filepath" (PurePosixPath): Path to the .pptx file.
  - "protocol" (str): Filesystem protocol (e.g., 'file', 's3').
  - "version" (Version | None): Version information if specified.
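To illustrate how the "filepath" and "protocol" entries relate to the filepath argument, the sketch below splits a path on its protocol prefix and falls back to file when no prefix is given. split_protocol and describe are hypothetical helpers for this example only; they are a simplification and not the library's implementation.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, Optional, Tuple


def split_protocol(filepath: str) -> Tuple[str, str]:
    """Split 'proto://path' into (proto, path); default to the 'file' protocol."""
    if "://" in filepath:
        protocol, path = filepath.split("://", 1)
        return protocol, path
    return "file", filepath


def describe(filepath: str, version: Optional[Any] = None) -> Dict[str, Any]:
    """Build a dictionary shaped like the one documented for _describe."""
    protocol, path = split_protocol(filepath)
    return {
        "filepath": PurePosixPath(path),
        "protocol": protocol,
        "version": version,
    }


info = describe("s3://my-bucket/slides.pptx")
local = describe("data/slides.pptx")
```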
Source code in kedro_datasets/openxml/pptx_dataset.py
_exists
_exists()
Checks whether the file exists on the filesystem.
Returns:
- bool – True if the file exists, otherwise False.
Source code in kedro_datasets/openxml/pptx_dataset.py
_invalidate_cache
_invalidate_cache()
Invalidate underlying filesystem caches.
Source code in kedro_datasets/openxml/pptx_dataset.py
_release
_release()
Releases resources and invalidates the filesystem cache.
Source code in kedro_datasets/openxml/pptx_dataset.py
load
load()
Loads a .pptx file from the filesystem.
Returns:
- Presentation – A python-pptx Presentation instance containing the loaded content.
Source code in kedro_datasets/openxml/pptx_dataset.py
save
save(data)
Saves a Presentation object to the filesystem.
Parameters:
- data (Presentation) – A python-pptx Presentation instance to be saved.
Source code in kedro_datasets/openxml/pptx_dataset.py