PlotlyDataset

PlotlyDataset manages Plotly visualizations, allowing them to be saved and loaded.

kedro_datasets.plotly.PlotlyDataset

PlotlyDataset(
    *,
    filepath,
    plotly_args,
    load_args=None,
    save_args=None,
    version=None,
    credentials=None,
    fs_args=None,
    metadata=None
)

Bases: JSONDataset

PlotlyDataset generates a plot from a pandas DataFrame and saves it to a JSON file using an underlying filesystem (e.g. local, S3, GCS). On load, it reads the JSON back into a plotly figure.

PlotlyDataset is a convenience wrapper for plotly.JSONDataset. It generates the JSON file directly from a pandas DataFrame through plotly_args.

Examples:

Using the YAML API:

bar_plot:
  type: plotly.PlotlyDataset
  filepath: data/08_reporting/bar_plot.json
  plotly_args:
    type: bar
    fig:
      x: features
      y: importance
      orientation: h
    layout:
      xaxis_title: x
      yaxis_title: y
      title: Title

Using the Python API:

>>> import pandas as pd
>>> import plotly.express as px
>>> from kedro_datasets.plotly import PlotlyDataset
>>>
>>> df_data = pd.DataFrame([[0, 1], [1, 0]], columns=("x1", "x2"))
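>>> # tmp_path is assumed to be a pathlib.Path to a writable directory (for example, pytest's tmp_path fixture).
>>>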
>>> dataset = PlotlyDataset(
...     filepath=tmp_path / "scatter_plot.json",
...     plotly_args={
...         "type": "scatter",
...         "fig": {"x": "x1", "y": "x2"},
...     },
... )
>>> dataset.save(df_data)
>>> reloaded = dataset.load()
>>> assert px.scatter(df_data, x="x1", y="x2") == reloaded

Parameters:

  • filepath (str) –

    Filepath in POSIX format to a JSON file, prefixed with a protocol like s3://. If no prefix is provided, the file protocol (local filesystem) is used. The prefix should be any protocol supported by fsspec. Note: http(s) doesn't support versioning.

  • plotly_args (dict[str, Any]) –

    Plotly configuration for generating a plotly figure from the dataframe. Keys are type (name of a plotly express function, e.g. bar, line, scatter), fig (kwargs passed to the plotting function), theme (template name, defaults to plotly), and layout (passed to the figure's update_layout).

  • load_args (dict[str, Any] | None, default: None ) –

    Plotly options for loading JSON files. Here you can find all available arguments: https://plotly.com/python-api-reference/generated/plotly.io.from_json.html#plotly.io.from_json All defaults are preserved.

  • save_args (dict[str, Any] | None, default: None ) –

    Plotly options for saving JSON files. Here you can find all available arguments: https://plotly.com/python-api-reference/generated/plotly.io.write_json.html All defaults are preserved.

  • version (Version | None, default: None ) –

    If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, save version will be autogenerated.

  • credentials (dict[str, Any] | None, default: None ) –

    Credentials required to get access to the underlying filesystem. E.g. for GCSFileSystem it should look like {'token': None}.

  • fs_args (dict[str, Any] | None, default: None ) –

    Extra arguments to pass into underlying filesystem class constructor (e.g. {"project": "my-project"} for GCSFileSystem), as well as to pass to the filesystem's open method through nested keys open_args_load and open_args_save. Here you can find all available arguments for open: https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.open All defaults are preserved, except mode, which is set to w when saving.

  • metadata (dict[str, Any] | None, default: None ) –

    Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.

Source code in kedro_datasets/plotly/plotly_dataset.py
def __init__(  # noqa: PLR0913
    self,
    *,
    filepath: str,
    plotly_args: dict[str, Any],
    load_args: dict[str, Any] | None = None,
    save_args: dict[str, Any] | None = None,
    version: Version | None = None,
    credentials: dict[str, Any] | None = None,
    fs_args: dict[str, Any] | None = None,
    metadata: dict[str, Any] | None = None,
) -> None:
    """Creates a new instance of ``PlotlyDataset`` pointing to a concrete JSON file
    on a specific filesystem.

    Args:
        filepath: Filepath in POSIX format to a JSON file prefixed with a protocol like `s3://`.
            If prefix is not provided `file` protocol (local filesystem) will be used.
            The prefix should be any protocol supported by ``fsspec``.
            Note: `http(s)` doesn't support versioning.
        plotly_args: Plotly configuration for generating a plotly figure from the
            dataframe. Keys are `type` (plotly express function, e.g. bar,
            line, scatter), `fig` (kwargs passed to the plotting function), theme
            (defaults to `plotly`), `layout`.
        load_args: Plotly options for loading JSON files.
            Here you can find all available arguments:
            https://plotly.com/python-api-reference/generated/plotly.io.from_json.html#plotly.io.from_json
            All defaults are preserved.
        save_args: Plotly options for saving JSON files.
            Here you can find all available arguments:
            https://plotly.com/python-api-reference/generated/plotly.io.write_json.html
            All defaults are preserved.
        version: If specified, should be an instance of
            ``kedro.io.core.Version``. If its ``load`` attribute is
            None, the latest version will be loaded. If its ``save``
            attribute is None, save version will be autogenerated.
        credentials: Credentials required to get access to the underlying filesystem.
            E.g. for ``GCSFileSystem`` it should look like `{'token': None}`.
        fs_args: Extra arguments to pass into underlying filesystem class constructor
            (e.g. `{"project": "my-project"}` for ``GCSFileSystem``), as well as
            to pass to the filesystem's `open` method through nested keys
            `open_args_load` and `open_args_save`.
            Here you can find all available arguments for `open`:
            https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.spec.AbstractFileSystem.open
            All defaults are preserved, except `mode`, which is set to `w` when saving.
        metadata: Any arbitrary metadata.
            This is ignored by Kedro, but may be consumed by users or external plugins.
    """
    super().__init__(
        filepath=filepath,
        load_args=load_args,
        save_args=save_args,
        version=version,
        credentials=credentials,
        fs_args=fs_args,
    )
    self._plotly_args = plotly_args

    _fs_args = deepcopy(fs_args) or {}
    _fs_open_args_load = _fs_args.pop("open_args_load", {})
    _fs_open_args_save = _fs_args.pop("open_args_save", {})

    # Handle default fs arguments
    self._fs_open_args_load = {
        **self.DEFAULT_FS_ARGS.get("open_args_load", {}),
        **(_fs_open_args_load or {}),
    }
    self._fs_open_args_save = {
        **self.DEFAULT_FS_ARGS.get("open_args_save", {}),
        **(_fs_open_args_save or {}),
    }

    self.metadata = metadata

DEFAULT_FS_ARGS class-attribute instance-attribute

DEFAULT_FS_ARGS = {'open_args_save': {'mode': 'w'}}
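The handling of fs_args in the constructor can be sketched in isolation. This is a minimal illustration, not the library's code path: split_fs_args is a hypothetical helper written for this example, and the DEFAULT_FS_ARGS constant is copied from the class attribute above. It shows how the nested open_args_load and open_args_save keys are popped off and merged over the defaults.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Tuple

# Copied from the class attribute documented above.
DEFAULT_FS_ARGS = {"open_args_save": {"mode": "w"}}


def split_fs_args(
    fs_args: Optional[Dict[str, Any]],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper mirroring the merge logic in __init__.

    Returns (remaining fs_args, open args for load, open args for save).
    """
    # Work on a copy so the caller's dictionary is left untouched.
    remaining = deepcopy(fs_args) or {}
    load_args = remaining.pop("open_args_load", {})
    save_args = remaining.pop("open_args_save", {})

    # Defaults come first, so user-supplied keys override them.
    open_args_load = {**DEFAULT_FS_ARGS.get("open_args_load", {}), **(load_args or {})}
    open_args_save = {**DEFAULT_FS_ARGS.get("open_args_save", {}), **(save_args or {})}
    return remaining, open_args_load, open_args_save


user_fs_args = {"project": "my-project", "open_args_save": {"encoding": "utf-8"}}
constructor_args, open_load, open_save = split_fs_args(user_fs_args)
```

With this merge order, a mode supplied under open_args_save would replace the default w, because the user's dictionary is unpacked last.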

_fs_open_args_load instance-attribute

_fs_open_args_load = {
    **self.DEFAULT_FS_ARGS.get("open_args_load", {}),
    **(_fs_open_args_load or {}),
}

_fs_open_args_save instance-attribute

_fs_open_args_save = {
    **self.DEFAULT_FS_ARGS.get("open_args_save", {}),
    **(_fs_open_args_save or {}),
}

_plotly_args instance-attribute

_plotly_args = plotly_args

metadata instance-attribute

metadata = metadata

_describe

_describe()
Source code in kedro_datasets/plotly/plotly_dataset.py
def _describe(self) -> dict[str, Any]:
    return {**super()._describe(), "plotly_args": self._plotly_args}

_plot_dataframe

_plot_dataframe(data)
Source code in kedro_datasets/plotly/plotly_dataset.py
def _plot_dataframe(self, data: pd.DataFrame) -> go.Figure:
    plot_type = self._plotly_args.get("type")
    fig_params = self._plotly_args.get("fig", {})
    fig = getattr(px, plot_type)(data, **fig_params)  # type: ignore
    fig.update_layout(template=self._plotly_args.get("theme", "plotly"))
    fig.update_layout(self._plotly_args.get("layout", {}))
    return fig
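The lookups in _plot_dataframe imply which plotly_args keys are optional and what they default to. The sketch below makes that explicit without importing plotly. resolve_plotly_args is a hypothetical helper introduced only for illustration, and its ValueError guard is an addition of this sketch, not behavior of the dataset.

```python
from typing import Any, Dict, Tuple


def resolve_plotly_args(
    plotly_args: Dict[str, Any],
) -> Tuple[str, Dict[str, Any], str, Dict[str, Any]]:
    """Hypothetical helper: apply the same defaults as _plot_dataframe."""
    plot_type = plotly_args.get("type")
    if not isinstance(plot_type, str):
        # Assumption of this sketch: fail early with a readable message.
        raise ValueError("plotly_args needs a string 'type', e.g. 'bar' or 'scatter'")

    fig_params = plotly_args.get("fig", {})     # kwargs for the plotly express function
    theme = plotly_args.get("theme", "plotly")  # applied as the layout template
    layout = plotly_args.get("layout", {})      # forwarded to fig.update_layout
    return plot_type, fig_params, theme, layout


plot_type, fig_params, theme, layout = resolve_plotly_args(
    {"type": "scatter", "fig": {"x": "x1", "y": "x2"}}
)
```

In the dataset itself, the resolved type is looked up on plotly.express with getattr, so it should name a function that exists there.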

preview

preview()

Generates a preview of the plotly dataset.

Returns:

  • dict

    A dictionary containing the plotly data.

Source code in kedro_datasets/plotly/plotly_dataset.py
def preview(self) -> PlotlyPreview:
    """
    Generates a preview of the plotly dataset.

    Returns:
        dict: A dictionary containing the plotly data.
    """
    load_path = get_filepath_str(self._get_load_path(), self._protocol)
    with self._fs.open(load_path, **self._fs_open_args_load) as fs_file:
        return json.load(fs_file)
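Because preview parses the saved file with json.load, its result is a plain dictionary rather than a plotly figure object. The following sketch imitates that step on a local temporary file. The payload is hand-written and hypothetical, shaped like typical figure JSON with data and layout keys, and the built-in open stands in for the dataset's filesystem object.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, hand-written payload in the general shape of Plotly figure JSON.
figure_json = {
    "data": [{"type": "scatter", "x": [0, 1], "y": [1, 0]}],
    "layout": {"title": {"text": "Title"}},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "scatter_plot.json"
    path.write_text(json.dumps(figure_json), encoding="utf-8")

    # Same parsing step as preview(), using a local file handle.
    with path.open(encoding="utf-8") as fs_file:
        preview = json.load(fs_file)
```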

save

save(data)
Source code in kedro_datasets/plotly/plotly_dataset.py
def save(self, data: pd.DataFrame) -> None:
    fig = self._plot_dataframe(data)
    super().save.__wrapped__(self, fig)  # type: ignore[attr-defined]