langchain.OpenAIEmbeddingsDataset

kedro_datasets.langchain.OpenAIEmbeddingsDataset

OpenAIEmbeddingsDataset(credentials={}, kwargs={})

Bases: AbstractDataset[None, OpenAIEmbeddings]

OpenAIEmbeddingsDataset loads a LangChain OpenAIEmbeddings model.

Example usage for the YAML API

catalog.yml

text_embedding_ada_002:
    type: langchain.OpenAIEmbeddingsDataset
    kwargs:
        model: "text-embedding-ada-002"
    credentials: openai  # Optional, can use environment variables instead

credentials.yml (optional if using environment variables)

Credentials passed through credentials.yml take precedence over environment variables.

openai:
    base_url: <openai-api-base>  # Optional, defaults to OpenAI default
    api_key: <openai-api-key>   # Optional if OPENAI_API_KEY is set

Or use environment variables:

export OPENAI_API_KEY=<your-api-key>
export OPENAI_API_BASE=<openai-api-base>  # Optional

Example usage for the Python API

from kedro_datasets.langchain import OpenAIEmbeddingsDataset

# With explicit credentials
embeddings = OpenAIEmbeddingsDataset(
    credentials={
        "base_url": "<openai-api-base>",
        "api_key": "<openai-api-key>",
    },
    kwargs={
        "model": "text-embedding-ada-002",
    },
).load()

# Or without credentials (using environment variables)
embeddings = OpenAIEmbeddingsDataset(
    kwargs={
        "model": "text-embedding-ada-002",
    },
).load()

# See: https://python.langchain.com/docs/integrations/text_embedding/openai
embeddings.embed_query("Hello world!")
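
Once loaded, the model is usually handed to a pipeline node as an ordinary input. The following is a minimal sketch of such a node, not part of kedro_datasets: `embed_texts`, `SupportsEmbedding`, and `FakeEmbeddings` are hypothetical names introduced for illustration, and the fake class stands in for the real model so that the example runs offline. It assumes only that the loaded object exposes `embed_documents`, as LangChain embedding models do.

```python
from typing import Protocol


class SupportsEmbedding(Protocol):
    """Minimal interface the node relies on (hypothetical helper type)."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...


def embed_texts(
    embeddings: SupportsEmbedding, texts: list[str]
) -> dict[str, list[float]]:
    """Hypothetical node: map each input text to its embedding vector."""
    vectors = embeddings.embed_documents(texts)
    return dict(zip(texts, vectors))


class FakeEmbeddings:
    """Offline stand-in for the loaded model; returns one number per text."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]


# With the real dataset, `embeddings` would be the object returned by load().
result = embed_texts(FakeEmbeddings(), ["Hello", "world!"])
print(result)
```

In a real pipeline the first argument would be the catalog entry (for example `text_embedding_ada_002`) rather than the fake class.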

Parameters:

  • credentials (dict[str, str], default: {} ) –

    Optional. Contains api_key and base_url. If not provided, the environment variables OPENAI_API_KEY and OPENAI_API_BASE are used.

  • kwargs (dict[str, Any], default: {} ) –

    Keyword arguments passed to the underlying OpenAIEmbeddings constructor.

Source code in kedro_datasets/langchain/openai_embeddings_dataset.py
def __init__(self, credentials: dict[str, str] = {}, kwargs: dict[str, Any] = {}):
    """Constructor.

    Args:
        credentials (Optional): contains `api_key` and `base_url`.
            If not provided, will use environment variables OPENAI_API_KEY and OPENAI_API_BASE.
        kwargs: keyword arguments passed to the underlying constructor.
    """
    self.credentials = credentials or {}
    self.kwargs = kwargs or {}

credentials instance-attribute

credentials = credentials or {}

kwargs instance-attribute

kwargs = kwargs or {}

_describe

_describe()

Returns a description of the dataset.

Returns:

  • dict[str, Any]

    Dictionary containing the credential keys, with each value masked as "***", merged with the kwargs passed to the OpenAIEmbeddings constructor.

Source code in kedro_datasets/langchain/openai_embeddings_dataset.py
def _describe(self) -> dict[str, Any]:
    """Returns a description of the dataset.

    Returns:
        dict[str, Any]: Dictionary containing the kwargs passed to the OpenAI constructor.
    """
    credentials = (
        {k: "***" for k in self.credentials.keys()} if self.credentials else {}
    )
    return {**credentials, **self.kwargs}
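
The masking behaviour can be checked in isolation. The sketch below copies the logic of `_describe` into a standalone function so that it runs without Kedro or an API key; the function name `describe` and the placeholder values are assumptions made for illustration.

```python
from typing import Any


def describe(credentials: dict[str, str], kwargs: dict[str, Any]) -> dict[str, Any]:
    """Stand-in mirroring _describe: mask credential values, keep kwargs as-is."""
    masked = {key: "***" for key in credentials} if credentials else {}
    return {**masked, **kwargs}


description = describe(
    credentials={"base_url": "<openai-api-base>", "api_key": "<openai-api-key>"},
    kwargs={"model": "text-embedding-ada-002"},
)
print(description)
```

Only the credential values are hidden; the key names and all kwargs remain visible in the description.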

load

load()

Load and return an OpenAIEmbeddings model instance.

Constructs an OpenAIEmbeddings instance using the provided kwargs and optional credentials. If credentials are not provided, the instance automatically uses the environment variables OPENAI_API_KEY and OPENAI_API_BASE for authentication.

Returns:

  • OpenAIEmbeddings

    A configured OpenAIEmbeddings model instance.

Source code in kedro_datasets/langchain/openai_embeddings_dataset.py
def load(self) -> OpenAIEmbeddings:
    """Load and return an OpenAI model instance.

    Constructs an OpenAI instance using the provided kwargs and optional
    credentials. If credentials are not provided, the OpenAI instance
    will automatically use environment variables OPENAI_API_KEY and
    OPENAI_API_BASE for authentication.

    Returns:
        OPENAI_TYPE: A configured OpenAI model instance.
    """
    return OpenAIEmbeddings(**self.credentials, **self.kwargs)  # type: ignore[arg-type]
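
Because `load` unpacks both dictionaries into a single call, their keys must not overlap: Python raises a `TypeError` when the same keyword is supplied twice. The sketch below illustrates this with `build_model`, a hypothetical stand-in for the constructor that simply returns the keyword arguments it receives.

```python
from typing import Any


def build_model(**options: Any) -> dict[str, Any]:
    """Hypothetical stand-in for the constructor; returns what it was given."""
    return options


credentials = {"api_key": "<openai-api-key>"}

# Disjoint keys are merged into one set of keyword arguments.
merged = build_model(**credentials, **{"model": "text-embedding-ada-002"})

# The same key in both dictionaries is rejected by Python itself.
try:
    build_model(**credentials, **{"api_key": "<other-key>"})
    duplicate_error = None
except TypeError as exc:
    duplicate_error = exc

print(merged)
print(type(duplicate_error).__name__)
```

Keeping connection settings in credentials and model settings in kwargs avoids this collision.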

save

save(data)

Save operation is not supported for OpenAI datasets.

Raises:

  • DatasetError

    Always raised as this dataset is read-only.

Source code in kedro_datasets/langchain/openai_embeddings_dataset.py
def save(self, data: None) -> NoReturn:
    """Save operation is not supported for OpenAI datasets.

    Raises:
        DatasetError: Always raised as this dataset is read-only.
    """
    raise DatasetError(f"{self.__class__.__name__} is a read only dataset type")
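
Callers that might write to this dataset by mistake can catch the error. The sketch below reproduces the read-only pattern with local stand-ins so that it runs without Kedro installed: both `DatasetError` and `ReadOnlyDataset` here are hypothetical classes defined in the snippet, not the real Kedro types.

```python
class DatasetError(Exception):
    """Local stand-in for Kedro's DatasetError (assumption for this sketch)."""


class ReadOnlyDataset:
    """Hypothetical dataset mirroring the read-only save pattern."""

    def save(self, data: None) -> None:
        # Same message format as the source shown above.
        raise DatasetError(f"{self.__class__.__name__} is a read only dataset type")


try:
    ReadOnlyDataset().save(None)
    message = ""
except DatasetError as exc:
    message = str(exc)

print(message)
```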