opik.TraceDataset
kedro_datasets_experimental.opik.TraceDataset ¶
TraceDataset(credentials, mode='sdk', **trace_kwargs)
Bases: AbstractDataset
Kedro dataset for managing Opik tracing clients and callbacks.
This dataset provides Opik tracing integrations for various AI frameworks or direct SDK usage. During initialization, the dataset automatically configures the Opik environment and credentials to ensure that subsequent traces are correctly logged to the specified workspace and project.
Modes:
- sdk: Returns a simple namespace-like client exposing the track decorator for manual tracing.
- openai: Returns an OpenAI client automatically wrapped for Opik tracing.
- langchain: Returns an OpikTracer callback handler for LangChain integration.
- autogen: Returns a configured Tracer for AutoGen integration via OTLP (OpenTelemetry Protocol).
Examples
Using catalog YAML configuration:
opik_trace:
  type: kedro_datasets_experimental.opik.TraceDataset
  credentials: opik_credentials
  mode: openai
Using Python API:
from kedro_datasets_experimental.opik import TraceDataset

# Example: OpenAI mode (traced completions)
dataset = TraceDataset(
    credentials={
        "api_key": "opik_api_key",  # pragma: allowlist secret
        "workspace": "my-workspace",
        "project_name": "kedro-demo",
        "openai": {
            "api_key": "sk-...",  # pragma: allowlist secret
            "base_url": "https://api.openai.com/v1",
        },
    },
    mode="openai",
)
client = dataset.load()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Kedro in one sentence."},
    ],
)

# Example: SDK mode (manual tracing via decorator)
dataset = TraceDataset(
    credentials={
        "api_key": "opik_api_key",  # pragma: allowlist secret
        "workspace": "my-workspace",
        "project_name": "kedro-sdk-demo",
    },
    mode="sdk",
)
client = dataset.load()

@client.track(name="demo_workflow")
def multiply(x: int, y: int) -> int:
    return x * y

print(multiply(3, 4))

# Example: LangChain mode
dataset = TraceDataset(
    credentials={
        "api_key": "opik_api_key",  # pragma: allowlist secret
        "workspace": "my-workspace",
    },
    mode="langchain",
)
tracer = dataset.load()
# Use tracer in your LangChain Runnable or chain.run(callbacks=[tracer])

# Example: AutoGen mode (Opik cloud)
dataset = TraceDataset(
    credentials={
        "api_key": "opik_api_key",  # pragma: allowlist secret
        "workspace": "my-workspace",
        "project_name": "autogen-demo",
        "endpoint": "https://www.comet.com/opik/api/v1/private/otel/v1/traces",
    },
    mode="autogen",
)
tracer = dataset.load()  # Returns configured Tracer, ready to use

# `agent` and `context` below are assumed to be defined elsewhere in your pipeline.
# Option 1: Automatic tracing (LLM calls traced automatically)
agent.invoke(context)  # Traces sent to Opik

# Option 2: Add custom spans with business context (recommended)
with tracer.start_as_current_span("response_generation") as span:
    span.set_attribute("intent", "claim_new")
    span.set_attribute("user_id", "123")
    agent.invoke(context)  # Child spans nested under "response_generation"

# Example: AutoGen mode (self-hosted)
dataset = TraceDataset(
    credentials={
        "api_key": "opik_api_key",  # pragma: allowlist secret
        "workspace": "my-workspace",
        "project_name": "autogen-demo",
        "url_override": "http://localhost:5173",
        "endpoint": "http://localhost:5173/opik/api/v1/private/otel/v1/traces",
    },
    mode="autogen",
)
tracer = dataset.load()
Notes
- Opik configuration is global within the Python process. Using multiple TraceDataset instances with different projects in the same session may cause all traces to log to the first configured project.
- To switch projects, restart the Python process or reload the Opik module.
Source code in kedro_datasets_experimental/opik/trace_dataset.py
_build_autogen_tracer ¶
_build_autogen_tracer()
Build and return a configured Tracer for AutoGen integration with Opik.
Sets up OpenTelemetry TracerProvider with OTLP exporter to Opik, configures it as the global provider, and returns a ready-to-use Tracer.
Returns:
- Any – Tracer configured to export traces to Opik.
Raises:
- DatasetError – If required OpenTelemetry dependencies are not installed.
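The setup described above can be sketched roughly as follows. This is not the dataset's actual implementation: the function names `build_otlp_headers` and `build_tracer` are hypothetical, the header names (`Authorization`, `Comet-Workspace`, `projectName`) are assumed from Opik's OpenTelemetry guide, and a plain `RuntimeError` stands in for `DatasetError`. The OpenTelemetry imports are deferred so that a missing dependency surfaces as a clear error only when a tracer is requested.

```python
from typing import Dict, Optional


def build_otlp_headers(
    api_key: Optional[str],
    workspace: Optional[str],
    project_name: Optional[str],
) -> Dict[str, str]:
    """Build OTLP request headers; header names are an assumption, not confirmed by this page."""
    headers: Dict[str, str] = {}
    if api_key:
        headers["Authorization"] = api_key
    if workspace:
        headers["Comet-Workspace"] = workspace
    if project_name:
        headers["projectName"] = project_name
    return headers


def build_tracer(endpoint: str, headers: Dict[str, str], name: str = "kedro-autogen"):
    """Register a global TracerProvider with an OTLP/HTTP exporter and return a Tracer."""
    try:
        # Imported lazily so the module loads even without OpenTelemetry installed.
        from opentelemetry import trace
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import BatchSpanProcessor
    except ImportError as exc:
        # The real dataset is documented to raise DatasetError here.
        raise RuntimeError("OpenTelemetry packages are required for autogen mode") from exc

    provider = TracerProvider()
    exporter = OTLPSpanExporter(endpoint=endpoint, headers=headers)
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)  # becomes the process-wide provider
    return trace.get_tracer(name)


headers = build_otlp_headers("opik_api_key", "my-workspace", "autogen-demo")
```

Calling `build_tracer(endpoint, headers)` with the `endpoint` value from the credentials would then return a tracer whose spans are exported in batches.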
Source code in kedro_datasets_experimental/opik/trace_dataset.py
_configure_opik ¶
_configure_opik()
Configure the Opik SDK from the provided credentials.
project_name is passed to configure() so it is persisted to
Opik's session configuration and picked up by the auto-created client.
This is what routes traces to the right project for both langchain
mode (the OpikTracer inherits the configured project) and sdk
mode (the @track decorator resolves the same way). configure()
gained the project_name parameter in opik 1.11.0.
Note: Opik configuration is global within a process, so the project
cannot be changed once a client has been created. Using multiple
TraceDataset instances with different projects in the same session
will log all traces to the first configured project; a warning is
emitted in that case.
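The "first configured project wins" behaviour can be illustrated with a toy model. The module-level variable and `configure_project` function below are hypothetical stand-ins, not Opik internals; the sketch only shows why a second dataset with a different project has no effect and triggers a warning.

```python
import warnings
from typing import Optional

# Hypothetical stand-in for process-global SDK state (not Opik's real internals).
_configured_project: Optional[str] = None


def configure_project(project_name: str) -> str:
    """Remember the first project configured in this process and return the active one."""
    global _configured_project
    if _configured_project is None:
        _configured_project = project_name
    elif _configured_project != project_name:
        warnings.warn(
            f"Project already set to {_configured_project!r}; "
            f"ignoring {project_name!r} for this process.",
            stacklevel=2,
        )
    return _configured_project


first = configure_project("kedro-demo")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    second = configure_project("autogen-demo")  # ignored: a project is already set

print(first, second, len(caught))  # kedro-demo kedro-demo 1
```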
Source code in kedro_datasets_experimental/opik/trace_dataset.py
_describe ¶
_describe()
Describe dataset configuration with credentials redacted.
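A minimal sketch of what redaction could look like, assuming that any key named `api_key` is sensitive and that nested sections such as `openai` are redacted recursively. The `redact` helper, the `SENSITIVE_KEYS` set, and the shape of the returned description are all illustrative assumptions.

```python
from typing import Any

# Assumption: only keys named "api_key" are treated as secrets in this sketch.
SENSITIVE_KEYS = {"api_key"}


def redact(value: Any) -> Any:
    """Return a copy of `value` with sensitive dictionary entries masked."""
    if isinstance(value, dict):
        return {
            key: "***" if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    return value


credentials = {
    "api_key": "opik_api_key",
    "workspace": "my-workspace",
    "openai": {"api_key": "sk-...", "base_url": "https://api.openai.com/v1"},
}
description = {"mode": "openai", "credentials": redact(credentials)}
print(description)
```

The original `credentials` dictionary is left untouched, so the dataset can keep using it after describing itself.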
Source code in kedro_datasets_experimental/opik/trace_dataset.py
_load_langchain_tracer ¶
_load_langchain_tracer()
Return an OpikTracer callback for LangChain integration.
The project is set by _configure_opik (via configure), so the
tracer inherits it from the configured client. An explicit
project_name catalog kwarg still flows through trace_kwargs and
takes precedence for this tracer.
Source code in kedro_datasets_experimental/opik/trace_dataset.py
_load_openai_client ¶
_load_openai_client()
Return an OpenAI client wrapped with Opik tracing integration.
Source code in kedro_datasets_experimental/opik/trace_dataset.py
_load_sdk_client ¶
_load_sdk_client()
Return a simple SDK client exposing the track decorator.
The Opik SDK does not provide a formal client object for direct usage;
instead, the track decorator is imported at the module level.
This wrapper mimics a client interface for consistency across modes.
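The wrapper idea can be sketched with `types.SimpleNamespace`. To keep the example runnable without Opik installed, `track` below is a local stand-in that only records call names; in the real dataset the namespace would presumably hold Opik's own `track` decorator.

```python
import functools
from types import SimpleNamespace

calls = []  # records traced function names, in place of real trace logging


def track(name=None):
    """Stand-in decorator factory; NOT Opik's implementation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls.append(name or func.__name__)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# A namespace-like "client" whose only attribute is the decorator.
client = SimpleNamespace(track=track)


@client.track(name="demo_workflow")
def multiply(x: int, y: int) -> int:
    return x * y


result = multiply(3, 4)
print(result, calls)  # 12 ['demo_workflow']
```

Exposing the decorator as an attribute lets nodes write `client.track(...)` regardless of which mode produced the client object.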
Source code in kedro_datasets_experimental/opik/trace_dataset.py
_validate_openai_client_params ¶
_validate_openai_client_params()
Validate OpenAI credentials in the 'openai' section.
Raises:
- DatasetError – If OpenAI credentials are missing or invalid.
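As an illustration of this kind of check, the sketch below validates the nested `openai` section. The exact rules (a dictionary with a non-empty string `api_key`) and the error messages are assumptions, and the local `DatasetError` class is a stand-in so the example runs without Kedro.

```python
class DatasetError(Exception):
    """Local stand-in for Kedro's DatasetError, used only in this sketch."""


def validate_openai_params(credentials: dict) -> None:
    """Raise DatasetError if the 'openai' section is missing or has no usable API key."""
    section = credentials.get("openai")
    if not isinstance(section, dict):
        raise DatasetError("Missing 'openai' section in credentials.")
    api_key = section.get("api_key")
    if not isinstance(api_key, str) or not api_key.strip():
        raise DatasetError("'openai.api_key' must be a non-empty string.")


validate_openai_params({"openai": {"api_key": "sk-..."}})  # passes silently

try:
    validate_openai_params({"api_key": "opik_api_key"})  # no 'openai' section
except DatasetError as exc:
    message = str(exc)
    print(message)
```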
Source code in kedro_datasets_experimental/opik/trace_dataset.py
_validate_opik_credentials ¶
_validate_opik_credentials()
Validate Opik credentials before configuring the environment.
Source code in kedro_datasets_experimental/opik/trace_dataset.py
load ¶
load()
Load the appropriate tracing client based on the configured mode.
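One common way to implement mode-based loading is a dispatch table that maps each mode to a builder method. The sketch below is a toy model under that assumption: the class name `TraceLoaderSketch` is hypothetical, the private method names mirror those documented on this page, and each builder returns a placeholder string rather than a real client.

```python
from typing import Callable, Dict


class TraceLoaderSketch:
    """Toy model of mode dispatch; the builders return placeholders only."""

    def __init__(self, mode: str = "sdk") -> None:
        self._loaders: Dict[str, Callable[[], str]] = {
            "sdk": self._load_sdk_client,
            "openai": self._load_openai_client,
            "langchain": self._load_langchain_tracer,
            "autogen": self._build_autogen_tracer,
        }
        if mode not in self._loaders:
            raise ValueError(f"Unsupported mode {mode!r}; expected one of {sorted(self._loaders)}")
        self._mode = mode

    def load(self) -> str:
        # Look up and call the builder registered for the configured mode.
        return self._loaders[self._mode]()

    def _load_sdk_client(self) -> str:
        return "sdk client placeholder"

    def _load_openai_client(self) -> str:
        return "openai client placeholder"

    def _load_langchain_tracer(self) -> str:
        return "langchain tracer placeholder"

    def _build_autogen_tracer(self) -> str:
        return "autogen tracer placeholder"


loaded = {m: TraceLoaderSketch(m).load() for m in ("sdk", "openai", "langchain", "autogen")}
print(loaded["langchain"])  # langchain tracer placeholder
```

Validating the mode in the constructor means a typo in the catalog fails at dataset creation rather than at the first `load()` call.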
Source code in kedro_datasets_experimental/opik/trace_dataset.py
save ¶
save(data)
Saving traces manually is not supported; TraceDataset is read-only.
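A read-only dataset typically enforces this by raising from `save`. The sketch below assumes that the error type is `DatasetError` (represented by a local stand-in class) and that the message looks roughly like this; neither detail is confirmed by this page.

```python
class DatasetError(Exception):
    """Local stand-in for Kedro's DatasetError, used only in this sketch."""


class ReadOnlyTraceSketch:
    """Hypothetical dataset whose save() always refuses."""

    def save(self, data) -> None:
        raise DatasetError(
            f"{type(self).__name__} is read-only; saving traces manually is not supported."
        )


try:
    ReadOnlyTraceSketch().save({"any": "payload"})
except DatasetError as exc:
    save_error = str(exc)
    print(save_error)
```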
Source code in kedro_datasets_experimental/opik/trace_dataset.py