langfuse.EvaluationDataset
kedro_datasets_experimental.langfuse.EvaluationDataset
EvaluationDataset(
    dataset_name,
    credentials,
    filepath=None,
    sync_policy="local",
    metadata=None,
    version=None,
)
Bases: AbstractDataset[list[dict[str, Any]], 'DatasetClient']
Kedro dataset for Langfuse evaluation datasets.
Connects to a Langfuse evaluation dataset and returns a DatasetClient
on load(), which can be used to run experiments via
dataset.run_experiment(). Supports an optional local JSON/YAML file
as the authoring surface for evaluation items.
On load / save behaviour:
- On load: Creates the remote dataset if it does not exist, synchronises based on sync_policy, and returns a DatasetClient.
- On save: Upserts all items to the remote dataset. Items with an existing id are updated in place; new items are created. In local mode, items are also merged into the local file (new items take precedence). In remote mode, only the remote upsert occurs.
Item format:
Evaluation items, whether stored in the local filepath file or
passed as the data argument to save(), must be a list of dicts.
Each item accepts the same keys as
Langfuse.create_dataset_item():
- input (required): the evaluation input payload.
- id: stable identifier used for deduplication on sync and upload.
- expected_output: ground-truth value for scoring.
- metadata: arbitrary metadata dict attached to the item.
- source_trace_id: Langfuse trace ID to link the item to.
- source_observation_id: observation ID within the source trace.
- status: "ACTIVE" (default) or "ARCHIVED".
[
    {
        "id": "q1",
        "input": {"text": "cancel my order"},
        "expected_output": "cancel_order",
        "metadata": {"source": "production"}
    }
]
Items without an id cannot be deduplicated and will be re-uploaded
on every load() or save() call.
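One possible way to avoid re-uploading id-less items is to assign deterministic ids before saving. The sketch below is not part of EvaluationDataset: the helper name with_stable_ids and the "auto-" prefix are hypothetical, and it assumes each item's input is JSON-serialisable. It derives the id from a hash of the input payload, so the same input always maps to the same id.

```python
import hashlib
import json
from typing import Any


def with_stable_ids(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the items, adding a content-derived id where missing.

    Hypothetical helper: the id is a truncated SHA-256 of the canonical JSON
    form of ``input``, so it only stays stable while ``input`` is unchanged.
    """
    result = []
    for item in items:
        if item.get("id"):
            # Keep ids that the author already chose.
            result.append(dict(item))
            continue
        canonical = json.dumps(item["input"], sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        result.append({**item, "id": f"auto-{digest}"})
    return result


items = [
    {"id": "q1", "input": {"text": "cancel my order"}, "expected_output": "cancel_order"},
    {"input": {"text": "where is my parcel"}},  # no id in the source file
]
prepared = with_stable_ids(items)
```

The trade-off is that editing an item's input changes its derived id, which creates a new remote item instead of updating the old one; hand-written ids avoid that.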
Sync policies:
- local (default): The local file is the source of truth. On load(), all local items are upserted to remote (creating new items or updating existing ones matched by id). Items without an id field cannot be deduplicated and will create new entries on every load.
- remote: The remote Langfuse dataset is the sole source of truth. load() fetches the remote dataset as-is with no local file interaction. save() upserts all items to remote but does not write to any local file. An optional version (ISO 8601 timestamp) can pin load() to a historical snapshot (requires langfuse>=3.14.0).
Examples:
Using catalog YAML configuration:
# Local sync policy - local file seeds and syncs to remote
evaluation_dataset:
  type: kedro_datasets_experimental.langfuse.EvaluationDataset
  dataset_name: intent-detection-eval
  filepath: data/evaluation/intent_items.json
  sync_policy: local
  credentials: langfuse_credentials
  metadata:
    project: intent-detection

# Remote sync policy - Langfuse is the source of truth
production_eval:
  type: kedro_datasets_experimental.langfuse.EvaluationDataset
  dataset_name: intent-detection-eval
  sync_policy: remote
  credentials: langfuse_credentials

# Pinned to a historical snapshot for reproducibility
eval_snapshot:
  type: kedro_datasets_experimental.langfuse.EvaluationDataset
  dataset_name: intent-detection-eval
  sync_policy: remote
  version: "2026-01-15T00:00:00Z"
  credentials: langfuse_credentials
Using Python API:
from kedro_datasets_experimental.langfuse import EvaluationDataset

dataset = EvaluationDataset(
    dataset_name="intent-detection-eval",
    credentials={
        "public_key": "pk_...",
        "secret_key": "sk_...",  # pragma: allowlist secret
    },
    filepath="data/evaluation/intent_items.json",
)

# Load returns a DatasetClient for running experiments
eval_dataset = dataset.load()
for item in eval_dataset.items:
    print(item.input, item.expected_output)

# Save new evaluation items
dataset.save(
    [
        {"id": "q1", "input": {"text": "cancel order"}, "expected_output": "cancel"},
    ]
)
Parameters:
- dataset_name (str) – Name of the evaluation dataset in Langfuse.
- credentials (dict[str, str]) – Langfuse authentication credentials. Required: public_key, secret_key. Optional: host (defaults to Langfuse cloud).
- filepath (str | None, default: None) – Path to a local JSON/YAML file for authoring evaluation items. Supports .json, .yaml, and .yml extensions. When None, no local file interaction occurs.
- sync_policy (Literal['local', 'remote'], default: 'local') – Controls the source of truth for reads and whether a local file is involved:
  - "local" (default): all local items are upserted to remote on load(); save() upserts to remote and merges into the local file (new data takes precedence).
  - "remote": load() fetches remote as-is; save() upserts to remote without local file interaction.
- metadata (dict[str, Any] | None, default: None) – Optional metadata dict passed to Langfuse when creating the remote dataset for the first time.
- version (str | None, default: None) – ISO 8601 timestamp to pin load() to a historical snapshot (e.g. "2026-01-15T00:00:00Z"). Only valid with sync_policy="remote". When omitted, the latest dataset state is returned. Requires langfuse>=3.14.0 (dataset versioning was introduced in the Feb 2026 release).
Raises:
- DatasetError – If credentials are missing or empty, sync_policy is invalid, filepath has an unsupported extension, or version is used with sync_policy="local".
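The four failure conditions listed above can be checked before constructing the dataset, for example in a configuration test. The following is a minimal sketch, not the class's own validation code: find_init_problems is a hypothetical helper that returns messages instead of raising DatasetError, and the case-insensitive extension check is an assumption.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_EXTENSIONS = {".json", ".yaml", ".yml"}
VALID_SYNC_POLICIES = {"local", "remote"}


def find_init_problems(
    credentials: dict,
    filepath: Optional[str] = None,
    sync_policy: str = "local",
    version: Optional[str] = None,
) -> list[str]:
    """Collect human-readable problems for the documented invalid inputs."""
    problems = []
    for key in ("public_key", "secret_key"):
        if not credentials or not credentials.get(key):
            problems.append(f"credential '{key}' is missing or empty")
    if sync_policy not in VALID_SYNC_POLICIES:
        problems.append(f"sync_policy '{sync_policy}' is not 'local' or 'remote'")
    # Assumption: extensions are compared case-insensitively.
    if filepath is not None and Path(filepath).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"filepath '{filepath}' has an unsupported extension")
    if version is not None and sync_policy != "remote":
        problems.append("version can only be used with sync_policy='remote'")
    return problems


good = {"public_key": "pk_...", "secret_key": "sk_..."}  # pragma: allowlist secret
print(find_init_problems(good, filepath="items.csv", version="2026-01-15T00:00:00Z"))
```

Returning a list rather than raising on the first problem is a design choice for this sketch: it reports every misconfiguration in one pass.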
Source code in kedro_datasets_experimental/langfuse/evaluation_dataset.py
_client
instance-attribute
_client = Langfuse(
    public_key=credentials["public_key"],
    secret_key=credentials["secret_key"],
    host=credentials.get("host"),
)
_describe
_describe()
_exists
_exists()
_get_or_create_remote_dataset
_get_or_create_remote_dataset()
Ensure the remote Langfuse dataset exists, creating it if not found.
Returns the latest DatasetClient.
_load_local_items
_load_local_items()
Load items from the local file, returning an empty list if unavailable.
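The "empty list if unavailable" behaviour can be illustrated with a small stand-alone function. This is a sketch under stated assumptions, not the method's actual code: load_local_json_items is a hypothetical name, "unavailable" is taken to mean that no filepath is configured or the file does not exist, and only JSON is handled so the example needs nothing beyond the standard library.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_local_json_items(filepath: Optional[str]) -> list[dict[str, Any]]:
    """Read evaluation items from a JSON file, or return [] if there is none."""
    if filepath is None:
        # No local file configured (for example, remote-only usage).
        return []
    path = Path(filepath)
    if not path.exists():
        # The file has not been authored yet; treat it as an empty dataset.
        return []
    with path.open(encoding="utf-8") as handle:
        loaded = json.load(handle)
    if not isinstance(loaded, list):
        raise ValueError(f"Expected a list of items in {path}, got {type(loaded).__name__}")
    return loaded
```

Returning an empty list for a missing file lets a first load() or save() proceed without a pre-existing local file.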
_merge_items
staticmethod
_merge_items(existing, new)
Merge new items into existing list, deduplicating by 'id'.
Items without an id key are always appended.
For items with an id, new items take precedence — existing
entries with the same id are replaced.
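The merge rule described above can be sketched in a few lines. This is an illustration of the documented semantics, not the library's implementation: merge_items_by_id is a hypothetical name, and keeping a replaced item at its original position is an assumption about ordering.

```python
from typing import Any


def merge_items_by_id(
    existing: list[dict[str, Any]], new: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Merge ``new`` into ``existing``; on an id clash the new item wins."""
    merged: list[dict[str, Any]] = []
    position_by_id: dict[Any, int] = {}
    for item in [*existing, *new]:
        item_id = item.get("id")
        if item_id is None:
            # Items without an id can never match anything, so always append.
            merged.append(item)
        elif item_id in position_by_id:
            # Later item replaces the earlier one in place.
            merged[position_by_id[item_id]] = item
        else:
            position_by_id[item_id] = len(merged)
            merged.append(item)
    return merged


existing = [{"id": "q1", "input": "old"}, {"input": "no id"}]
new = [{"id": "q1", "input": "new"}, {"id": "q2", "input": "other"}]
merged = merge_items_by_id(existing, new)
```

Here merged keeps three items: q1 with the new input, the id-less item, and q2.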
_parse_version
staticmethod
_parse_version(version)
Parse an ISO 8601 version string into a timezone-aware UTC datetime.
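A conversion of this kind can be sketched with the standard library alone. The function below is an illustration, not the method's actual code: parse_iso_version is a hypothetical name, and treating timestamps without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone


def parse_iso_version(version: str) -> datetime:
    """Turn an ISO 8601 string into a timezone-aware UTC datetime."""
    normalised = version.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit UTC offset for older interpreters.
    if normalised.endswith(("Z", "z")):
        normalised = normalised[:-1] + "+00:00"
    parsed = datetime.fromisoformat(normalised)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is meant as UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


snapshot = parse_iso_version("2026-01-15T00:00:00Z")
```

Normalising to UTC means that "2026-01-15T02:00:00+02:00" and "2026-01-15T00:00:00Z" resolve to the same instant.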
_sync_local_to_remote
_sync_local_to_remote(dataset, local_items)
Upsert local items to remote (create new, update existing).
Every item is sent to Langfuse.create_dataset_item(), which
performs an upsert: items with an id that already exists on
remote are updated in place; new items are created. Items without
an id always create new entries and cannot be deduplicated.
Returns the refreshed DatasetClient.
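The effect of upserting by id can be seen with an in-memory stand-in for the remote dataset. Everything in this sketch is hypothetical (InMemoryRemote, create_item, sync) and it does not call Langfuse; it only mimics the behaviour described above, generating a fresh id whenever an item has none.

```python
import uuid
from typing import Any


class InMemoryRemote:
    """Toy stand-in for a remote dataset that upserts items by id."""

    def __init__(self) -> None:
        self.items: dict[str, dict[str, Any]] = {}

    def create_item(self, **item: Any) -> str:
        # A known id overwrites the stored item; a missing id gets a new
        # random one, so the item can never match an earlier upload.
        item_id = item.get("id") or uuid.uuid4().hex
        self.items[item_id] = {**item, "id": item_id}
        return item_id


def sync(remote: InMemoryRemote, local_items: list[dict[str, Any]]) -> int:
    """Send every local item to the remote and return the remote item count."""
    for item in local_items:
        remote.create_item(**item)
    return len(remote.items)


local_items = [
    {"id": "q1", "input": {"text": "cancel my order"}},
    {"input": {"text": "where is my parcel"}},  # no id
]
remote = InMemoryRemote()
count_after_first_sync = sync(remote, local_items)
count_after_second_sync = sync(remote, local_items)
```

After the first sync the stand-in holds two items; after the second it holds three, because q1 was updated in place while the id-less item was created again.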
_upload_items
_upload_items(items)
Upload items to the remote Langfuse dataset.
Passes through all keys accepted by Langfuse.create_dataset_item().
Callers are responsible for validating items before calling this method.
_validate_init_params
staticmethod
_validate_init_params(
credentials, filepath, sync_policy, version
)
_validate_items
staticmethod
_validate_items(items)
Validate that all items contain the required 'input' key.
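A check like this can also be run on items before they are passed to save(). The sketch below is hypothetical and separate from the library: check_items_have_input raises ValueError as a stand-in for DatasetError, and the error wording is invented for the example.

```python
from typing import Any


def check_items_have_input(items: list[dict[str, Any]]) -> None:
    """Raise ValueError if any item lacks the required 'input' key."""
    missing = [index for index, item in enumerate(items) if "input" not in item]
    if missing:
        raise ValueError(
            f"Items at positions {missing} are missing the required 'input' key"
        )


try:
    check_items_have_input(
        [
            {"id": "q1", "input": {"text": "cancel my order"}},
            {"id": "q2", "expected_output": "track_order"},  # no input
        ]
    )
except ValueError as error:
    message = str(error)
```

Reporting the positions of all offending items at once makes it easier to fix a hand-authored file in a single pass.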
load
load()
Load the evaluation dataset from Langfuse.
Creates the remote dataset if it does not exist. In local mode,
all local items are upserted to remote (creating new items or
updating existing ones matched by id). In remote mode with
version set, returns items as they existed at that point in time.
Returns:
- DatasetClient – Langfuse dataset client that can be used to iterate items or call run_experiment().
Raises:
- DatasetError – If the Langfuse API is unreachable or returns an unexpected error.
preview
preview()
Generate a JSON-compatible preview of the local evaluation data.
Returns:
- JSONPreview – Serialised JSON string for Kedro-Viz. Returns a descriptive message if filepath is not configured or the file does not exist.
save
save(data)
Save evaluation items to the remote dataset.
Upserts all items to Langfuse via create_dataset_item() — items
with an existing id are updated in place, new items are created.
In local mode, items are also merged into the local file (new
items take precedence over existing entries with the same id).
In remote mode, only the remote upload occurs.
Parameters:
- data (list[dict[str, Any]]) – List of evaluation item dicts. Each item must contain an input key. See class docstring for the full list of accepted keys (mirrors Langfuse.create_dataset_item()).
Raises:
- DatasetError – If any item is missing the required input key or the Langfuse API returns an error.