fix(module): export EvaluationModuleError and wrap _compute failures by xodn348 · Pull Request #759 · huggingface/evaluate

xodn348 · 2026-05-14T07:31:22Z

Summary

EvaluationModuleError was absent from evaluate's public API, so callers had
no way to catch evaluate-specific failures without catching the broad Exception
base class or importing internal sklearn/numpy error types. Writing
except evaluate.EvaluationModuleError was not an option: the attribute lookup
raised an AttributeError because the symbol was never exported.

This PR adds the EvaluationModuleError class to src/evaluate/module.py and
exports it from src/evaluate/__init__.py. It also wraps the _compute() call
inside EvaluationModule.compute() so that raw backend exceptions (e.g.
sklearn's ValueError: y_true contains only one label) surface as
EvaluationModuleError with a descriptive message, rather than leaking
implementation details to callers.
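
For reviewers who want the shape of the change at a glance, here is a minimal sketch of the wrapping pattern described above. It is not the actual diff: SketchModule, its name attribute, and the error message format are hypothetical stand-ins, and the real EvaluationModule.compute() does considerably more work around the _compute() call.

```python
class EvaluationModuleError(Exception):
    """Stand-in for the exception class this PR adds (assumed to subclass Exception)."""


class SketchModule:
    """Hypothetical stand-in for EvaluationModule; not the real class."""

    name = "sketch_metric"  # illustrative attribute, used only in the message

    def _compute(self, predictions, references):
        # Simulate a backend failure like the sklearn ValueError quoted above.
        if len(set(references)) < 2:
            raise ValueError("y_true contains only one label")
        matches = sum(p == r for p, r in zip(predictions, references))
        return {"accuracy": matches / len(references)}

    def compute(self, predictions, references):
        try:
            return self._compute(predictions, references)
        except Exception as e:
            # Re-raise as the library-level error; "from e" keeps the
            # original exception reachable through __cause__.
            raise EvaluationModuleError(
                f"{self.name}: _compute() failed with {type(e).__name__}: {e}"
            ) from e
```

With this shape, a caller only needs one except clause for the library-level error and can still see the backend message in the wrapped text.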

Issue

Fixes #758

Local verification

=== black check ===
All done! ✨ 🍰 ✨
2 files would be left unchanged.

=== isort check ===
(no output — no changes needed)

=== flake8 ===
(no output — no issues)

=== pytest tests/test_metric.py (22 passed, 2 skipped, 2 deselected) ===
PASSED test_concurrent_metrics
PASSED test_distributed_metrics
PASSED test_dummy_metric
PASSED test_dummy_metric_pickle
PASSED test_input_numpy
SKIPPED test_input_tf (torch/tf not installed)
SKIPPED test_input_torch (torch/tf not installed)
PASSED test_metric_with_cache_dir
PASSED test_multiple_features
PASSED test_separate_experiments_in_parallel
PASSED test_string_casting
PASSED test_string_casting_tested_once
PASSED test_metric_with_multilabel[None-...]
PASSED test_metric_with_multilabel[multilabel-...]
PASSED test_safety_checks_process_vars
PASSED test_metric_with_non_standard_feature_names_add
PASSED test_metric_with_non_standard_feature_names_add_batch
PASSED test_metric_with_non_standard_feature_names_compute
PASSED TestEvaluationcombined_evaluation::test_add
PASSED TestEvaluationcombined_evaluation::test_add_batch
PASSED TestEvaluationcombined_evaluation::test_duplicate_module
PASSED TestEvaluationcombined_evaluation::test_force_prefix_with_dict
PASSED TestEvaluationcombined_evaluation::test_single_module
PASSED TestEvaluationcombined_evaluation::test_two_modules_with_same_score_name

Note: test_modules_from_string excluded — it tries to load 'accuracy' from the
Hub and fails in a network-isolated environment; the failure is pre-existing
(confirmed against upstream/main without my changes).

=== EvaluationModuleError smoke test ===
import evaluate
assert hasattr(evaluate, 'EvaluationModuleError')
try:
    raise evaluate.EvaluationModuleError('...')
except evaluate.EvaluationModuleError:
    pass
all assertions passed
=== LOCAL_TEST_PASSED ===

Risk

Wrapping _compute() is a breaking change only for callers that currently catch
the raw ValueError/KeyError/etc. raised by metric backends and rely on their
exact type, a pattern that was already undocumented and fragile. The
EvaluationModuleError is raised with "raise ... from e", so the original
exception remains available as __cause__. No existing test expectations were
changed.
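
Callers that previously branched on the raw exception type could migrate along the lines of the sketch below, which inspects __cause__ on the wrapped error. The helper names failing_compute and classify_failure are hypothetical, and the exception class is redefined locally as a stand-in so the snippet runs without evaluate installed.

```python
class EvaluationModuleError(Exception):
    """Stand-in for the exception this PR exports from evaluate."""


def failing_compute():
    # Hypothetical helper that mimics the wrapped failure path in compute().
    try:
        raise KeyError("references")
    except Exception as e:
        raise EvaluationModuleError("compute failed") from e


def classify_failure():
    # Catch the library-level error, then branch on the original cause.
    try:
        failing_compute()
    except EvaluationModuleError as err:
        if isinstance(err.__cause__, KeyError):
            return "missing-input"
        return "other"
    return "ok"
```

Code that must support versions both before and after this change could catch both the raw type and the wrapper during a transition period.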

…e exceptions Adds EvaluationModuleError exception class to evaluate/module.py and exports it from evaluate/__init__.py so callers can catch evaluate-specific failures without catching broad Exception or importing internal sklearn/numpy types. Wraps the _compute() call in EvaluationModule.compute() so that raw ValueError/KeyError/etc. from metric backends surface as EvaluationModuleError instead of leaking implementation details. Closes huggingface#758

xodn348 wants to merge 1 commit into huggingface:main from xodn348:fix/export-evaluation-module-error
