fix(module): export EvaluationModuleError and wrap _compute failures #759

Open
xodn348 wants to merge 1 commit into huggingface:main from xodn348:fix/export-evaluation-module-error

xodn348 commented May 14, 2026

Summary

EvaluationModuleError was absent from evaluate's public API, so callers had
no way to catch evaluate-specific failures without catching the broad Exception
base class or importing internal sklearn/numpy error types. Writing
except evaluate.EvaluationModuleError was not an option either: it raised an
AttributeError because the symbol was never exported.
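
The failure mode can be reproduced without evaluate installed. The sketch below is illustrative only: fake_evaluate is a hypothetical stand-in namespace that, like the package before this PR, has no EvaluationModuleError attribute. Python only evaluates the expression in an except clause when an exception actually reaches it, so the AttributeError appears at the moment a backend error occurs.

```python
import types

# Hypothetical stand-in for a package that does not export the exception class.
fake_evaluate = types.SimpleNamespace()


def compute_with_missing_symbol():
    try:
        # Simulates a backend failure inside a metric.
        raise ValueError("backend failure")
    except fake_evaluate.EvaluationModuleError:
        # Never reached: looking up the missing attribute fails first.
        return "handled"


try:
    compute_with_missing_symbol()
    outcome = "no error"
except AttributeError:
    # The attribute lookup in the except clause is what fails here.
    outcome = "AttributeError"

print(outcome)
```

Because the lookup happens lazily, code written this way can appear to work until the first real metric failure.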

This PR adds the EvaluationModuleError class to src/evaluate/module.py and
exports it from src/evaluate/__init__.py. It also wraps the _compute() call
inside EvaluationModule.compute() so that raw backend exceptions (e.g.
ValueError: y_true contains only one label from sklearn) surface as
EvaluationModuleError with a descriptive message, rather than leaking
implementation details to callers.
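
A minimal sketch of the wrapping described above, under stated assumptions: SketchModule is a hypothetical stand-in for EvaluationModule, the base class of the exception and the wording of the message are guesses, and the actual diff in this PR may differ.

```python
class EvaluationModuleError(Exception):
    """Raised when an evaluation module fails while computing its result."""


class SketchModule:
    """Hypothetical stand-in for EvaluationModule; not the real class."""

    name = "sketch_metric"

    def _compute(self, predictions, references):
        # Simulates a backend (e.g. sklearn) rejecting the inputs.
        if len(set(references)) < 2:
            raise ValueError("y_true contains only one label")
        matches = sum(p == r for p, r in zip(predictions, references))
        return {"accuracy": matches / len(references)}

    def compute(self, predictions, references):
        try:
            return self._compute(predictions, references)
        except Exception as e:
            # Re-raise under one library-level type; "from e" keeps the
            # original backend exception reachable as __cause__.
            raise EvaluationModuleError(
                f"{self.name} failed during _compute: {e}"
            ) from e


module = SketchModule()
ok_result = module.compute([0, 1], [0, 1])

try:
    module.compute([1, 1], [1, 1])
    wrapped = None
except EvaluationModuleError as err:
    wrapped = err
```

Successful calls return unchanged, while a failing call surfaces as EvaluationModuleError with the backend error attached.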

Issue

Fixes #758

Local verification

=== black check ===
All done! ✨ 🍰 ✨
2 files would be left unchanged.

=== isort check ===
(no output — no changes needed)

=== flake8 ===
(no output — no issues)

=== pytest tests/test_metric.py (22 passed, 2 skipped, 2 deselected) ===
PASSED test_concurrent_metrics
PASSED test_distributed_metrics
PASSED test_dummy_metric
PASSED test_dummy_metric_pickle
PASSED test_input_numpy
SKIPPED test_input_tf (torch/tf not installed)
SKIPPED test_input_torch (torch/tf not installed)
PASSED test_metric_with_cache_dir
PASSED test_multiple_features
PASSED test_separate_experiments_in_parallel
PASSED test_string_casting
PASSED test_string_casting_tested_once
PASSED test_metric_with_multilabel[None-...]
PASSED test_metric_with_multilabel[multilabel-...]
PASSED test_safety_checks_process_vars
PASSED test_metric_with_non_standard_feature_names_add
PASSED test_metric_with_non_standard_feature_names_add_batch
PASSED test_metric_with_non_standard_feature_names_compute
PASSED TestEvaluationcombined_evaluation::test_add
PASSED TestEvaluationcombined_evaluation::test_add_batch
PASSED TestEvaluationcombined_evaluation::test_duplicate_module
PASSED TestEvaluationcombined_evaluation::test_force_prefix_with_dict
PASSED TestEvaluationcombined_evaluation::test_single_module
PASSED TestEvaluationcombined_evaluation::test_two_modules_with_same_score_name

Note: test_modules_from_string excluded — it tries to load 'accuracy' from the
Hub and fails in a network-isolated environment; the failure is pre-existing
(confirmed against upstream/main without my changes).

=== EvaluationModuleError smoke test ===
import evaluate
assert hasattr(evaluate, 'EvaluationModuleError')
try:
    raise evaluate.EvaluationModuleError('...')
except evaluate.EvaluationModuleError:
    pass
all assertions passed
=== LOCAL_TEST_PASSED ===

Risk

Wrapping _compute() is a breaking change only for callers that currently catch
the raw ValueError/KeyError/etc. raised by metric backends and rely on their
exact type, a pattern that was already undocumented and fragile. The
EvaluationModuleError is raised with from e, so the original exception stays
available as __cause__. No existing test expectations were changed.
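
To illustrate the compatibility impact, here is a hedged caller-side sketch. It assumes EvaluationModuleError derives directly from Exception (the PR text does not say), and compute() below is a hypothetical stand-in for a wrapped metric call. An existing except ValueError handler no longer matches, but the original type can still be recovered from __cause__.

```python
class EvaluationModuleError(Exception):
    """Stand-in for the exception this PR exports (base class assumed)."""


def compute():
    """Hypothetical compute() that wraps a backend ValueError."""
    try:
        raise ValueError("y_true contains only one label")
    except Exception as e:
        raise EvaluationModuleError("metric computation failed") from e


def classify_failure():
    try:
        compute()
    except ValueError:
        # Pre-PR style handler: no longer reached once errors are wrapped.
        return "raw ValueError"
    except EvaluationModuleError as err:
        # Post-PR style handler: inspect __cause__ for the backend type.
        return f"wrapped {type(err.__cause__).__name__}"
    return "ok"


result = classify_failure()
print(result)
```

Callers that depended on the raw type would need to add the second handler or branch on __cause__.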

…e exceptions

Adds EvaluationModuleError exception class to evaluate/module.py and exports
it from evaluate/__init__.py so callers can catch evaluate-specific failures
without catching broad Exception or importing internal sklearn/numpy types.

Wraps the _compute() call in EvaluationModule.compute() so that raw
ValueError/KeyError/etc. from metric backends surface as EvaluationModuleError
instead of leaking implementation details.

Closes huggingface#758

Development

Successfully merging this pull request may close these issues.

EvaluationModuleError not exported from public API — sklearn errors leak to users
