[ffe] Activate test_flag_eval_metrics.py across released tracer versions #6972

Open
sameerank wants to merge 5 commits into main from sameerank/enable-flag-eval-metrics-tests

sameerank (Contributor) commented May 19, 2026

Motivation

tests/ffe/test_flag_eval_metrics.py is now supported in released versions of every tracer that implements FFE evaluation metrics. The manifests still pinned these tests to -dev versions or marked them missing_feature, so the tests did not run against the released artifacts even though the feature has shipped.

Release tags:

Getting the tests to pass for all of these releases lets us update https://feature-parity.us1.prod.dog/?runDateFilter=7d&products=14&feature=548

Changes

Version-gate activation

| Manifest | Before | After |
| --- | --- | --- |
| manifests/ruby.yml | rails72: v2.32.0-dev | rails72: v2.32.0 |
| manifests/python.yml | v4.8.0-dev | v4.7.0 |
| manifests/golang.yml | v2.9.0-dev | v2.8.0 |
| manifests/nodejs.yml | missing_feature | weblog_declaration with `"*": incomplete_test_app`, `express4: *ref_5_99_0` |
| manifests/dotnet.yml | missing_feature (FFL-2257 ...) | v3.44.0 |
| manifests/java.yml | missing_feature (FFL-1972) | weblog_declaration with `"*": irrelevant`, `spring-boot: v1.62.0` |

Only express4 is activated on Node.js because the /ffe endpoint is only implemented in the express test app; other weblogs remain incomplete_test_app, matching the existing pattern used by test_dynamic_evaluation.py and test_exposures.py.

Similarly, Java is scoped to the spring-boot weblog only — non-spring-boot Java weblogs (akka-http, jersey-grizzly2, play, ratpack, resteasy-netty3, spring-boot-3-native, vertx3, vertx4) don't implement /ffe, matching the pattern used by test_dynamic_evaluation.py and test_exposures.py.
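
As a rough sketch of what the scoped entries might look like, the fragment below writes out the Node.js and Java declarations summarized in the Version-gate activation table. The nesting, the `refs` list, and the semver range behind the `ref_5_99_0` anchor are assumptions made for illustration and are not copied from the manifests; only the per-weblog values come from this PR.

```yaml
# Hypothetical excerpt of manifests/nodejs.yml (layout assumed).
refs:
  - &ref_5_99_0 ">=5.99.0"       # assumed anchor definition; the real range may differ
manifest:
  tests/ffe/test_flag_eval_metrics.py:
    weblog_declaration:
      "*": incomplete_test_app   # weblogs whose test app has no /ffe endpoint
      express4: *ref_5_99_0      # the only Node.js weblog activated here
---
# Hypothetical excerpt of manifests/java.yml (layout assumed).
manifest:
  tests/ffe/test_flag_eval_metrics.py:
    weblog_declaration:
      "*": irrelevant            # Java weblogs that do not implement /ffe
      spring-boot: v1.62.0       # activated from the released version onward
```

In both cases the `"*"` entry acts as the default for every weblog that is not named explicitly, so only the listed weblog is gated on a version.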

Java per-test overrides

Java v1.62.0 ships fixes for 16 of the 17 prior FFL-1972 failures — verified by XPASS on the Java prod spring-boot CI job (logs_endtoend_java_spring-boot_prod_1 artifact, FFE scenario). Only Test_FFE_Eval_Metric_Count still fails:

tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Count::test_ffe_eval_metric_count: bug (FFL-1972)

Node.js per-test overrides

dd-trace-js v5.103.0 has consistency issues in feature_flag.result.reason / error.type tag derivation on the feature_flag.evaluations OTel metric. Tracked in FFL-2313 (subtask under FFL-1899):

tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Basic::test_ffe_eval_metric_basic: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Numeric_To_Integer::test_ffe_eval_metric_numeric_to_integer: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Invalid_Regex::test_ffe_eval_metric_parse_error_invalid_regex: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Variant_Type_Mismatch::test_ffe_eval_metric_parse_error_variant_type_mismatch: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Split::test_ffe_eval_reason_split: bug (FFL-2313)

Test_FFE_Eval_Targeting_Key_Optional was re-tagged from irrelevant (JS SDK requires targeting key) to bug (FFL-1730) — the JS SDK erroring on empty targeting key is a known bug, not a "test doesn't apply" case, and FFL-1730 already covers the root cause (also referenced by test_exposures.py::Test_FFE_EXP_5_Missing_Targeting_Key above it in the same file).
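
The re-tag amounts to a one-line change in manifests/nodejs.yml. In the sketch below, the new value is the one shown in the review thread on this PR; the old value is reconstructed from the description above, so its exact wording is an assumption.

```yaml
# Before (reconstructed; the exact wording of the old declaration is assumed):
# tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Targeting_Key_Optional: irrelevant (JS SDK requires targeting key)

# After (as shown in the manifests/nodejs.yml comment thread):
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Targeting_Key_Optional: bug (FFL-1730)
```

Declaring the test as a bug rather than irrelevant keeps it tied to a ticket, so it stays visible as something to re-enable once FFL-1730 is resolved.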

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from RFC owner.
    • Framework is modified, or non-obvious usage of it -> Get a review from R&P team.

🚀 Once your PR is reviewed and the CI green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • Anything but `tests/` or `manifests/` is modified? Only `manifests/` touched — no R&P review needed.
  • A docker base image is modified? No.
  • A scenario is added, removed or renamed? No.

Ruby v2.32.0, Python v4.7.0, Go v2.8.0, Node.js v5.99.0 (express4 only),
.NET v3.44.0, and Java v1.62.0 now ship FFE evaluation metrics. Java
keeps the per-class FFL-1972 overrides so CI XPASSes will flag the
tests that can be trimmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions (Contributor) commented

CODEOWNERS have been resolved as:

manifests/dotnet.yml                                                    @DataDog/apm-dotnet @DataDog/asm-dotnet
manifests/golang.yml                                                    @DataDog/dd-trace-go-guild
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java
manifests/nodejs.yml                                                    @DataDog/dd-trace-js
manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
manifests/ruby.yml                                                      @DataDog/ruby-guild @DataDog/asm-ruby

datadog-datadog-prod-us1 Bot commented May 19, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: dce01cb

sameerank and others added 2 commits May 19, 2026 23:11
dd-trace-js v5.103.0 fails to distinguish feature_flag.result.reason for
static/split/type-mismatch paths (all report targeting_match) and emits
error.type=general instead of parse_error. Tracked in FFL-2313 under
FFL-1899.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Java v1.62.0 actually shipped fixes for 16 of the 17 FFL-1972 tests
(verified by CI XPASS on the prod spring-boot job). Only
Test_FFE_Eval_Metric_Count still fails; the rest can run without overrides.

Test_FFE_Eval_Targeting_Key_Optional for nodejs is more accurately a bug
(FFL-1730) than irrelevant — the JS SDK errors on empty targeting key
instead of treating it as optional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sameerank force-pushed the sameerank/enable-flag-eval-metrics-tests branch from 9579087 to dc4ef75 on May 20, 2026 00:02
Comment thread manifests/java.yml
Comment on lines -3168 to -3181
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Different_Flags::test_ffe_eval_metric_different_flags: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Numeric_To_Integer::test_ffe_eval_metric_numeric_to_integer: bug (FFL-1972)
? tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Invalid_Regex::test_ffe_eval_metric_parse_error_invalid_regex
: bug (FFL-1972)
? tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Variant_Type_Mismatch::test_ffe_eval_metric_parse_error_variant_type_mismatch
: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Type_Mismatch::test_ffe_eval_metric_type_mismatch: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Nested_Attributes_Ignored::test_ffe_eval_nested_attributes_ignored: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_No_Config_Loaded::test_ffe_eval_no_config_loaded: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Default::test_ffe_eval_reason_default: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Disabled::test_ffe_eval_reason_disabled: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Split::test_ffe_eval_reason_split: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Targeting::test_ffe_eval_reason_targeting: bug (FFL-1972)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Targeting_Key_Optional::test_ffe_eval_targeting_key_optional: bug (FFL-1972)

sameerank (Contributor, Author) commented

They're passing the tests so I believe they are fixed!

Comment thread manifests/nodejs.yml
Comment on lines +1674 to +1681
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Basic::test_ffe_eval_metric_basic: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Numeric_To_Integer::test_ffe_eval_metric_numeric_to_integer: bug (FFL-2313)
? tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Invalid_Regex::test_ffe_eval_metric_parse_error_invalid_regex
: bug (FFL-2313)
? tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Variant_Type_Mismatch::test_ffe_eval_metric_parse_error_variant_type_mismatch
: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Split::test_ffe_eval_reason_split: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Targeting_Key_Optional: bug (FFL-1730)

sameerank (Contributor, Author) commented

I think it's okay to skip these tests because they are actually concerned with a different "Standardized evaluation reasons" feature https://feature-parity.us1.prod.dog/?runDateFilter=7d&products=14&feature=552&language=2

The previous catch-all entry ran the tests on every Java weblog, which
caused 404s on akka-http, jersey-grizzly2, play, ratpack, resteasy-netty3,
spring-boot-3-native, vertx3, and vertx4 — none of which implement the
/ffe endpoint. Matches the existing weblog_declaration pattern used by
test_dynamic_evaluation.py and test_exposures.py.

sameerank marked this pull request as ready for review May 20, 2026 05:56
sameerank requested review from a team as code owners May 20, 2026 05:56
sameerank requested review from a team, daniel-romano-DD, dd-oleksii, leoromanovsky, manuel-alvarez-alvarez, quinna-h and rachelyangdog, and removed request for a team May 20, 2026 05:56

dd-oleksii (Member) commented

FFE tests were disabled yesterday due to an incident. Updating the branch to re-trigger the tests and make sure they are still passing.

2 participants