[ffe] Activate test_flag_eval_metrics.py across released tracer versions by sameerank · Pull Request #6972 · DataDog/system-tests

sameerank · 2026-05-19T22:15:59Z

Motivation

tests/ffe/test_flag_eval_metrics.py is now supported in released versions of every tracer that implements FFE evaluation metrics. The manifests were still pinning these tests to -dev versions or missing_feature, so the tests don't run against the released artifacts even though the feature is shipped.

Release tags:

Getting the tests to pass for all of these releases lets us update https://feature-parity.us1.prod.dog/?runDateFilter=7d&products=14&feature=548

Changes

Version-gate activation

| Manifest | Before | After |
|---|---|---|
| `manifests/ruby.yml` | `rails72: v2.32.0-dev` | `rails72: v2.32.0` |
| `manifests/python.yml` | `v4.8.0-dev` | `v4.7.0` |
| `manifests/golang.yml` | `v2.9.0-dev` | `v2.8.0` |
| `manifests/nodejs.yml` | `missing_feature` | `weblog_declaration` with `"": incomplete_test_app`, `express4: ref_5_99_0` |
| `manifests/dotnet.yml` | `missing_feature (FFL-2257 ...)` | `v3.44.0` |
| `manifests/java.yml` | `missing_feature (FFL-1972)` | `weblog_declaration` with `"*": irrelevant`, `spring-boot: v1.62.0` |
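To illustrate why a `-dev` gate can keep a test from running on a released tracer, here is a minimal sketch of a version gate. It assumes semver-style ordering, where a pre-release suffix sorts just below the matching release. `parse_version` and `is_activated` are hypothetical helpers written for this example, not the system-tests implementation, whose comparison rules may differ.

```python
import re


def parse_version(text: str) -> tuple:
    """Parse 'vX.Y.Z' or 'vX.Y.Z-suffix' into a sortable tuple.

    The last element is 0 for a pre-release (any '-suffix') and 1 for a
    release, so 'v4.8.0-dev' sorts just below 'v4.8.0'. (Assumed ordering.)
    """
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(-.+)?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch = (int(part) for part in match.group(1, 2, 3))
    is_release = 0 if match.group(4) else 1
    return (major, minor, patch, is_release)


def is_activated(gate: str, tracer_version: str) -> bool:
    """Return True when the tracer version is at or above the manifest gate."""
    return parse_version(tracer_version) >= parse_version(gate)


# Python row of the table: the old gate skips the released v4.7.0 tracer,
# the new gate runs the tests against it.
print(is_activated("v4.8.0-dev", "v4.7.0"))  # False
print(is_activated("v4.7.0", "v4.7.0"))      # True
```

Under this assumed ordering, the Go row behaves the same way: `v2.8.0` is below `v2.9.0-dev`, so the old gate skipped the released tracer.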

Only express4 is activated on Node.js because the /ffe endpoint is only implemented in the express test app; other weblogs remain incomplete_test_app, matching the existing pattern used by test_dynamic_evaluation.py and test_exposures.py.

Similarly, Java is scoped to the spring-boot weblog only — non-spring-boot Java weblogs (akka-http, jersey-grizzly2, play, ratpack, resteasy-netty3, spring-boot-3-native, vertx3, vertx4) don't implement /ffe, matching the pattern used by test_dynamic_evaluation.py and test_exposures.py.
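The per-weblog scoping described above can be pictured as a lookup with a catch-all fallback. The sketch below is illustrative only: `resolve_declaration` is a hypothetical function, and treating both `"*"` and `""` as catch-all keys is an assumption based on the table entries, not a statement about how system-tests parses manifests.

```python
def resolve_declaration(weblog_declaration: dict, weblog: str) -> str:
    """Return the declaration that applies to one weblog.

    An exact weblog name wins; otherwise fall back to a catch-all key.
    Both '*' and '' are treated as catch-alls here (an assumption).
    """
    if weblog in weblog_declaration:
        return weblog_declaration[weblog]
    for catch_all in ("*", ""):
        if catch_all in weblog_declaration:
            return weblog_declaration[catch_all]
    raise KeyError(f"no declaration applies to weblog {weblog!r}")


# Values copied from the "After" column of the table.
JAVA_DECLARATION = {"*": "irrelevant", "spring-boot": "v1.62.0"}
NODEJS_DECLARATION = {"": "incomplete_test_app", "express4": "ref_5_99_0"}

print(resolve_declaration(JAVA_DECLARATION, "spring-boot"))  # v1.62.0
print(resolve_declaration(JAVA_DECLARATION, "vertx4"))       # irrelevant
print(resolve_declaration(NODEJS_DECLARATION, "express4"))   # ref_5_99_0
```

With this shape, weblogs that do not implement /ffe never run the tests, so they cannot fail with a 404.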

Java per-test overrides

Java v1.62.0 ships fixes for 16 of the 17 prior FFL-1972 failures — verified by XPASS on the Java prod spring-boot CI job (logs_endtoend_java_spring-boot_prod_1 artifact, FFE scenario). Only Test_FFE_Eval_Metric_Count still fails:

tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Count::test_ffe_eval_metric_count: bug (FFL-1972)
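The XPASS reasoning can be sketched as follows, assuming a `bug` declaration behaves like a pytest expected failure: a declared-bug test that passes is reported as XPASS, which signals that its override can be dropped. The `classify` helper and the two sample results are hypothetical and only mirror the outcome described above.

```python
def classify(declared_as_bug: bool, test_passed: bool) -> str:
    """Map a declaration and a raw result to a pytest-style outcome."""
    if declared_as_bug:
        return "XPASS" if test_passed else "XFAIL"
    return "PASSED" if test_passed else "FAILED"


# Sample results (test class -> passed?), mirroring the description:
# the fixed tests pass, Test_FFE_Eval_Metric_Count still fails.
results = {
    "Test_FFE_Eval_Reason_Split": True,
    "Test_FFE_Eval_Reason_Default": True,
    "Test_FFE_Eval_Metric_Count": False,
}

# Every test here was declared `bug (FFL-1972)` before this change.
outcomes = {name: classify(True, passed) for name, passed in results.items()}

# Overrides whose tests XPASS are candidates for removal.
removable = sorted(name for name, outcome in outcomes.items() if outcome == "XPASS")
print(removable)  # ['Test_FFE_Eval_Reason_Default', 'Test_FFE_Eval_Reason_Split']
```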

Node.js per-test overrides

dd-trace-js v5.103.0 has consistency issues in feature_flag.result.reason / error.type tag derivation on the feature_flag.evaluations OTel metric. Tracked in FFL-2313 (subtask under FFL-1899):

tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Basic::test_ffe_eval_metric_basic: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Numeric_To_Integer::test_ffe_eval_metric_numeric_to_integer: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Invalid_Regex::test_ffe_eval_metric_parse_error_invalid_regex: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Variant_Type_Mismatch::test_ffe_eval_metric_parse_error_variant_type_mismatch: bug (FFL-2313)
tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Split::test_ffe_eval_reason_split: bug (FFL-2313)
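As a rough illustration of the kind of mismatch behind these overrides, the sketch below compares expected and observed attributes on one metric data point. The `diff_tags` helper and the dictionary layout are hypothetical; only the attribute name `error.type` and the values `parse_error` and `general` come from this PR's commit messages.

```python
from typing import Dict, Optional, Tuple


def diff_tags(
    expected: Dict[str, str], observed: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {tag: (expected, observed)} for every tag that does not match.

    A tag missing from `observed` is reported with None as its observed value.
    """
    return {
        tag: (want, observed.get(tag))
        for tag, want in expected.items()
        if observed.get(tag) != want
    }


# Parse-error case as described for dd-trace-js v5.103.0:
# the test expects error.type=parse_error, the tracer emits general.
expected = {"error.type": "parse_error"}
observed = {"error.type": "general"}

print(diff_tags(expected, observed))  # {'error.type': ('parse_error', 'general')}
```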

Test_FFE_Eval_Targeting_Key_Optional was re-tagged from irrelevant (JS SDK requires targeting key) to bug (FFL-1730) — the JS SDK erroring on empty targeting key is a known bug, not a "test doesn't apply" case, and FFL-1730 already covers the root cause (also referenced by test_exposures.py::Test_FFE_EXP_5_Missing_Targeting_Key above it in the same file).

Workflow

⚠️ Create your PR as draft ⚠️
Work on your PR until the CI passes
Mark it as ready for review
- Test logic is modified? -> Get a review from RFC owner.
- Framework is modified, or non-obvious usage of it? -> Get a review from R&P team.

🚀 Once your PR is reviewed and the CI green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

Anything but `tests/` or `manifests/` is modified? Only `manifests/` touched — no R&P review needed.
A docker base image is modified? No.
A scenario is added, removed or renamed? No.

Ruby v2.32.0, Python v4.7.0, Go v2.8.0, Node.js v5.99.0 (express4 only), .NET v3.44.0, and Java v1.62.0 now ship FFE evaluation metrics. Java keeps the per-class FFL-1972 overrides so CI XPASSes will flag the tests that can be trimmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-19T22:16:31Z

CODEOWNERS have been resolved as:

manifests/dotnet.yml                                                    @DataDog/apm-dotnet @DataDog/asm-dotnet
manifests/golang.yml                                                    @DataDog/dd-trace-go-guild
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java
manifests/nodejs.yml                                                    @DataDog/dd-trace-js
manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
manifests/ruby.yml                                                      @DataDog/ruby-guild @DataDog/asm-ruby

datadog-datadog-prod-us1 · 2026-05-19T22:25:13Z

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: dce01cb

dd-trace-js v5.103.0 fails to distinguish feature_flag.result.reason for static/split/type-mismatch paths (all report targeting_match) and emits error.type=general instead of parse_error. Tracked in FFL-2313 under FFL-1899.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Java v1.62.0 actually shipped fixes for 16 of the 17 FFL-1972 tests (verified by CI XPASS on the prod spring-boot job). Only Test_FFE_Eval_Metric_Count still fails; the rest can run without overrides.

Test_FFE_Eval_Targeting_Key_Optional for nodejs is more accurately a bug (FFL-1730) than irrelevant: the JS SDK errors on empty targeting key instead of treating it as optional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sameerank · 2026-05-20T00:09:14Z

-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Different_Flags::test_ffe_eval_metric_different_flags: bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Numeric_To_Integer::test_ffe_eval_metric_numeric_to_integer: bug (FFL-1972)
-  ? tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Invalid_Regex::test_ffe_eval_metric_parse_error_invalid_regex
-  : bug (FFL-1972)
-  ? tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Variant_Type_Mismatch::test_ffe_eval_metric_parse_error_variant_type_mismatch
-  : bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Type_Mismatch::test_ffe_eval_metric_type_mismatch: bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Nested_Attributes_Ignored::test_ffe_eval_nested_attributes_ignored: bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_No_Config_Loaded::test_ffe_eval_no_config_loaded: bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Default::test_ffe_eval_reason_default: bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Disabled::test_ffe_eval_reason_disabled: bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Split::test_ffe_eval_reason_split: bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Targeting::test_ffe_eval_reason_targeting: bug (FFL-1972)
-  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Targeting_Key_Optional::test_ffe_eval_targeting_key_optional: bug (FFL-1972)


They're passing the tests so I believe they are fixed!

sameerank · 2026-05-20T00:11:41Z

+  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Basic::test_ffe_eval_metric_basic: bug (FFL-2313)
+  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Numeric_To_Integer::test_ffe_eval_metric_numeric_to_integer: bug (FFL-2313)
+  ? tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Invalid_Regex::test_ffe_eval_metric_parse_error_invalid_regex
+  : bug (FFL-2313)
+  ? tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Metric_Parse_Error_Variant_Type_Mismatch::test_ffe_eval_metric_parse_error_variant_type_mismatch
+  : bug (FFL-2313)
+  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Reason_Split::test_ffe_eval_reason_split: bug (FFL-2313)
+  tests/ffe/test_flag_eval_metrics.py::Test_FFE_Eval_Targeting_Key_Optional: bug (FFL-1730)


I think it's okay to skip these tests because they are actually concerned with a different "Standardized evaluation reasons" feature https://feature-parity.us1.prod.dog/?runDateFilter=7d&products=14&feature=552&language=2

The previous catch-all entry ran the tests on every Java weblog, which caused 404s on akka-http, jersey-grizzly2, play, ratpack, resteasy-netty3, spring-boot-3-native, vertx3, and vertx4 — none of which implement the /ffe endpoint. Matches the existing weblog_declaration pattern used by test_dynamic_evaluation.py and test_exposures.py.

dd-oleksii · 2026-05-20T17:38:13Z

FFE tests were disabled yesterday due to an incident. Updating the branch to re-trigger the tests and make sure they are still passing.

sameerank and others added 2 commits May 19, 2026 23:11

sameerank force-pushed the sameerank/enable-flag-eval-metrics-tests branch from 9579087 to dc4ef75 Compare May 20, 2026 00:02

sameerank marked this pull request as ready for review May 20, 2026 05:56

sameerank requested review from a team as code owners May 20, 2026 05:56

sameerank requested review from a team, daniel-romano-DD, dd-oleksii, leoromanovsky, manuel-alvarez-alvarez, quinna-h and rachelyangdog and removed request for a team May 20, 2026 05:56

Merge branch 'main' into sameerank/enable-flag-eval-metrics-tests

dce01cb

sameerank wants to merge 5 commits into main from sameerank/enable-flag-eval-metrics-tests
