High Latency Metrics Collection on oDAO node #726

@mendelskiv93

Description

The metrics endpoint on an oDAO node takes an excessive amount of time to respond. This suggests that metrics are collected on demand when the endpoint is queried rather than maintained continuously.

Evidence:

  • Metric endpoint response times:

    • from localhost:
      time curl -s 0:9102/metrics  0.00s user 0.01s system 0% cpu 19.347 total
      
    • from prometheus slave:
      time curl http://10.13.0.58:9102/metrics  0.00s user 0.01s system 0% cpu 44.452 total
      
  • Impact visible in monitoring:

    • Significant increase in TCP sockets in the TIME_WAIT state
    • Elevated file descriptor count for the rocketpool process
    • No corresponding increase in system load
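
For anyone who wants to reproduce the socket observation without a monitoring stack, here is a minimal sketch that counts TCP states for one local port from the contents of /proc/net/tcp on Linux. The function name `count_tcp_states` is hypothetical, the state table is deliberately partial, and only IPv4 sockets are covered (/proc/net/tcp6 would need to be read separately).

```python
from collections import Counter

# Subset of the Linux kernel TCP state codes found in the "st" column
# of /proc/net/tcp. Codes not listed here are reported as raw hex.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(proc_net_tcp_text: str, local_port: int) -> Counter:
    """Count socket states for entries whose local port matches local_port."""
    counts: Counter = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is the local address as HEXIP:HEXPORT,
        # fields[3] is the two-digit hex state code.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

On the affected node this could be called as `count_tcp_states(open("/proc/net/tcp").read(), 9102)`, where 9102 is the metrics port from the curl commands above.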


Suggested improvement:
Consider implementing continuous metric collection instead of on-demand gathering during scrape requests to reduce response latency.
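
To illustrate the idea, here is a minimal, language-agnostic sketch in Python (not the Smart Node's actual code, which is written in Go). A background thread refreshes a snapshot at a fixed interval and the scrape handler only returns the last snapshot. The class name `CachedMetrics`, the `collect` callback, and the `interval_s` parameter are all hypothetical.

```python
import threading
import time
from typing import Callable, Optional


class CachedMetrics:
    """Refresh metrics in a background thread and serve the last snapshot."""

    def __init__(self, collect: Callable[[], str], interval_s: float) -> None:
        self._collect = collect          # hypothetical slow collection function
        self._interval_s = interval_s    # pause between background refreshes
        self._lock = threading.Lock()
        self._snapshot = ""
        self._updated_at: Optional[float] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self.refresh()  # populate once so the first scrape is not empty
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def refresh(self) -> None:
        text = self._collect()  # the slow work happens outside the lock
        with self._lock:
            self._snapshot = text
            self._updated_at = time.monotonic()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            try:
                self.refresh()
            except Exception:
                # Keep serving the previous snapshot if a refresh fails.
                pass

    def render(self) -> str:
        """Called by the scrape handler; returns without collecting anything."""
        with self._lock:
            return self._snapshot
```

The trade-off is staleness: a scrape may return values up to one interval plus one collection time old, so exposing the snapshot age as its own metric would probably be worthwhile.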
