High Latency Metrics Collection on oDAO node #726

The metrics endpoint on an **oDAO node** takes an excessive time to respond (roughly 19 to 44 seconds per request in the measurements below). This suggests that metrics are gathered on demand when the endpoint is queried rather than being maintained continuously.

Evidence:
- Metric endpoint response times:
   - from localhost:
       ```
       time curl -s 0:9102/metrics  0.00s user 0.01s system 0% cpu 19.347 total
       ```
   - from the Prometheus slave:
       ```
       time curl http://10.13.0.58:9102/metrics  0.00s user 0.01s system 0% cpu 44.452 total
       ```

- Impact visible in monitoring:
  - Significant increase in TCP sockets in the TIME_WAIT state
  - Elevated number of open file descriptors for the rocketpool process
  - No corresponding increase in system load
  
![image](https://github.com/user-attachments/assets/2d9cd053-45a2-4a41-947a-76548380be66)
![image](https://github.com/user-attachments/assets/51e72eb3-920a-431b-91e4-b9051a43d282)



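Suggested improvement:
Consider collecting metrics continuously in the background and serving the most recently collected values on each scrape, instead of gathering them on demand during the scrape request. This should reduce response latency and keep slow scrapes from holding connections open.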
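
To illustrate the suggestion, here is a minimal sketch in Go using only the standard library. It is not taken from the rocketpool codebase: `snapshotCache`, `collectFunc`, `scrape` and the `example_metric` payload are hypothetical names, and the slow collector is a stand-in for whatever the node currently does during a scrape. The idea is that a background loop pays the collection cost on a fixed interval, while the `/metrics` handler only copies the last snapshot out of memory.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// collectFunc gathers a full metrics payload. This is assumed to be the slow
// step that currently runs inside every scrape request.
type collectFunc func() (string, error)

// snapshotCache (hypothetical) holds the most recent successful collection.
type snapshotCache struct {
	mu        sync.RWMutex
	payload   string
	updatedAt time.Time
}

// refresh runs one collection and stores the result. On error the previous
// snapshot is kept, so scrapes keep returning the last known values.
func (c *snapshotCache) refresh(collect collectFunc) error {
	payload, err := collect()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.payload = payload
	c.updatedAt = time.Now()
	c.mu.Unlock()
	return nil
}

// run refreshes immediately and then once per interval until stop is closed.
func (c *snapshotCache) run(collect collectFunc, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		_ = c.refresh(collect) // a real service would log this error
		select {
		case <-ticker.C:
		case <-stop:
			return
		}
	}
}

// ServeHTTP answers a scrape from memory and never triggers a collection.
func (c *snapshotCache) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	c.mu.RLock()
	payload, updatedAt := c.payload, c.updatedAt
	c.mu.RUnlock()
	if updatedAt.IsZero() {
		http.Error(w, "metrics not collected yet", http.StatusServiceUnavailable)
		return
	}
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprint(w, payload)
}

// scrape performs an in-process GET against h and returns status and body.
func scrape(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	// Stand-in for the slow on-demand collection (assumption for the demo).
	slowCollect := func() (string, error) {
		time.Sleep(50 * time.Millisecond)
		return "example_metric 1\n", nil
	}
	cache := &snapshotCache{}

	code, _ := scrape(cache)
	fmt.Println("status before first collection:", code)

	if err := cache.refresh(slowCollect); err != nil {
		fmt.Println("collection failed:", err)
		return
	}
	code, body := scrape(cache)
	fmt.Printf("status after collection: %d, body: %q\n", code, body)

	// In a long-running service the wiring would instead look like:
	//   go cache.run(slowCollect, 15*time.Second, stop)
	//   http.Handle("/metrics", cache)
}
```

With this shape, scrape latency no longer depends on how long collection takes. The trade-off is staleness: values can be up to one refresh interval old, so the interval would need to be chosen relative to the Prometheus scrape interval.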
