Graph missing some points

Describe the bug

Some points appear to be missing when querying metrics from Grafana. Zooming in or out makes no difference, apart from changing the size of the missing part.

To Reproduce

Collect metrics, wait, and then query the time series directly, without wrapping them in a range-vector function such as rate().

Expected behavior

I expected to see almost the same graph whether querying Prometheus, VictoriaMetrics or m3db.

Screenshots

[Screenshot: victoria-metrics]

Version

$ ./victoria-metrics-prod --version
victoria-metrics-20190725-211902-tags-v1.23.0-0-g89eb6d78

Additional context

I'm running some tests on service mesh implementations and using Prometheus + Grafana to track the memory, CPU and network throughput of their components. At the same time, I'm also writing the metrics to m3db and VictoriaMetrics so that I can use these same tests to evaluate them as well.

Note that in the attached image there are some missing points on VictoriaMetrics that don't exist on m3db or Prometheus. By the way, ignore the meaning of "CPU total": I normally use rate(), but that calculation hides the missing points.

Why is that happening? What information can I provide to help you better understand the relevant parts of my topology?

Some info:

  • The VictoriaMetrics deployment has just one instance, without memory or CPU restrictions. The host has 8GiB of free RAM and 4 free CPUs.
  • The rate(prometheus_remote_storage_samples_in_total[5m]) graph shows ~9500 samples per second.
  • The queue config follows the doc suggestion: max_shards: 100 and max_samples_per_send: 10000

Answer by valyala

@jcmoraisjr, try upgrading to v1.23.1. It contains a fix for a similar issue. There is yet another fix available in the master branch, so you could also try building VictoriaMetrics from the latest commit and verifying whether the issue is fixed.

> Why is that happening? Which informations I can provide in order to better understand the relevant parts of my topology?

This could happen when the following conditions are met:

  • Scrape interval between samples has non-zero jitter. For instance, the interval between the first and the second sample is 10 seconds, while the interval between the second and the third sample is 11 seconds.
  • Grafana sends a step arg to /api/v1/query_range that is smaller than the scrape interval for the given time series.
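
To make the interaction between the two conditions concrete, here is a toy model in Python. It is only a sketch under a loud assumption: it pretends the query engine evaluates the series at start, start + step, ... and, for each timestamp, looks back max(step, nominal scrape interval) seconds for a sample. That lookback rule and the function name missing_points are hypothetical, chosen for illustration; they are not taken from the VictoriaMetrics source.

```python
def missing_points(sample_times, start, end, step, scrape_interval):
    """Return evaluation timestamps with no sample in (t - lookback, t].

    ASSUMPTION (illustrative only): lookback = max(step, scrape_interval).
    This is a toy model, not VictoriaMetrics' actual algorithm.
    """
    lookback = max(step, scrape_interval)
    gaps = []
    t = start
    while t <= end:
        # A point is drawn only if some sample falls inside the lookback window.
        if not any(t - lookback < s <= t for s in sample_times):
            gaps.append(t)
        t += step
    return gaps


# Nominal 10 s scrape interval; the third scrape arrives 1 s late (jitter).
jittery = [0, 10, 21, 31, 41]
# Same series without jitter.
regular = [0, 10, 20, 30, 40]

# step (5 s) smaller than the scrape interval, with jitter: one point is lost.
print(missing_points(jittery, start=10, end=40, step=5, scrape_interval=10))   # [20]
# Same step, no jitter: nothing is lost.
print(missing_points(regular, start=10, end=40, step=5, scrape_interval=10))   # []
# Jitter, but step (15 s) larger than the scrape interval: nothing is lost.
print(missing_points(jittery, start=10, end=40, step=15, scrape_interval=10))  # []
```

In this model a gap appears only when both conditions hold at once: the late sample at 21 s falls just outside the 10-second window evaluated at 20 s, while a larger step widens the window enough to cover the jitter.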

I'd suggest setting up VictoriaMetrics monitoring with the official dashboard for Grafana according to these docs. This can save troubleshooting time in the future.
