Histogram scrape performance with multiple labels/label values #216

KevinAMurray · 2018-09-12T09:27:51Z

Hi,

We have a use case where we are using histograms together with between 2 and 6 labels (depending on the metric), and where those labels have between 3 and 40 values. What we are seeing is that the time to perform the scrape increases dramatically when we have more labels and more label values, to the point where the scrape operation could take a couple of seconds (for a worst case situation, which I would expect to happen after the service has been running for some time).

Whilst this clearly isn't a problem for Prometheus, in our particular case the Node application is single-threaded (i.e. we cannot use Cluster or similar). This means that a Prometheus scrape will block that particular instance for a second or two whilst the scrape is happening.

I've explored the prom-client code, and I have made some improvements by effectively pre-computing information when the histogram's time series is created, and by performing some calculations during observe(). In other words, I've shifted the design trade-off slightly away from optimal observe() performance in order to improve scrape performance, at the cost of an increase in memory requirements (though I think those memory requirements are probably the same as the current ones while a scrape is happening).
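To make the trade-off described above concrete, here is a minimal sketch of the general idea. It is not the code from the fork and not how prom-client is implemented: `SketchHistogram` and all of its members are hypothetical names, and label values are assumed not to need escaping. The output line prefixes are built once when a series is created, and the bucket counts are kept cumulative inside observe(), so the scrape only concatenates strings and numbers.

```javascript
// Illustrative sketch only: SketchHistogram is a hypothetical class, not
// prom-client's Histogram. Label values are assumed to need no escaping.
class SketchHistogram {
  constructor(name, upperBounds) {
    this.name = name;
    this.upperBounds = [...upperBounds].sort((a, b) => a - b);
    this.series = new Map(); // label key -> per-series state
  }

  // Runs once per label combination, when the time series is first seen.
  _createSeries(key) {
    const withLe = le => `{${key}${key ? ',' : ''}le="${le}"} `;
    const plain = key ? `{${key}} ` : ' ';
    return {
      bucketPrefixes: this.upperBounds.map(
        le => `${this.name}_bucket${withLe(le)}`
      ),
      infPrefix: `${this.name}_bucket${withLe('+Inf')}`,
      sumPrefix: `${this.name}_sum${plain}`,
      countPrefix: `${this.name}_count${plain}`,
      cumulative: new Array(this.upperBounds.length).fill(0),
      sum: 0,
      count: 0,
    };
  }

  observe(labels, value) {
    // The key is still built on every call; only the output strings are cached.
    const key = Object.keys(labels)
      .sort()
      .map(k => `${k}="${labels[k]}"`)
      .join(',');
    let s = this.series.get(key);
    if (s === undefined) {
      s = this._createSeries(key);
      this.series.set(key, s);
    }
    s.sum += value;
    s.count += 1;
    // Extra work per observation: bump every bucket whose bound covers the
    // value, so the stored counts are already cumulative at scrape time.
    for (let i = 0; i < this.upperBounds.length; i++) {
      if (value <= this.upperBounds[i]) s.cumulative[i] += 1;
    }
  }

  // The scrape only joins precomputed prefixes with the current values.
  scrape() {
    const lines = [];
    for (const s of this.series.values()) {
      for (let i = 0; i < s.bucketPrefixes.length; i++) {
        lines.push(s.bucketPrefixes[i] + s.cumulative[i]);
      }
      lines.push(s.infPrefix + s.count);
      lines.push(s.sumPrefix + s.sum);
      lines.push(s.countPrefix + s.count);
    }
    return lines.join('\n');
  }
}

const demo = new SketchHistogram('request_seconds', [0.1, 1]);
demo.observe({ method: 'GET' }, 0.0625);
demo.observe({ method: 'GET' }, 0.5);
console.log(demo.scrape());
```

The memory cost is one set of prefix strings per label combination, which is the kind of increase mentioned above.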

Have other people encountered this sort of performance issue before, and are there other solutions? (As mentioned, effectively we can't use something like Cluster.)

I'm happy to provide code snippets (or a fork) as the basis of improvements or for more discussions.

Thanks,

Kevin

SimenB · 2018-09-12T10:06:22Z

If we have performance issues with serialising the metrics data, I think it makes sense to take a look at either our storage format or our algorithm 🙂 A test case would be awesome, so we have a baseline. Maybe look into adding benchmarks as well to the repo?

/cc @siimon @zbjornson

KevinAMurray · 2018-09-12T10:14:40Z

Okay -- let me look at modifying our benchmark program for histograms to provide a baseline (and allow others to check I'm not missing something obvious!)

KevinAMurray · 2018-09-17T09:23:08Z

Apologies for the delay. I think I've got a relatively tidy benchmark program (warning -- my coding style is awful) -- I just want to add a few options to make it easier to run on single test cases (currently iterates over everything).

I'll fork the archive and add a new histogramBenchmark.js into tests (probably tomorrow).

Here is a snippet of the current output (not double checked yet!) showing the times taken to complete the generation of the text format data for return (apologies for length). Hopefully you'll see what I'm talking about with the increase of scrape times with number of labels.

benchmark - Benchmark.js results: Scrape histogram
benchmark - ----------------------------------------
benchmark - histogram_case001_2buckets_1labels_25series x 674 ops/sec ±16.38% (70 runs sampled)
benchmark - histogram_case002_2buckets_1labels_50series x 657 ops/sec ±7.58% (70 runs sampled)
benchmark - histogram_case003_2buckets_1labels_100series x 657 ops/sec ±14.22% (74 runs sampled)
benchmark - histogram_case004_2buckets_1labels_200series x 712 ops/sec ±0.87% (75 runs sampled)
benchmark - histogram_case005_2buckets_1labels_400series x 647 ops/sec ±14.56% (77 runs sampled)
benchmark - histogram_case006_2buckets_2labels_25series x 370 ops/sec ±12.06% (66 runs sampled)
benchmark - histogram_case007_2buckets_2labels_50series x 358 ops/sec ±17.24% (61 runs sampled)
benchmark - histogram_case008_2buckets_2labels_100series x 392 ops/sec ±1.58% (65 runs sampled)
benchmark - histogram_case009_2buckets_2labels_200series x 343 ops/sec ±19.18% (59 runs sampled)
benchmark - histogram_case010_2buckets_2labels_300series x 383 ops/sec ±1.31% (68 runs sampled)
benchmark - histogram_case011_2buckets_2labels_500series x 354 ops/sec ±18.60% (65 runs sampled)
benchmark - histogram_case012_2buckets_3labels_25series x 197 ops/sec ±1.25% (56 runs sampled)
benchmark - histogram_case013_2buckets_3labels_50series x 182 ops/sec ±18.85% (55 runs sampled)
benchmark - histogram_case014_2buckets_3labels_100series x 199 ops/sec ±1.26% (57 runs sampled)
benchmark - histogram_case015_2buckets_3labels_150series x 176 ops/sec ±22.23% (56 runs sampled)
benchmark - histogram_case016_2buckets_3labels_150series x 192 ops/sec ±1.33% (58 runs sampled)
benchmark - histogram_case017_2buckets_3labels_750series x 198 ops/sec ±11.95% (60 runs sampled)
benchmark - histogram_case018_2buckets_4labels_25series x 33.72 ops/sec ±4.52% (26 runs sampled)
benchmark - histogram_case019_2buckets_4labels_50series x 27.29 ops/sec ±40.28% (28 runs sampled)
benchmark - histogram_case020_2buckets_4labels_100series x 33.18 ops/sec ±1.78% (28 runs sampled)
benchmark - histogram_case021_2buckets_4labels_150series x 26.40 ops/sec ±39.19% (26 runs sampled)
benchmark - histogram_case022_2buckets_4labels_150series x 34.11 ops/sec ±3.05% (29 runs sampled)
benchmark - histogram_case023_2buckets_4labels_750series x 25.84 ops/sec ±44.42% (26 runs sampled)
benchmark - histogram_case024_2buckets_4labels_750series x 31.91 ops/sec ±1.94% (25 runs sampled)
benchmark - histogram_case025_2buckets_4labels_3000series x 33.43 ops/sec ±2.41% (28 runs sampled)
benchmark - histogram_case026_2buckets_5labels_25series x 7.10 ops/sec ±2.52% (11 runs sampled)
benchmark - histogram_case027_2buckets_5labels_50series x 6.75 ops/sec ±4.13% (11 runs sampled)
benchmark - histogram_case028_2buckets_5labels_100series x 7.32 ops/sec ±3.00% (12 runs sampled)
benchmark - histogram_case029_2buckets_5labels_150series x 7.56 ops/sec ±3.54% (11 runs sampled)
benchmark - histogram_case030_2buckets_5labels_150series x 7.18 ops/sec ±2.58% (11 runs sampled)
benchmark - histogram_case031_2buckets_5labels_750series x 7.11 ops/sec ±2.79% (11 runs sampled)
benchmark - histogram_case032_2buckets_5labels_750series x 6.69 ops/sec ±2.63% (11 runs sampled)
benchmark - histogram_case033_2buckets_5labels_3000series x 7.12 ops/sec ±2.17% (11 runs sampled)
benchmark - histogram_case034_2buckets_5labels_6000series x 7.42 ops/sec ±5.27% (12 runs sampled)
benchmark - histogram_case035_2buckets_5labels_9000series x 7.30 ops/sec ±1.30% (12 runs sampled)
benchmark - histogram_case036_2buckets_6labels_25series x 0.35 ops/sec ±9.42% (5 runs sampled)
benchmark - histogram_case037_2buckets_6labels_50series x 0.36 ops/sec ±2.71% (5 runs sampled)
benchmark - histogram_case038_2buckets_6labels_100series x 0.37 ops/sec ±4.56% (5 runs sampled)
benchmark - histogram_case039_2buckets_6labels_150series x 0.38 ops/sec ±7.14% (5 runs sampled)
benchmark - histogram_case040_2buckets_6labels_150series x 0.36 ops/sec ±2.70% (5 runs sampled)
benchmark - histogram_case041_2buckets_6labels_750series x 0.36 ops/sec ±2.51% (5 runs sampled)
benchmark - histogram_case042_2buckets_6labels_750series x 0.36 ops/sec ±1.53% (5 runs sampled)
benchmark - histogram_case043_2buckets_6labels_3000series x 0.35 ops/sec ±1.77% (5 runs sampled)
benchmark - histogram_case044_2buckets_6labels_6000series x 0.35 ops/sec ±6.85% (5 runs sampled)
benchmark - histogram_case045_2buckets_6labels_9000series x 0.35 ops/sec ±2.04% (5 runs sampled)
benchmark - histogram_case046_2buckets_6labels_27000series x 0.35 ops/sec ±6.11% (5 runs sampled)
benchmark - histogram_case047_2buckets_6labels_45000series x 0.34 ops/sec ±1.81% (5 runs sampled)
benchmark - histogram_case048_6buckets_1labels_45series x 330 ops/sec ±22.17% (70 runs sampled)
benchmark - histogram_case049_6buckets_1labels_90series x 316 ops/sec ±21.00% (67 runs sampled)
benchmark - histogram_case050_6buckets_1labels_180series x 376 ops/sec ±1.38% (72 runs sampled)
benchmark - histogram_case051_6buckets_1labels_360series x 335 ops/sec ±18.58% (65 runs sampled)
benchmark - histogram_case052_6buckets_1labels_720series x 324 ops/sec ±20.18% (68 runs sampled)
benchmark - histogram_case053_6buckets_2labels_45series x 213 ops/sec ±1.02% (67 runs sampled)
benchmark - histogram_case054_6buckets_2labels_90series x 187 ops/sec ±12.08% (55 runs sampled)
benchmark - histogram_case055_6buckets_2labels_180series x 214 ops/sec ±1.36% (64 runs sampled)
benchmark - histogram_case056_6buckets_2labels_360series x 175 ops/sec ±22.18% (64 runs sampled)
benchmark - histogram_case057_6buckets_2labels_540series x 221 ops/sec ±2.24% (59 runs sampled)
benchmark - histogram_case058_6buckets_2labels_900series x 175 ops/sec ±23.21% (58 runs sampled)
benchmark - histogram_case059_6buckets_3labels_45series x 110 ops/sec ±1.21% (57 runs sampled)
benchmark - histogram_case060_6buckets_3labels_90series x 104 ops/sec ±1.34% (54 runs sampled)
benchmark - histogram_case061_6buckets_3labels_180series x 93.09 ops/sec ±24.50% (52 runs sampled)
benchmark - histogram_case062_6buckets_3labels_270series x 107 ops/sec ±1.67% (54 runs sampled)
benchmark - histogram_case063_6buckets_3labels_270series x 104 ops/sec ±1.36% (50 runs sampled)
benchmark - histogram_case064_6buckets_3labels_1350series x 95.06 ops/sec ±24.72% (53 runs sampled)
benchmark - histogram_case065_6buckets_4labels_45series x 17.41 ops/sec ±1.19% (24 runs sampled)
benchmark - histogram_case066_6buckets_4labels_90series x 17.02 ops/sec ±2.48% (22 runs sampled)
benchmark - histogram_case067_6buckets_4labels_180series x 18.08 ops/sec ±3.36% (21 runs sampled)
benchmark - histogram_case068_6buckets_4labels_270series x 18.17 ops/sec ±1.01% (25 runs sampled)
benchmark - histogram_case069_6buckets_4labels_270series x 18.23 ops/sec ±0.66% (24 runs sampled)
benchmark - histogram_case070_6buckets_4labels_1350series x 18.29 ops/sec ±1.95% (25 runs sampled)
benchmark - histogram_case071_6buckets_4labels_1350series x 18.63 ops/sec ±1.07% (24 runs sampled)
benchmark - histogram_case072_6buckets_4labels_5400series x 14.01 ops/sec ±54.01% (24 runs sampled)
benchmark - histogram_case073_6buckets_5labels_45series x 3.68 ops/sec ±5.19% (9 runs sampled)
benchmark - histogram_case074_6buckets_5labels_90series x 3.65 ops/sec ±2.73% (9 runs sampled)
benchmark - histogram_case075_6buckets_5labels_180series x 3.03 ops/sec ±36.96% (9 runs sampled)
benchmark - histogram_case076_6buckets_5labels_270series x 3.67 ops/sec ±4.86% (9 runs sampled)
benchmark - histogram_case077_6buckets_5labels_270series x 3.79 ops/sec ±2.35% (9 runs sampled)
benchmark - histogram_case078_6buckets_5labels_1350series x 3.76 ops/sec ±1.29% (10 runs sampled)
benchmark - histogram_case079_6buckets_5labels_1350series x 3.69 ops/sec ±3.71% (10 runs sampled)
benchmark - histogram_case080_6buckets_5labels_5400series x 3.75 ops/sec ±1.28% (10 runs sampled)
benchmark - histogram_case081_6buckets_5labels_10800series x 3.87 ops/sec ±3.02% (10 runs sampled)
benchmark - histogram_case082_6buckets_5labels_16200series x 3.00 ops/sec ±36.15% (9 runs sampled)
benchmark - histogram_case083_6buckets_6labels_45series x 0.07 ops/sec ±4.45% (5 runs sampled)
benchmark - histogram_case084_6buckets_6labels_90series x 0.07 ops/sec ±4.27% (5 runs sampled)
benchmark - histogram_case085_6buckets_6labels_180series x 0.07 ops/sec ±2.65% (5 runs sampled)
benchmark - histogram_case086_6buckets_6labels_270series x 0.07 ops/sec ±2.05% (5 runs sampled)
benchmark - histogram_case087_6buckets_6labels_270series x 0.07 ops/sec ±1.90% (5 runs sampled)
benchmark - histogram_case088_6buckets_6labels_1350series x 0.07 ops/sec ±1.68% (5 runs sampled)
benchmark - histogram_case089_6buckets_6labels_1350series x 0.07 ops/sec ±1.66% (5 runs sampled)
benchmark - histogram_case090_6buckets_6labels_5400series x 0.07 ops/sec ±2.15% (5 runs sampled)
benchmark - histogram_case091_6buckets_6labels_10800series x 0.07 ops/sec ±1.01% (5 runs sampled)
benchmark - histogram_case092_6buckets_6labels_16200series x 0.07 ops/sec ±3.04% (5 runs sampled)
benchmark - histogram_case093_6buckets_6labels_48600series x 0.07 ops/sec ±2.29% (5 runs sampled)
benchmark - histogram_case094_6buckets_6labels_81000series x 0.07 ops/sec ±4.48% (5 runs sampled)
benchmark - histogram_case095_11buckets_1labels_70series x 217 ops/sec ±1.16% (70 runs sampled)
benchmark - histogram_case096_11buckets_1labels_140series x 188 ops/sec ±24.99% (62 runs sampled)
benchmark - histogram_case097_11buckets_1labels_280series x 208 ops/sec ±1.15% (67 runs sampled)
benchmark - histogram_case098_11buckets_1labels_560series x 169 ops/sec ±42.44% (74 runs sampled)
benchmark - histogram_case099_11buckets_1labels_1120series x 184 ops/sec ±26.46% (61 runs sampled)
benchmark - histogram_case100_11buckets_2labels_70series x 124 ops/sec ±0.93% (63 runs sampled)
benchmark - histogram_case101_11buckets_2labels_140series x 105 ops/sec ±29.33% (56 runs sampled)
benchmark - histogram_case102_11buckets_2labels_280series x 125 ops/sec ±0.88% (63 runs sampled)
benchmark - histogram_case103_11buckets_2labels_560series x 103 ops/sec ±28.15% (55 runs sampled)
benchmark - histogram_case104_11buckets_2labels_840series x 123 ops/sec ±2.29% (58 runs sampled)
benchmark - histogram_case105_11buckets_2labels_1400series x 120 ops/sec ±1.07% (61 runs sampled)
benchmark - histogram_case106_11buckets_3labels_70series x 65.31 ops/sec ±1.59% (45 runs sampled)
benchmark - histogram_case107_11buckets_3labels_140series x 63.58 ops/sec ±0.99% (49 runs sampled)
benchmark - histogram_case108_11buckets_3labels_280series x 55.24 ops/sec ±32.39% (46 runs sampled)
benchmark - histogram_case109_11buckets_3labels_420series x 63.44 ops/sec ±1.25% (49 runs sampled)
benchmark - histogram_case110_11buckets_3labels_420series x 56.16 ops/sec ±27.07% (50 runs sampled)
benchmark - histogram_case111_11buckets_3labels_2100series x 61.00 ops/sec ±5.98% (44 runs sampled)
benchmark - histogram_case112_11buckets_4labels_70series x 10.49 ops/sec ±1.66% (20 runs sampled)
benchmark - histogram_case113_11buckets_4labels_140series x 11.11 ops/sec ±1.68% (20 runs sampled)
benchmark - histogram_case114_11buckets_4labels_280series x 7.96 ops/sec ±51.69% (18 runs sampled)
benchmark - histogram_case115_11buckets_4labels_420series x 11.15 ops/sec ±2.74% (20 runs sampled)
benchmark - histogram_case116_11buckets_4labels_420series x 10.79 ops/sec ±5.40% (18 runs sampled)
benchmark - histogram_case117_11buckets_4labels_2100series x 8.49 ops/sec ±51.96% (19 runs sampled)
benchmark - histogram_case118_11buckets_4labels_2100series x 11.33 ops/sec ±3.09% (21 runs sampled)
benchmark - histogram_case119_11buckets_4labels_8400series x 10.87 ops/sec ±4.33% (20 runs sampled)
benchmark - histogram_case120_11buckets_5labels_70series x 2.23 ops/sec ±2.04% (8 runs sampled)
benchmark - histogram_case121_11buckets_5labels_140series x 2.26 ops/sec ±4.54% (8 runs sampled)
benchmark - histogram_case122_11buckets_5labels_280series x 2.37 ops/sec ±4.74% (8 runs sampled)
benchmark - histogram_case123_11buckets_5labels_420series x 2.27 ops/sec ±2.22% (8 runs sampled)
benchmark - histogram_case124_11buckets_5labels_420series x 2.22 ops/sec ±2.04% (8 runs sampled)
benchmark - histogram_case125_11buckets_5labels_2100series x 2.21 ops/sec ±2.95% (8 runs sampled)
benchmark - histogram_case126_11buckets_5labels_2100series x 2.27 ops/sec ±2.15% (8 runs sampled)
benchmark - histogram_case127_11buckets_5labels_8400series x 1.72 ops/sec ±39.26% (8 runs sampled)
benchmark - histogram_case128_11buckets_5labels_16800series x 2.23 ops/sec ±2.53% (8 runs sampled)
benchmark - histogram_case129_11buckets_5labels_25200series x 1.92 ops/sec ±35.62% (8 runs sampled)
benchmark - histogram_case130_11buckets_6labels_70series x 0.04 ops/sec ±1.08% (5 runs sampled)
benchmark - histogram_case131_11buckets_6labels_140series x 0.04 ops/sec ±1.01% (5 runs sampled)
benchmark - histogram_case132_11buckets_6labels_280series x 0.04 ops/sec ±3.68% (5 runs sampled)
benchmark - histogram_case133_11buckets_6labels_420series x 0.04 ops/sec ±1.49% (5 runs sampled)
benchmark - histogram_case134_11buckets_6labels_420series x 0.04 ops/sec ±1.98% (5 runs sampled)
benchmark - histogram_case135_11buckets_6labels_2100series x 0.04 ops/sec ±0.59% (5 runs sampled)
benchmark - histogram_case136_11buckets_6labels_2100series x 0.04 ops/sec ±1.84% (5 runs sampled)
benchmark - histogram_case137_11buckets_6labels_8400series x 0.04 ops/sec ±1.83% (5 runs sampled)
benchmark - histogram_case138_11buckets_6labels_16800series x 0.04 ops/sec ±2.87% (5 runs sampled)
benchmark - histogram_case139_11buckets_6labels_25200series x 0.04 ops/sec ±2.85% (5 runs sampled)
benchmark - histogram_case140_11buckets_6labels_75600series x 0.04 ops/sec ±0.39% (5 runs sampled)
benchmark - histogram_case141_11buckets_6labels_126000series x 0.04 ops/sec ±3.18% (5 runs sampled)

KevinAMurray · 2018-09-17T16:00:21Z

Hmm. I think there is a bug in the multipliers above so the numbers will be wrong....

SimenB · 2018-09-19T19:41:56Z

Please give 11.1.2 a try 🙂 We can reopen if it's still an issue

KevinAMurray · 2018-09-20T13:05:21Z

Thanks. I'm seeing roughly a doubling in performance. Very nice! The performance is also much more linear, which is great.

The benchmark test program is sitting in my fork in the benchmark directory. Shall I PR it into the main branch?

Running the benchmark as

node histogram_benchmark.js -b 1,2 label_1=80 label_2=1 label_3=1 label_4=1 label_5=1

I'm still seeing a penalty for adding labels. I'm using extra labels with a single value to show the impact (though that would normally be handled via Prometheus SD labelling). I'd expect some small performance loss from adding an extra label, but I'd hoped for less.

Specifically:
histogram_case001_2buckets_1labels_400series x 808 ops/sec ±1.40% (80 runs sampled)
histogram_case002_2buckets_2labels_400series x 555 ops/sec ±2.59% (74 runs sampled)
histogram_case003_2buckets_3labels_400series x 405 ops/sec ±3.74% (64 runs sampled)
histogram_case004_2buckets_4labels_400series x 333 ops/sec ±3.67% (67 runs sampled)
histogram_case005_2buckets_5labels_400series x 296 ops/sec ±2.22% (65 runs sampled)
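
For readers wondering why the series count stays at 400 as labels are added: the case names are consistent with each label combination contributing one series per configured bucket plus three more (the +Inf bucket, _sum and _count). That is an inference from the numbers in this thread rather than something taken from the benchmark source, and it assumes `-b 1,2` defines two bucket upper bounds. `seriesCount` below is a hypothetical helper written only to show the arithmetic.

```javascript
// Hypothetical helper: reproduces the series counts seen in the case names,
// assuming each label combination yields (buckets + 3) exposed series.
function seriesCount(bucketCount, labelValueCounts) {
  // Number of distinct label combinations (cartesian product of value counts).
  const combinations = labelValueCounts.reduce((acc, n) => acc * n, 1);
  // Configured buckets, plus the +Inf bucket, _sum and _count.
  const seriesPerCombination = bucketCount + 3;
  return combinations * seriesPerCombination;
}

// label_1=80 alone, then with four extra single-valued labels.
console.log(seriesCount(2, [80])); // 400
console.log(seriesCount(2, [80, 1, 1, 1, 1])); // 400
```

Because single-valued labels leave the number of combinations unchanged, the drop in ops/sec across these cases reflects the per-label cost rather than a larger number of series.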

Previously (I think 11.1.1), snipped from a different log (hence the slight differences):
histogram_case001_2buckets_1labels_400series x 439 ops/sec ±3.76% (61 runs sampled)
histogram_case003_2buckets_2labels_400series x 323 ops/sec ±5.75% (58 runs sampled)
histogram_case005_2buckets_3labels_400series x 93.73 ops/sec ±21.01% (27 runs sampled)
histogram_case007_2buckets_4labels_400series x 140 ops/sec ±18.33% (37 runs sampled)
histogram_case009_2buckets_5labels_400series x 138 ops/sec ±12.34% (41 runs sampled)
(I'd ignore case005 -- probably a background issue on the test machine.)

I'll try and run some more exhaustive tests on our detailed use cases tonight.

SimenB · 2018-09-20T13:14:50Z

Also see #220

I'd love to add a benchmark to this repo, feel free to PR it!

KevinAMurray · 2018-10-01T11:46:21Z

@SimenB, @siimon: I've been tinkering with performance using the benchmark tools in 11.1.3 (great stuff -- many thanks for them!).

I'm pre-computing (most of) the Prometheus label string in histogram, and I'm seeing about a 33% performance gain from this, at the expense of a 10% performance drop in getMetricsAsJSON and, obviously, an increase in memory to hold the precomputed labels.
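
As a rough illustration of what pre-computing the label string can look like (a sketch under stated assumptions, not the code in the branch): the escaped `name="value"` text is built once per label combination and cached, so later scrapes reuse it. The escaping rules are those of the Prometheus text exposition format (backslash, double quote and line feed). `LabelStringCache`, `cheapKey` and `formatLabels` are hypothetical names, and `cheapKey` is only a stand-in for whatever hash the library uses internally.

```javascript
// Escape a label value for the Prometheus text exposition format:
// backslash, double quote and line feed must be escaped.
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Build the full label string; this is the relatively costly step.
function formatLabels(labels) {
  return Object.keys(labels)
    .sort()
    .map(name => `${name}="${escapeLabelValue(labels[name])}"`)
    .join(',');
}

// Hypothetical stand-in for an internal hash: cheap to build, unescaped.
function cheapKey(labels) {
  return Object.keys(labels)
    .sort()
    .map(name => `${name}:${labels[name]}`)
    .join(',');
}

// Hypothetical cache: trades memory (one string per label combination)
// for not having to re-escape and re-join labels on every scrape.
class LabelStringCache {
  constructor() {
    this.byKey = new Map();
    this.misses = 0;
  }

  get(labels) {
    const key = cheapKey(labels);
    let formatted = this.byKey.get(key);
    if (formatted === undefined) {
      formatted = formatLabels(labels);
      this.byKey.set(key, formatted);
      this.misses += 1;
    }
    return formatted;
  }
}

const labelCache = new LabelStringCache();
console.log(labelCache.get({ path: '/a "quoted" path', method: 'GET' }));
console.log(labelCache.get({ method: 'GET', path: '/a "quoted" path' })); // served from the cache
```

A JSON-style export cannot use the precomputed string and still has to carry it in memory, which may be one reason for a slowdown on that path.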

I'd appreciate comments on these trade-offs. Code is currently in https://github.com/KevinAMurray/prom-client/tree/performance-metrics-scrape

(The hash used is very close to the Prometheus label string, but it doesn't do the string escaping. It may be possible to use the hash instead, at a lower memory overhead. I've yet to see what the performance cost of that would be. I hope to create a branch to check that idea out sometime.)

KevinAMurray · 2018-10-02T16:14:50Z

@nowells I've also expanded the benchmarks a bit more now.

nowells · 2018-10-02T16:19:48Z

@KevinAMurray awesome! I can't wait to see how they evolve. Anything I can do to help?

nowells mentioned this issue Sep 19, 2018

Fix histogram scrape performance #219

Merged

SimenB closed this as completed in 45e0c3e Sep 19, 2018

snyk-bot mentioned this issue Oct 26, 2019

[Snyk] Upgrade prom-client from 10.2.3 to 11.5.3 ajesse11x/kubeless#3

Open
