Histogram scrape performance with multiple labels/label values #216

Closed
KevinAMurray opened this issue Sep 12, 2018 · 10 comments · May be fixed by ajesse11x/kubeless#3

Comments

@KevinAMurray

Hi,

We have a use case where we are using histograms together with between 2 and 6 labels (depending on the metric), and where those labels have between 3 and 40 values. What we are seeing is that the time to perform the scrape increases dramatically when we have more labels and more label values, to the point where the scrape operation could take a couple of seconds (for a worst case situation, which I would expect to happen after the service has been running for some time).

Whilst this clearly isn't a problem for Prometheus, in our particular case the Node application is single threaded (e.g. we cannot use Cluster or similar). This means that a Prometheus scrape will block that particular instance for a second or two whilst the scrape is happening.

I've explored the prom-client code, and I have made some improvements by effectively pre-computing information when the histogram's time series is created, and by performing some calculations during observe(). I.e. I've slightly shifted the design trade-off: observe() does a little more work so that the scrape does less, at the cost of an increase in memory requirements (though I think those memory requirements are probably the same whilst a scrape is happening).
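
To illustrate the kind of trade-off described above, here is a minimal sketch. It is hypothetical: it is neither prom-client's code nor the code in the fork, and the class name `CumulativeHistogramSeries` and its methods are invented for this example. It assumes one object per label combination and keeps cumulative bucket counts up to date inside observe(), so that a scrape only has to read them.

```javascript
// Hypothetical sketch, not prom-client's implementation.
// One instance represents the histogram data for a single label combination.
class CumulativeHistogramSeries {
  constructor(upperBounds) {
    // Finite bucket bounds, sorted ascending. The +Inf bucket equals `count`.
    this.upperBounds = [...upperBounds].sort((a, b) => a - b);
    // Pre-allocated when the series is created (the extra memory cost).
    this.cumulative = new Array(this.upperBounds.length).fill(0);
    this.sum = 0;
    this.count = 0;
  }

  observe(value) {
    this.sum += value;
    this.count += 1;
    // Extra work per observation: increment every bucket whose upper
    // bound is >= value, walking down from the largest bound.
    for (let i = this.upperBounds.length - 1; i >= 0; i--) {
      if (value > this.upperBounds[i]) break;
      this.cumulative[i] += 1;
    }
  }

  // Scrape-time work is a plain read; no accumulation pass is needed.
  snapshot() {
    return {
      buckets: this.upperBounds.map((le, i) => ({ le, value: this.cumulative[i] })),
      inf: this.count,
      sum: this.sum,
      count: this.count,
    };
  }
}

const series = new CumulativeHistogramSeries([1, 2]);
[0.5, 1.5, 3].forEach((v) => series.observe(v));
console.log(series.snapshot());
```

With this layout observe() costs up to one increment per bucket, which is the price paid for the cheaper read at scrape time.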

Have other people encountered this sort of performance issue before, and are there other solutions? (As mentioned, effectively we can't use something like Cluster.)

I'm happy to provide code snippets (or a fork) as the basis of improvements or for more discussions.

Thanks,

Kevin

@SimenB
Collaborator

SimenB commented Sep 12, 2018

If we have performance issues with serialising the metrics data, I think it makes sense to take a look at either our storage format or our algorithm 🙂 A test case would be awesome, so we have a baseline. Maybe look into adding benchmarks as well to the repo?

/cc @siimon @zbjornson

@KevinAMurray
Author

Okay -- let me look at modifying our benchmark program for histograms to provide a baseline (and allow others to check I'm not missing something obvious!)

@KevinAMurray
Author

Apologies for the delay. I think I've got a relatively tidy benchmark program (warning -- my coding style is awful) -- I just want to add a few options to make it easier to run on single test cases (currently iterates over everything).

I'll fork the archive and add a new histogramBenchmark.js into tests (probably tomorrow).

Here is a snippet of the current output (not double checked yet!) showing the times taken to complete the generation of the text format data for return (apologies for length). Hopefully you'll see what I'm talking about with the increase of scrape times with number of labels.

benchmark - Benchmark.js results: Scrape histogram
benchmark - ----------------------------------------
benchmark - histogram_case001_2buckets_1labels_25series x 674 ops/sec ±16.38% (70 runs sampled)
benchmark - histogram_case002_2buckets_1labels_50series x 657 ops/sec ±7.58% (70 runs sampled)
benchmark - histogram_case003_2buckets_1labels_100series x 657 ops/sec ±14.22% (74 runs sampled)
benchmark - histogram_case004_2buckets_1labels_200series x 712 ops/sec ±0.87% (75 runs sampled)
benchmark - histogram_case005_2buckets_1labels_400series x 647 ops/sec ±14.56% (77 runs sampled)
benchmark - histogram_case006_2buckets_2labels_25series x 370 ops/sec ±12.06% (66 runs sampled)
benchmark - histogram_case007_2buckets_2labels_50series x 358 ops/sec ±17.24% (61 runs sampled)
benchmark - histogram_case008_2buckets_2labels_100series x 392 ops/sec ±1.58% (65 runs sampled)
benchmark - histogram_case009_2buckets_2labels_200series x 343 ops/sec ±19.18% (59 runs sampled)
benchmark - histogram_case010_2buckets_2labels_300series x 383 ops/sec ±1.31% (68 runs sampled)
benchmark - histogram_case011_2buckets_2labels_500series x 354 ops/sec ±18.60% (65 runs sampled)
benchmark - histogram_case012_2buckets_3labels_25series x 197 ops/sec ±1.25% (56 runs sampled)
benchmark - histogram_case013_2buckets_3labels_50series x 182 ops/sec ±18.85% (55 runs sampled)
benchmark - histogram_case014_2buckets_3labels_100series x 199 ops/sec ±1.26% (57 runs sampled)
benchmark - histogram_case015_2buckets_3labels_150series x 176 ops/sec ±22.23% (56 runs sampled)
benchmark - histogram_case016_2buckets_3labels_150series x 192 ops/sec ±1.33% (58 runs sampled)
benchmark - histogram_case017_2buckets_3labels_750series x 198 ops/sec ±11.95% (60 runs sampled)
benchmark - histogram_case018_2buckets_4labels_25series x 33.72 ops/sec ±4.52% (26 runs sampled)
benchmark - histogram_case019_2buckets_4labels_50series x 27.29 ops/sec ±40.28% (28 runs sampled)
benchmark - histogram_case020_2buckets_4labels_100series x 33.18 ops/sec ±1.78% (28 runs sampled)
benchmark - histogram_case021_2buckets_4labels_150series x 26.40 ops/sec ±39.19% (26 runs sampled)
benchmark - histogram_case022_2buckets_4labels_150series x 34.11 ops/sec ±3.05% (29 runs sampled)
benchmark - histogram_case023_2buckets_4labels_750series x 25.84 ops/sec ±44.42% (26 runs sampled)
benchmark - histogram_case024_2buckets_4labels_750series x 31.91 ops/sec ±1.94% (25 runs sampled)
benchmark - histogram_case025_2buckets_4labels_3000series x 33.43 ops/sec ±2.41% (28 runs sampled)
benchmark - histogram_case026_2buckets_5labels_25series x 7.10 ops/sec ±2.52% (11 runs sampled)
benchmark - histogram_case027_2buckets_5labels_50series x 6.75 ops/sec ±4.13% (11 runs sampled)
benchmark - histogram_case028_2buckets_5labels_100series x 7.32 ops/sec ±3.00% (12 runs sampled)
benchmark - histogram_case029_2buckets_5labels_150series x 7.56 ops/sec ±3.54% (11 runs sampled)
benchmark - histogram_case030_2buckets_5labels_150series x 7.18 ops/sec ±2.58% (11 runs sampled)
benchmark - histogram_case031_2buckets_5labels_750series x 7.11 ops/sec ±2.79% (11 runs sampled)
benchmark - histogram_case032_2buckets_5labels_750series x 6.69 ops/sec ±2.63% (11 runs sampled)
benchmark - histogram_case033_2buckets_5labels_3000series x 7.12 ops/sec ±2.17% (11 runs sampled)
benchmark - histogram_case034_2buckets_5labels_6000series x 7.42 ops/sec ±5.27% (12 runs sampled)
benchmark - histogram_case035_2buckets_5labels_9000series x 7.30 ops/sec ±1.30% (12 runs sampled)
benchmark - histogram_case036_2buckets_6labels_25series x 0.35 ops/sec ±9.42% (5 runs sampled)
benchmark - histogram_case037_2buckets_6labels_50series x 0.36 ops/sec ±2.71% (5 runs sampled)
benchmark - histogram_case038_2buckets_6labels_100series x 0.37 ops/sec ±4.56% (5 runs sampled)
benchmark - histogram_case039_2buckets_6labels_150series x 0.38 ops/sec ±7.14% (5 runs sampled)
benchmark - histogram_case040_2buckets_6labels_150series x 0.36 ops/sec ±2.70% (5 runs sampled)
benchmark - histogram_case041_2buckets_6labels_750series x 0.36 ops/sec ±2.51% (5 runs sampled)
benchmark - histogram_case042_2buckets_6labels_750series x 0.36 ops/sec ±1.53% (5 runs sampled)
benchmark - histogram_case043_2buckets_6labels_3000series x 0.35 ops/sec ±1.77% (5 runs sampled)
benchmark - histogram_case044_2buckets_6labels_6000series x 0.35 ops/sec ±6.85% (5 runs sampled)
benchmark - histogram_case045_2buckets_6labels_9000series x 0.35 ops/sec ±2.04% (5 runs sampled)
benchmark - histogram_case046_2buckets_6labels_27000series x 0.35 ops/sec ±6.11% (5 runs sampled)
benchmark - histogram_case047_2buckets_6labels_45000series x 0.34 ops/sec ±1.81% (5 runs sampled)
benchmark - histogram_case048_6buckets_1labels_45series x 330 ops/sec ±22.17% (70 runs sampled)
benchmark - histogram_case049_6buckets_1labels_90series x 316 ops/sec ±21.00% (67 runs sampled)
benchmark - histogram_case050_6buckets_1labels_180series x 376 ops/sec ±1.38% (72 runs sampled)
benchmark - histogram_case051_6buckets_1labels_360series x 335 ops/sec ±18.58% (65 runs sampled)
benchmark - histogram_case052_6buckets_1labels_720series x 324 ops/sec ±20.18% (68 runs sampled)
benchmark - histogram_case053_6buckets_2labels_45series x 213 ops/sec ±1.02% (67 runs sampled)
benchmark - histogram_case054_6buckets_2labels_90series x 187 ops/sec ±12.08% (55 runs sampled)
benchmark - histogram_case055_6buckets_2labels_180series x 214 ops/sec ±1.36% (64 runs sampled)
benchmark - histogram_case056_6buckets_2labels_360series x 175 ops/sec ±22.18% (64 runs sampled)
benchmark - histogram_case057_6buckets_2labels_540series x 221 ops/sec ±2.24% (59 runs sampled)
benchmark - histogram_case058_6buckets_2labels_900series x 175 ops/sec ±23.21% (58 runs sampled)
benchmark - histogram_case059_6buckets_3labels_45series x 110 ops/sec ±1.21% (57 runs sampled)
benchmark - histogram_case060_6buckets_3labels_90series x 104 ops/sec ±1.34% (54 runs sampled)
benchmark - histogram_case061_6buckets_3labels_180series x 93.09 ops/sec ±24.50% (52 runs sampled)
benchmark - histogram_case062_6buckets_3labels_270series x 107 ops/sec ±1.67% (54 runs sampled)
benchmark - histogram_case063_6buckets_3labels_270series x 104 ops/sec ±1.36% (50 runs sampled)
benchmark - histogram_case064_6buckets_3labels_1350series x 95.06 ops/sec ±24.72% (53 runs sampled)
benchmark - histogram_case065_6buckets_4labels_45series x 17.41 ops/sec ±1.19% (24 runs sampled)
benchmark - histogram_case066_6buckets_4labels_90series x 17.02 ops/sec ±2.48% (22 runs sampled)
benchmark - histogram_case067_6buckets_4labels_180series x 18.08 ops/sec ±3.36% (21 runs sampled)
benchmark - histogram_case068_6buckets_4labels_270series x 18.17 ops/sec ±1.01% (25 runs sampled)
benchmark - histogram_case069_6buckets_4labels_270series x 18.23 ops/sec ±0.66% (24 runs sampled)
benchmark - histogram_case070_6buckets_4labels_1350series x 18.29 ops/sec ±1.95% (25 runs sampled)
benchmark - histogram_case071_6buckets_4labels_1350series x 18.63 ops/sec ±1.07% (24 runs sampled)
benchmark - histogram_case072_6buckets_4labels_5400series x 14.01 ops/sec ±54.01% (24 runs sampled)
benchmark - histogram_case073_6buckets_5labels_45series x 3.68 ops/sec ±5.19% (9 runs sampled)
benchmark - histogram_case074_6buckets_5labels_90series x 3.65 ops/sec ±2.73% (9 runs sampled)
benchmark - histogram_case075_6buckets_5labels_180series x 3.03 ops/sec ±36.96% (9 runs sampled)
benchmark - histogram_case076_6buckets_5labels_270series x 3.67 ops/sec ±4.86% (9 runs sampled)
benchmark - histogram_case077_6buckets_5labels_270series x 3.79 ops/sec ±2.35% (9 runs sampled)
benchmark - histogram_case078_6buckets_5labels_1350series x 3.76 ops/sec ±1.29% (10 runs sampled)
benchmark - histogram_case079_6buckets_5labels_1350series x 3.69 ops/sec ±3.71% (10 runs sampled)
benchmark - histogram_case080_6buckets_5labels_5400series x 3.75 ops/sec ±1.28% (10 runs sampled)
benchmark - histogram_case081_6buckets_5labels_10800series x 3.87 ops/sec ±3.02% (10 runs sampled)
benchmark - histogram_case082_6buckets_5labels_16200series x 3.00 ops/sec ±36.15% (9 runs sampled)
benchmark - histogram_case083_6buckets_6labels_45series x 0.07 ops/sec ±4.45% (5 runs sampled)
benchmark - histogram_case084_6buckets_6labels_90series x 0.07 ops/sec ±4.27% (5 runs sampled)
benchmark - histogram_case085_6buckets_6labels_180series x 0.07 ops/sec ±2.65% (5 runs sampled)
benchmark - histogram_case086_6buckets_6labels_270series x 0.07 ops/sec ±2.05% (5 runs sampled)
benchmark - histogram_case087_6buckets_6labels_270series x 0.07 ops/sec ±1.90% (5 runs sampled)
benchmark - histogram_case088_6buckets_6labels_1350series x 0.07 ops/sec ±1.68% (5 runs sampled)
benchmark - histogram_case089_6buckets_6labels_1350series x 0.07 ops/sec ±1.66% (5 runs sampled)
benchmark - histogram_case090_6buckets_6labels_5400series x 0.07 ops/sec ±2.15% (5 runs sampled)
benchmark - histogram_case091_6buckets_6labels_10800series x 0.07 ops/sec ±1.01% (5 runs sampled)
benchmark - histogram_case092_6buckets_6labels_16200series x 0.07 ops/sec ±3.04% (5 runs sampled)
benchmark - histogram_case093_6buckets_6labels_48600series x 0.07 ops/sec ±2.29% (5 runs sampled)
benchmark - histogram_case094_6buckets_6labels_81000series x 0.07 ops/sec ±4.48% (5 runs sampled)
benchmark - histogram_case095_11buckets_1labels_70series x 217 ops/sec ±1.16% (70 runs sampled)
benchmark - histogram_case096_11buckets_1labels_140series x 188 ops/sec ±24.99% (62 runs sampled)
benchmark - histogram_case097_11buckets_1labels_280series x 208 ops/sec ±1.15% (67 runs sampled)
benchmark - histogram_case098_11buckets_1labels_560series x 169 ops/sec ±42.44% (74 runs sampled)
benchmark - histogram_case099_11buckets_1labels_1120series x 184 ops/sec ±26.46% (61 runs sampled)
benchmark - histogram_case100_11buckets_2labels_70series x 124 ops/sec ±0.93% (63 runs sampled)
benchmark - histogram_case101_11buckets_2labels_140series x 105 ops/sec ±29.33% (56 runs sampled)
benchmark - histogram_case102_11buckets_2labels_280series x 125 ops/sec ±0.88% (63 runs sampled)
benchmark - histogram_case103_11buckets_2labels_560series x 103 ops/sec ±28.15% (55 runs sampled)
benchmark - histogram_case104_11buckets_2labels_840series x 123 ops/sec ±2.29% (58 runs sampled)
benchmark - histogram_case105_11buckets_2labels_1400series x 120 ops/sec ±1.07% (61 runs sampled)
benchmark - histogram_case106_11buckets_3labels_70series x 65.31 ops/sec ±1.59% (45 runs sampled)
benchmark - histogram_case107_11buckets_3labels_140series x 63.58 ops/sec ±0.99% (49 runs sampled)
benchmark - histogram_case108_11buckets_3labels_280series x 55.24 ops/sec ±32.39% (46 runs sampled)
benchmark - histogram_case109_11buckets_3labels_420series x 63.44 ops/sec ±1.25% (49 runs sampled)
benchmark - histogram_case110_11buckets_3labels_420series x 56.16 ops/sec ±27.07% (50 runs sampled)
benchmark - histogram_case111_11buckets_3labels_2100series x 61.00 ops/sec ±5.98% (44 runs sampled)
benchmark - histogram_case112_11buckets_4labels_70series x 10.49 ops/sec ±1.66% (20 runs sampled)
benchmark - histogram_case113_11buckets_4labels_140series x 11.11 ops/sec ±1.68% (20 runs sampled)
benchmark - histogram_case114_11buckets_4labels_280series x 7.96 ops/sec ±51.69% (18 runs sampled)
benchmark - histogram_case115_11buckets_4labels_420series x 11.15 ops/sec ±2.74% (20 runs sampled)
benchmark - histogram_case116_11buckets_4labels_420series x 10.79 ops/sec ±5.40% (18 runs sampled)
benchmark - histogram_case117_11buckets_4labels_2100series x 8.49 ops/sec ±51.96% (19 runs sampled)
benchmark - histogram_case118_11buckets_4labels_2100series x 11.33 ops/sec ±3.09% (21 runs sampled)
benchmark - histogram_case119_11buckets_4labels_8400series x 10.87 ops/sec ±4.33% (20 runs sampled)
benchmark - histogram_case120_11buckets_5labels_70series x 2.23 ops/sec ±2.04% (8 runs sampled)
benchmark - histogram_case121_11buckets_5labels_140series x 2.26 ops/sec ±4.54% (8 runs sampled)
benchmark - histogram_case122_11buckets_5labels_280series x 2.37 ops/sec ±4.74% (8 runs sampled)
benchmark - histogram_case123_11buckets_5labels_420series x 2.27 ops/sec ±2.22% (8 runs sampled)
benchmark - histogram_case124_11buckets_5labels_420series x 2.22 ops/sec ±2.04% (8 runs sampled)
benchmark - histogram_case125_11buckets_5labels_2100series x 2.21 ops/sec ±2.95% (8 runs sampled)
benchmark - histogram_case126_11buckets_5labels_2100series x 2.27 ops/sec ±2.15% (8 runs sampled)
benchmark - histogram_case127_11buckets_5labels_8400series x 1.72 ops/sec ±39.26% (8 runs sampled)
benchmark - histogram_case128_11buckets_5labels_16800series x 2.23 ops/sec ±2.53% (8 runs sampled)
benchmark - histogram_case129_11buckets_5labels_25200series x 1.92 ops/sec ±35.62% (8 runs sampled)
benchmark - histogram_case130_11buckets_6labels_70series x 0.04 ops/sec ±1.08% (5 runs sampled)
benchmark - histogram_case131_11buckets_6labels_140series x 0.04 ops/sec ±1.01% (5 runs sampled)
benchmark - histogram_case132_11buckets_6labels_280series x 0.04 ops/sec ±3.68% (5 runs sampled)
benchmark - histogram_case133_11buckets_6labels_420series x 0.04 ops/sec ±1.49% (5 runs sampled)
benchmark - histogram_case134_11buckets_6labels_420series x 0.04 ops/sec ±1.98% (5 runs sampled)
benchmark - histogram_case135_11buckets_6labels_2100series x 0.04 ops/sec ±0.59% (5 runs sampled)
benchmark - histogram_case136_11buckets_6labels_2100series x 0.04 ops/sec ±1.84% (5 runs sampled)
benchmark - histogram_case137_11buckets_6labels_8400series x 0.04 ops/sec ±1.83% (5 runs sampled)
benchmark - histogram_case138_11buckets_6labels_16800series x 0.04 ops/sec ±2.87% (5 runs sampled)
benchmark - histogram_case139_11buckets_6labels_25200series x 0.04 ops/sec ±2.85% (5 runs sampled)
benchmark - histogram_case140_11buckets_6labels_75600series x 0.04 ops/sec ±0.39% (5 runs sampled)
benchmark - histogram_case141_11buckets_6labels_126000series x 0.04 ops/sec ±3.18% (5 runs sampled)

@KevinAMurray
Author

Hmm. I think there is a bug in the multipliers above so the numbers will be wrong....

@SimenB
Collaborator

SimenB commented Sep 19, 2018

Please give 11.1.2 a try 🙂 We can reopen if it's still an issue

@KevinAMurray
Author

Thanks. I'm seeing roughly double the performance. Very nice! The performance is also much more linear, which is great.

The benchmark test program is sitting in my fork in the benchmark directory. Shall I PR it into the main branch?

Running as node histogram_benchmark.js -b 1,2 label_1=80 label_2=1 label_3=1 label_4=1 label_5=1 I'm still seeing a penalty for adding labels. I'm using extra labels with a single value to show the impact (though that would normally be handled via Prometheus SD labelling). I'd expect some small performance loss from adding an extra label, but I'd hope for less.

Specifically:
histogram_case001_2buckets_1labels_400series x 808 ops/sec ±1.40% (80 runs sampled)
histogram_case002_2buckets_2labels_400series x 555 ops/sec ±2.59% (74 runs sampled)
histogram_case003_2buckets_3labels_400series x 405 ops/sec ±3.74% (64 runs sampled)
histogram_case004_2buckets_4labels_400series x 333 ops/sec ±3.67% (67 runs sampled)
histogram_case005_2buckets_5labels_400series x 296 ops/sec ±2.22% (65 runs sampled)

Previously (I think 11.1.1), and snipped from a different log (hence the slight differences):
histogram_case001_2buckets_1labels_400series x 439 ops/sec ±3.76% (61 runs sampled)
histogram_case003_2buckets_2labels_400series x 323 ops/sec ±5.75% (58 runs sampled)
histogram_case005_2buckets_3labels_400series x 93.73 ops/sec ±21.01% (27 runs sampled)
histogram_case007_2buckets_4labels_400series x 140 ops/sec ±18.33% (37 runs sampled)
histogram_case009_2buckets_5labels_400series x 138 ops/sec ±12.34% (41 runs sampled)
(I'd ignore case005 -- probably a background issue on the test machine).
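
The "400series" in these case names is consistent with a simple count, sketched below under an assumption: each label combination exposes one series per finite bucket plus the +Inf bucket, _sum and _count. The function name `expectedSeriesCount` is made up for this example and is not part of the benchmark program.

```javascript
// Hypothetical helper: estimates how many time series one histogram exposes.
// Assumption: per label combination there is one series for each finite
// bucket, plus three more (+Inf bucket, _sum, _count).
function expectedSeriesCount(bucketCount, labelValueCounts) {
  const combinations = labelValueCounts.reduce((acc, n) => acc * n, 1);
  return combinations * (bucketCount + 3);
}

// -b 1,2 gives 2 finite buckets; label_1=80 and the other labels have 1 value.
console.log(expectedSeriesCount(2, [80]));             // 400
console.log(expectedSeriesCount(2, [80, 1, 1, 1, 1])); // 400
```

Since single-valued labels leave the series count unchanged, the drop in ops/sec across these cases reflects the cost of handling the extra labels themselves rather than extra series.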

I'll try and run some more exhaustive tests on our detailed use cases tonight.

@SimenB
Collaborator

SimenB commented Sep 20, 2018

Also see #220

I'd love to add a benchmark to this repo, feel free to PR it!

@KevinAMurray
Author

@SimenB, @siimon : I've been tinkering on performance using the benchmark tools in 11.1.3 (great stuff -- many thanks for them!).

I'm pre-computing the Prometheus label string (mostly) in histogram, and I'm seeing about a 33% performance gain from this at the expense of a 10% performance drop against getMetricsAsJSON and obviously an increase in memory to hold the precomputed labels.

I'd appreciate comments on these trade-offs. Code is currently in https://github.com/KevinAMurray/prom-client/tree/performance-metrics-scrape

(The hash used is very close to the Prometheus labels string, but it doesn't escape the label values. It may be possible to use the hash instead at a lesser memory overhead. I've yet to see what the performance costs of that would be. Hope to create a branch to check that idea out sometime.)
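
For readers unfamiliar with the idea, pre-computing the label string could look roughly like the sketch below. It is an illustration only, not the code in the branch above: `escapeLabelValue`, `formatLabels` and `LabelStringCache` are hypothetical names, and the cache key is just a stand-in for whatever hash identifies a label combination. The escaping follows the Prometheus text exposition format, where backslash, double quote and line feed in label values are escaped.

```javascript
// Hypothetical sketch: build the Prometheus label string once per label
// combination and reuse it on every scrape.

// Escape a label value for the text exposition format
// (backslash, double quote and line feed).
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Render {a: 'x', b: 'y'} as a="x",b="y".
function formatLabels(labels) {
  return Object.keys(labels)
    .map((name) => `${name}="${escapeLabelValue(labels[name])}"`)
    .join(',');
}

class LabelStringCache {
  constructor() {
    this.byKey = new Map(); // this map is the extra memory being traded
  }

  // `key` is assumed to uniquely identify the label combination.
  get(key, labels) {
    let rendered = this.byKey.get(key);
    if (rendered === undefined) {
      rendered = formatLabels(labels);
      this.byKey.set(key, rendered);
    }
    return rendered;
  }
}

const cache = new LabelStringCache();
const labels = { method: 'GET', path: '/a "b"' };
const first = cache.get('method:GET,path:/a "b"', labels);
const second = cache.get('method:GET,path:/a "b"', labels); // served from the cache
console.log(first); // method="GET",path="/a \"b\""
```

The escaping cost is paid once per label combination instead of once per scrape, and the cached strings are where the additional memory goes.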

@KevinAMurray
Author

@nowells I've also expanded the benchmarks a bit more now.

@nowells
Contributor

nowells commented Oct 2, 2018

@KevinAMurray awesome! I can't wait to see how they evolve. Anything I can do to help?
