Description
Hi!
As promised, a report on adopting BenchmarkDotNet
to be used as a performance tests runner.
Bad part:
- I've had to write plenty of code to make it work.
- It's not finished yet: there are some issues preventing us from pushing it into production (listed at the end).
Good part: it finally works and covers almost all of our use cases:)
Let's start with a short intro describing what perftests are and what they are not.
First, benchmarks and perftests ARE NOT the same.
The difference is like the one between Olympic running shoes and hiking boots.
There are some similar parts but the use cases are different, obviously:)
Performance tests are not meant to find the sandbox-winner method. On the contrary, they're meant to prove that in real-world conditions the code will not break the limits set in the test.
As with all other tests, perftests will be run on different machines, under different workloads, and they still have to produce repeatable results.
This means you cannot use absolute timings to set the limits for perftests.
There's no sense in comparing a 0.1 sec run on a tablet with a 0.05 sec run on a dedicated test server (under the same conditions the latter is 10x slower).
So, you have to include a reference (or baseline) method in the benchmark and compare all other benchmark methods against it using a relative-to-the-baseline execution time metric.
This approach is known as competition perf-testing and it is used in all performance tests we do.
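To make the relative metric concrete, here is a minimal sketch. It is not the CodeJam or BenchmarkDotNet implementation; the `RelativeTiming` class and its method names are made up for illustration, and a real runner would do far more careful warm-up and iteration handling.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, for illustration only.
public static class RelativeTiming
{
    // Measures total wall-clock time for `iterations` calls of `action`.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // one warm-up call so JIT compilation is not counted
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Candidate time divided by baseline time; 1.0 means "as fast as the baseline".
    public static double Scale(TimeSpan baseline, TimeSpan candidate)
    {
        if (baseline <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(baseline), "Baseline time must be positive.");
        return (double)candidate.Ticks / baseline.Ticks;
    }
}
```

The limits are then set on the value returned by `Scale`, not on the raw timings, so the same test can pass on a tablet and on a build server.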
Second, you cannot use averages to compare the results.
Averages give overly optimistic estimates; percentiles to the rescue. In short, some links:
http://www.goland.org/average_percentile_services/
https://msdn.microsoft.com/en-us/library/bb924370.aspx
http://apmblog.dynatrace.com/2012/11/14/why-averages-suck-and-percentiles-are-great/
Also, I'd recommend the Average vs. Percentiles section from the awesome "Writing High-Performance .NET Code" book. To be honest, the entire Chapter 1 is worth reading.
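A small sketch of why the two differ, using the nearest-rank definition of a percentile (one of several common definitions; the `PercentileDemo` class is made up for this example and is not the column implementation proposed at the end of this post):

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class PercentileDemo
{
    // Nearest-rank percentile: the smallest sample such that
    // at least p percent of all samples are less than or equal to it.
    public static double Percentile(double[] samples, double p)
    {
        if (samples == null || samples.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samples));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p), "p must be in (0, 100].");

        var sorted = samples.OrderBy(x => x).ToArray();
        var rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }
}
```

With nine runs of 10 ms and one run of 200 ms, the average is 29 ms, while the 50th percentile is 10 ms and the 95th percentile is 200 ms. The average describes none of the actual runs and hides how bad the slow one was.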
Third, you have to set BOTH upper and lower limits.
You DO want to detect situations like "Code B runs 1000x faster unexpectedly", believe me.
100 times out of 100, "Code B" turned out to be broken somehow.
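As a sketch, a two-sided check on the relative-to-baseline ratio could look like the following. The `LimitCheck` class and its message format are assumptions made for this example, not the actual CodeJam code.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class LimitCheck
{
    // Returns null when min <= ratio <= max, otherwise a description of the violation.
    public static string Check(double ratio, double min, double max)
    {
        if (min > max)
            throw new ArgumentException("min must not exceed max.");
        if (ratio < min)
            return $"Suspiciously fast: ratio {ratio} is below the lower limit {min}.";
        if (ratio > max)
            return $"Too slow: ratio {ratio} is above the upper limit {max}.";
        return null;
    }
}
```

The lower bound is what catches the "1000x faster" case: a ratio of 0.001 against limits of 0.5 to 2.0 fails just like a ratio of 5.0 does.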
Fourth, you will have a LOT of perftests.
Our usual ratio is one perftest per 20 unit tests or so.
That's not a goal, of course, just statistics from our real-world projects.
Let's say you have a few hundred perftests. This means that they should be FAST. The usual time limit is 5-10 secs for large tests and 1-2 secs for smaller ones. No one will wait for an hour:)
Fifth, all perftests should be auto-annotated.
Yes, there should be an option (configurable via appconfig) to collect the statistics and to update the source with them.
Also, the benchmark should rerun automatically with the new limits and loosen them if they are too tight.
This spares you the run / set limits / repeat loop and boosts productivity by an order of magnitude. As I've said above, there will be a lot of perftests.
And there should be a way to store the annotations as attributes in the code or as a separate xml file. The latter is mandatory in case the tests are auto-generated (yes, we had these).
And last but not least:
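The "loosen the limits if they are too tight" step can be sketched roughly like this. The `RatioLimits` type, the `WidenToFit` name and the margin parameter are all assumptions made for illustration; the real implementation also has to write the new values back to the attributes or the xml file.

```csharp
using System;

// Hypothetical type, for illustration only.
public sealed class RatioLimits
{
    public double Min { get; }
    public double Max { get; }

    public RatioLimits(double min, double max)
    {
        Min = min;
        Max = max;
    }

    // Returns limits that include the observed ratio, adding a safety margin
    // (0.1 means 10%) on whichever side had to be moved.
    // Limits that already fit are returned unchanged.
    public RatioLimits WidenToFit(double observed, double margin)
    {
        var min = observed < Min ? observed * (1 - margin) : Min;
        var max = observed > Max ? observed * (1 + margin) : Max;
        return new RatioLimits(min, max);
    }
}
```

A runner would repeat run, widen, rerun until a run passes without any further widening, and only then store the limits.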
You should not monitor execution time only.
Memory allocations and GC count should be checked too, as they affect the performance of the entire app.
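For the GC part, a minimal in-process sketch built on the standard `System.GC.CollectionCount` API might look like this. The `GcProbe` class is made up for the example, and it only gives meaningful numbers when it runs inside the same process as the measured code.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class GcProbe
{
    // Returns how many gen0, gen1 and gen2 collections happened while `action` ran.
    public static int[] CountCollections(Action action)
    {
        var before = new[]
        {
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2)
        };

        action();

        return new[]
        {
            GC.CollectionCount(0) - before[0],
            GC.CollectionCount(1) - before[1],
            GC.CollectionCount(2) - before[2]
        };
    }
}
```

Limits on these counts can then be checked the same way as the timing limits, for example "no gen2 collections during the test".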
Ook, looks like that's all:)
Oops, one more: the perftests SHOULD be compatible with any unit-testing framework.
Different projects use different testing libraries and it would be silly to require another one just to run the perftests.
And now, the great news:
Our current perftest implementation covers almost all of the above requirements and it's almost stable enough to be merged into BenchmarkDotNet.
If you're interested in it, of course:)
The code is in https://github.com/rsdn/CodeJam/tree/master/Main/tests-performance/BenchmarkDotNet
The example tests are in https://github.com/rsdn/CodeJam/tree/master/Main/tests-performance/CalibrationBenchmarks
aaand it's kinda working 🆒
The main show stopper
We need an ability to use a custom toolchain.
It looks like it will allow us to enable in-process test running much faster than waiting for #140 to be closed:)
Also, I've delayed the implementation of memory-related limits until we are sure all other parts are working fine. We definitely need an ability to collect the GC statistics directly from the benchmark process.
It'll allow us to use the same System.GC API we're using for monitoring in production.
When all of that is done, I'm going to create a discussion about merging the competition tests infrastructure into BenchmarkDotNet.
At the end
A list of things that are not critical but definitely should be included in the Bench.Net codebase:
- Percentile and scaled percentile columns.
- The API to group Summary's benchmarks by same conditions (same job and same parameters).
Use case: we have a benchmark with different [Params()] values, and there's no sense in comparing results from Count = 100 with results from Count = 1000.
You already have a similar check in the BaselineDiffColumn:

    var baselineBenchmark = summary.Benchmarks
        .Where(b => b.Job.GetFullInfo() == benchmark.Job.GetFullInfo())
        .Where(b => b.Parameters.FullInfo == benchmark.Parameters.FullInfo)
        .FirstOrDefault(b => b.Target.Baseline);

I propose extracting it into a public API, something like

    /// <summary>
    /// Groups benchmarks being run under same conditions (job+parameters)
    /// </summary>
    public static ILookup<KeyValuePair<IJob, ParameterInstances>, Benchmark> SameConditionBenchmarks(
        this Summary summary) =>
        summary.Benchmarks.ToLookup(
            b => new KeyValuePair<IJob, ParameterInstances>(b.Job, b.Parameters));
- API to get BenchmarkReport from Summary and Benchmark. There was Summary.Reports in 0.9.3, but in 0.9.5 its type was changed from Dictionary<> to an array.
- Ability to report benchmark errors from the analysers. Use case: the unit test analyser should report an error if the perf test does not fit into its timing limits.
Currently I just throw an exception, but it does not fit well into the design of BenchmarkDotNet.
Whoa! That's all for now
Any questions / suggestions are welcome:)