BenchmarkDotNet as a performance tests runner.

Hi!

As promised, a report on adopting `BenchmarkDotNet` as a performance test runner.

**Bad part:** 
- I had to write plenty of code to make it work.
- It's not finished yet: there are some issues preventing us from pushing it into production (listed at the end).

**Good part:** it finally works and covers almost all of our use cases:)
#### Let's start with a short intro describing what perftests are and what they are not
##### First, benchmarks and perftests ARE NOT the same

The difference is like the one between Olympic running shoes and hiking boots.
There are some similar parts, but the use cases are obviously different:)

Performance tests are not meant to find the method that wins in a sandbox. On the contrary, they aim to prove that under real-world conditions the code will not break the limits set in the test.
As with all other tests, perftests will be run on different machines and under different workloads, and they still have to produce repeatable results.

This means you cannot use absolute timings to set the limits for perftests.
There's no sense in comparing `0.1 sec when run on a tablet` with `0.05 sec when run on a dedicated test server` (under the same conditions the latter is 10x slower).

So you have to include a reference (or **baseline**) method in the benchmark and compare all other benchmark methods using an execution time metric that is relative to the baseline.
This approach is known as **competition perf-testing**, and it is used in all the performance tests we do.
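
To make the idea concrete, here is a minimal sketch of a relative-to-the-baseline metric. `RelativeTiming`, `Measure` and `Scaled` are hypothetical names made up for this illustration; they are not BenchmarkDotNet API, and a real runner would take far more care with warm-up and noise.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of BenchmarkDotNet): illustrates the
// "relative to the baseline" metric only.
public static class RelativeTiming
{
    // Total wall-clock time of `iterations` calls, in stopwatch ticks.
    private static long Measure(Action action, int iterations)
    {
        action(); // one warm-up call so JIT compilation is not measured
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            action();
        watch.Stop();
        return Math.Max(1, watch.ElapsedTicks); // guard against division by zero
    }

    // Scaled time: 1.0 means "as fast as the baseline", 2.0 means "twice as slow".
    public static double Scaled(Action baseline, Action candidate, int iterations)
    {
        var baselineTicks = Measure(baseline, iterations);
        var candidateTicks = Measure(candidate, iterations);
        return (double)candidateTicks / baselineTicks;
    }
}
```

Because both methods are measured on the same machine under the same load, the ratio should stay roughly the same on a tablet and on a test server, which is what makes it usable as a limit.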
##### Second, you cannot use averages to compare the results.

Averages produce overly optimistic estimates; percentiles to the rescue. In short, some links:
http://www.goland.org/average_percentile_services/
https://msdn.microsoft.com/en-us/library/bb924370.aspx
http://apmblog.dynatrace.com/2012/11/14/why-averages-suck-and-percentiles-are-great/

Also, I'd recommend the [Average vs. Percentiles](https://books.google.ru/books?id=yWpbBAAAQBAJ&pg=PT27&dq=ben+watson+average+vs+percentiles) section from the awesome "Writing High-Performance .NET Code" book. To be honest, the entire Chapter 1 is worth reading.
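
A tiny illustration of the point, assuming the simple nearest-rank definition of a percentile (other definitions interpolate between samples). `PercentileSketch` is a hypothetical name used only for this example.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only.
public static class PercentileSketch
{
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double Percentile(double[] samples, double p)
    {
        if (samples == null || samples.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samples));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p));

        var sorted = samples.OrderBy(x => x).ToArray();
        var rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[rank - 1];
    }
}
```

For nine runs of 1.0 and one run of 11.0, the average is 2.0, which matches none of the actual runs, while the 50th percentile is 1.0 and the 95th percentile is 11.0, so the slow outlier stays visible.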
##### Third, you have to set BOTH upper and lower limits.

You DO want to detect situations like "Code B unexpectedly runs 1000x faster", believe me.
In 100 cases out of 100, "Code B" turned out to be broken somehow.
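
A minimal sketch of a two-sided check. `LimitVerdict` and `LimitCheck` are hypothetical names for this illustration, not types from our implementation or from BenchmarkDotNet.

```csharp
using System;

// Hypothetical types for illustration; they are not BenchmarkDotNet API.
public enum LimitVerdict { WithinLimits, TooFast, TooSlow }

public static class LimitCheck
{
    // scaled: execution time relative to the baseline (1.0 = same as baseline).
    public static LimitVerdict Check(double scaled, double lowerLimit, double upperLimit)
    {
        if (lowerLimit > upperLimit)
            throw new ArgumentException("The lower limit must not exceed the upper limit.");

        if (scaled < lowerLimit) return LimitVerdict.TooFast; // suspicious: is the code still doing its work?
        if (scaled > upperLimit) return LimitVerdict.TooSlow; // ordinary performance regression
        return LimitVerdict.WithinLimits;
    }
}
```

With limits of 0.8 and 1.2, a scaled time of 0.001 is reported as `TooFast` instead of silently passing.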
##### Fourth, you will have a LOT of the perftests.

Our usual ratio is one perftest per 20 unit tests or so.
That's not a goal, of course, just statistics from our real-world projects.

Let's say you have a few hundred perftests. This means that they should be FAST. The usual time limit is 5-10 sec for large tests and 1-2 sec for smaller ones. No one will wait for an hour:)
##### Fifth, all perftests should be auto-annotated.

Yes, there should be an option (configurable via the app config) to collect the statistics and to update the source with them.
Also, the benchmark should rerun automatically with the new limits and loosen them if they are too tight.
This removes the need for the manual "run, set limits, repeat" loop and boosts productivity by an order of magnitude. As I've said above, there will be a lot of perftests.

And there should be a way to store the annotations either as attributes in the code or as a separate XML file. The latter is mandatory in case the tests are auto-generated (yes, we had these).
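
The "loosen the limits if they are too tight" step could look roughly like this. This is only a sketch: `ScaledLimits`, `LoosenToFit` and the `margin` parameter are assumptions made for the example and do not describe how our implementation stores or adjusts annotations.

```csharp
using System;

// Hypothetical type for illustration; limits are expressed as scaled
// (relative-to-the-baseline) times and are assumed to be positive.
public sealed class ScaledLimits
{
    public ScaledLimits(double min, double max)
    {
        Min = min;
        Max = max;
    }

    public double Min { get; }
    public double Max { get; }

    public bool Contains(double scaled) => scaled >= Min && scaled <= Max;

    // Returns limits that are never tighter than the current ones and that
    // include the measured value with a relative safety margin (0.1 = 10%).
    public ScaledLimits LoosenToFit(double scaled, double margin)
    {
        if (margin < 0 || margin >= 1)
            throw new ArgumentOutOfRangeException(nameof(margin));

        var min = Math.Min(Min, scaled * (1 - margin));
        var max = Math.Max(Max, scaled * (1 + margin));
        return new ScaledLimits(min, max);
    }
}
```

A runner built on this idea would rerun the benchmark, call `LoosenToFit` for every run that falls outside the limits, and write the final `Min` and `Max` back to the attribute or the XML file.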
##### And the last but not least:

You should not monitor execution time only.
Memory allocations and GC count should be checked too, as they influence the performance of the entire app.
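
As a rough sketch of what such a check could collect, using only the standard `System.GC` API (`GC.CollectionCount` and `GC.GetTotalMemory`). `GcDelta` and `GcProbe` are hypothetical names for this example, and the byte delta is only an approximation: it can even be negative if a collection happens during the measured action.

```csharp
using System;

// Hypothetical types for illustration only.
public sealed class GcDelta
{
    public GcDelta(int gen0Collections, long approximateBytes)
    {
        Gen0Collections = gen0Collections;
        ApproximateBytes = approximateBytes;
    }

    public int Gen0Collections { get; }
    public long ApproximateBytes { get; }
}

public static class GcProbe
{
    public static GcDelta Measure(Action action)
    {
        // Force a full collection first so the heap is in a settled state.
        var bytesBefore = GC.GetTotalMemory(true);
        var gen0Before = GC.CollectionCount(0);

        action();

        var gen0After = GC.CollectionCount(0);
        var bytesAfter = GC.GetTotalMemory(false);
        return new GcDelta(gen0After - gen0Before, bytesAfter - bytesBefore);
    }
}
```

Note that these counters are process-wide, so the numbers are only meaningful when they are read inside the process that runs the benchmark.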

OK, looks like that's all:)

**Oops, one more:** the perftests SHOULD be compatible with any unit-testing framework.
Different projects use different testing libraries, and it would be silly to require another one just to run the perftests.
#### And now, the great news:

Our current perftest implementation covers almost all of the above requirements and it's almost stable enough to be merged into BenchmarkDotNet.

If you're interested in it, of course:)
The code is in https://github.com/rsdn/CodeJam/tree/master/Main/tests-performance/BenchmarkDotNet
The example tests are in https://github.com/rsdn/CodeJam/tree/master/Main/tests-performance/CalibrationBenchmarks
aaand it's kinda working :cool:
#### The main show stopper

**We need an ability to use a custom toolchain.**
It looks like it will allow us to enable in-process test running much faster than waiting for #140 to be closed:)

Also, I've delayed the implementation of memory-related limits until we are sure all the other parts are working fine. We definitely need the ability to collect GC statistics directly from the benchmark process.
It'll allow us to use the same System.GC API we're using for monitoring in production.

When all of that is done, I'm going to open a discussion about merging the competition tests infrastructure into BenchmarkDotNet.
#### At the end

A list of things that are not critical but definitely should be included in the BenchmarkDotNet codebase:
1. Percentile and scaled percentile columns.
2. An API to group the benchmarks of a `Summary` by the same conditions (same job and same parameters).
   Use case: we have a benchmark with different `[Params()]`, and there's no sense in comparing
   results from `Count = 100` with results from `Count = 1000`.
   You already have a similar check in the `BaselineDiffColumn`:
   
   ```
   var baselineBenchmark = summary.Benchmarks
       .Where(b => b.Job.GetFullInfo() == benchmark.Job.GetFullInfo())
       .Where(b => b.Parameters.FullInfo == benchmark.Parameters.FullInfo)
       .FirstOrDefault(b => b.Target.Baseline);
   ```
   
   I propose extracting it into a public API, something like
   
   ```
   /// <summary>
   /// Groups benchmarks being run under same conditions (job+parameters)
   /// </summary>
   public static ILookup<KeyValuePair<IJob, ParameterInstances>, Benchmark> SameConditionBenchmarks(this Summary summary)
       => summary.Benchmarks.ToLookup(b => new KeyValuePair<IJob, ParameterInstances>(b.Job, b.Parameters));
   ```
3. An API to get a `BenchmarkReport` from a `Summary` and a `Benchmark`. There was `Summary.Reports` in 0.9.3, but in 0.9.5 its type was changed from `Dictionary<>` to an array.
4. The ability to report benchmark errors from the analysers. Use case: a unit test analyser should report an error if the perf test does not fit into its timing limits.
   Currently I just throw an exception, but that does not fit well into the design of BenchmarkDotNet.
#### Whoa! That's all for now

Any questions / suggestions are welcome:)

