path: root/src/bench/bench.cpp
Age | Commit message | Author
2020-08-14bench: Allow skip benchmarkHennadii Stepanov
Co-authored-by: Martin Ankerl <Martin.Ankerl@gmail.com>
2020-06-13Replace current benchmarking framework with nanobenchMartin Ankerl
This replaces the current benchmarking framework with nanobench [1], an MIT licensed single-header benchmarking library, of which I am the author. This has in my opinion several advantages, especially on Linux:

* fast: Running all benchmarks takes ~6 seconds instead of 4m13s on an Intel i7-8700 CPU @ 3.20GHz.
* accurate: I ran e.g. the benchmark for SipHash_32b 10 times and calculated standard deviation / mean = coefficient of variation:
  * 0.57% CV for old benchmarking framework
  * 0.20% CV for nanobench

  So the benchmark results with nanobench seem to vary less than with the old framework.
* It automatically determines runtime based on clock precision, no need to specify number of evaluations.
* measure instructions, cycles, branches, instructions per cycle, branch misses (only Linux, when performance counters are available)
* output in markdown table format.
* Warn about unstable environment (frequency scaling, turbo, ...)
* For better profiling, it is possible to set the environment variable NANOBENCH_ENDLESS to force endless running of a particular benchmark without the need to recompile. This makes it possible to e.g. run "perf top" and look at hotspots.

Here is an example copy & pasted from the terminal output:

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 2.52 | 396,529,415.94 | 0.6% | 25.42 | 8.02 | 3.169 | 0.06 | 0.0% | 0.03 | `bench/crypto_hash.cpp RIPEMD160`
| 1.87 | 535,161,444.83 | 0.3% | 21.36 | 5.95 | 3.589 | 0.06 | 0.0% | 0.02 | `bench/crypto_hash.cpp SHA1`
| 3.22 | 310,344,174.79 | 1.1% | 36.80 | 10.22 | 3.601 | 0.09 | 0.0% | 0.04 | `bench/crypto_hash.cpp SHA256`
| 2.01 | 496,375,796.23 | 0.0% | 18.72 | 6.43 | 2.911 | 0.01 | 1.0% | 0.00 | `bench/crypto_hash.cpp SHA256D64_1024`
| 7.23 | 138,263,519.35 | 0.1% | 82.66 | 23.11 | 3.577 | 1.63 | 0.1% | 0.00 | `bench/crypto_hash.cpp SHA256_32b`
| 3.04 | 328,780,166.40 | 0.3% | 35.82 | 9.69 | 3.696 | 0.03 | 0.0% | 0.03 | `bench/crypto_hash.cpp SHA512`

[1] https://github.com/martinus/nanobench

* Adds support for asymptotes

This adds support to calculate asymptotic complexity of a benchmark. This is similar to #17375, but currently only one asymptote is supported, and I have added support in the benchmark `ComplexMemPool` as an example.

Usage is e.g. like this:

```
./bench_bitcoin -filter=ComplexMemPool -asymptote=25,50,100,200,400,600,800
```

This runs the benchmark `ComplexMemPool` several times but with different complexityN settings. The benchmark can extract that number and use it accordingly. Here, it's used for `childTxs`.

The output is this:

| complexityN | ns/op | op/s | err% | ins/op | cyc/op | IPC | total | benchmark
|------------:|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|----------:|:----------
| 25 | 1,064,241.00 | 939.64 | 1.4% | 3,960,279.00 | 2,829,708.00 | 1.400 | 0.01 | `ComplexMemPool`
| 50 | 1,579,530.00 | 633.10 | 1.0% | 6,231,810.00 | 4,412,674.00 | 1.412 | 0.02 | `ComplexMemPool`
| 100 | 4,022,774.00 | 248.58 | 0.6% | 16,544,406.00 | 11,889,535.00 | 1.392 | 0.04 | `ComplexMemPool`
| 200 | 15,390,986.00 | 64.97 | 0.2% | 63,904,254.00 | 47,731,705.00 | 1.339 | 0.17 | `ComplexMemPool`
| 400 | 69,394,711.00 | 14.41 | 0.1% | 272,602,461.00 | 219,014,691.00 | 1.245 | 0.76 | `ComplexMemPool`
| 600 | 168,977,165.00 | 5.92 | 0.1% | 639,108,082.00 | 535,316,887.00 | 1.194 | 1.86 | `ComplexMemPool`
| 800 | 310,109,077.00 | 3.22 | 0.1% | 1,149,134,246.00 | 984,620,812.00 | 1.167 | 3.41 | `ComplexMemPool`

| coefficient | err% | complexity
|--------------:|-------:|------------
| 4.78486e-07 | 4.5% | O(n^2)
| 6.38557e-10 | 21.7% | O(n^3)
| 3.42338e-05 | 38.0% | O(n log n)
| 0.000313914 | 46.9% | O(n)
| 0.0129823 | 114.4% | O(log n)
| 0.0815055 | 133.8% | O(1)

The best fitting curve is O(n^2), so the algorithm seems to scale quadratically with `childTxs` in the range 25 to 800.
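One way to read the coefficient table is as a least-squares fit of the measured time against each candidate curve. The sketch below is an illustration under stated assumptions, not nanobench's actual code: it assumes a fit through the origin (t ≈ c · f(n)), a base-2 logarithm, and an error defined as the root-mean-square residual divided by the mean measurement. The names `Candidate`, `Fit`, `FitCandidate`, `BestFit` and `ComplexMemPoolSamples` are hypothetical; the sample values are the ns/op column of the `ComplexMemPool` table converted to seconds.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper types, not part of nanobench or bench_bitcoin.
struct Candidate {
    std::string name;
    std::function<double(double)> f; // the complexity curve, e.g. f(n) = n^2
};

struct Fit {
    std::string name;
    double coefficient; // c in t ~= c * f(n)
    double err;         // RMS residual divided by the mean measurement
};

using Samples = std::vector<std::pair<double, double>>; // (n, seconds per op)

// Least-squares fit through the origin: c = sum(f*t) / sum(f*f).
Fit FitCandidate(const Candidate& c, const Samples& samples)
{
    assert(!samples.empty());
    double sum_ft = 0.0, sum_ff = 0.0, sum_t = 0.0;
    for (const auto& s : samples) {
        const double fn = c.f(s.first);
        sum_ft += fn * s.second;
        sum_ff += fn * fn;
        sum_t += s.second;
    }
    const double coefficient = sum_ft / sum_ff;
    double sq_err = 0.0;
    for (const auto& s : samples) {
        const double diff = coefficient * c.f(s.first) - s.second;
        sq_err += diff * diff;
    }
    const double count = static_cast<double>(samples.size());
    return {c.name, coefficient, std::sqrt(sq_err / count) / (sum_t / count)};
}

// Fit every candidate curve and return the one with the smallest error.
Fit BestFit(const Samples& samples)
{
    const std::vector<Candidate> candidates = {
        {"O(1)", [](double) { return 1.0; }},
        {"O(log n)", [](double n) { return std::log2(n); }},
        {"O(n)", [](double n) { return n; }},
        {"O(n log n)", [](double n) { return n * std::log2(n); }},
        {"O(n^2)", [](double n) { return n * n; }},
        {"O(n^3)", [](double n) { return n * n * n; }},
    };
    Fit best = FitCandidate(candidates.front(), samples);
    for (const Candidate& c : candidates) {
        const Fit fit = FitCandidate(c, samples);
        if (fit.err < best.err) best = fit;
    }
    return best;
}

// ns/op values from the ComplexMemPool table above, converted to seconds.
Samples ComplexMemPoolSamples()
{
    return {{25, 0.001064241}, {50, 0.001579530}, {100, 0.004022774},
            {200, 0.015390986}, {400, 0.069394711}, {600, 0.168977165},
            {800, 0.310109077}};
}
```

Under these assumptions, calling `BestFit(ComplexMemPoolSamples())` selects the O(n^2) curve, which agrees with the table's conclusion.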
2020-04-17bench: Remove requirement that all benches use RegTestingSetupMarcoFalke
2020-04-16scripted-diff: Bump copyright headersMarcoFalke
-BEGIN VERIFY SCRIPT-
./contrib/devtools/copyright_header.py update ./
-END VERIFY SCRIPT-
2020-01-28Fix benchmarks filtersElichai Turkel
2020-01-02test: Show debug log on unit test failureMarcoFalke
2019-12-23rpc: Remove mempool global from minerMarcoFalke
2019-11-06scripted-diff: test: Move setup_common to test libraryMarcoFalke
-BEGIN VERIFY SCRIPT-
# Move files
for f in $(git ls-files src/test/lib/); do git mv $f src/test/util/; done
git mv src/test/setup_common.cpp src/test/util/
git mv src/test/setup_common.h src/test/util/
# Replace Windows paths
sed -i -e 's|\\setup_common|\\util\\setup_common|g' $(git grep -l '\\setup_common')
sed -i -e 's|src\\test\\lib\\|src\\test\\util\\|g' build_msvc/test_bitcoin/test_bitcoin.vcxproj
# Everything else
sed -i -e 's|/setup_common|/util/setup_common|g' $(git grep -l 'setup_common')
sed -i -e 's|test/lib/|test/util/|g' $(git grep -l 'test/lib/')
# Fix include guard
sed -i -e 's|BITCOIN_TEST_SETUP_COMMON_H|BITCOIN_TEST_UTIL_SETUP_COMMON_H|g' ./src/test/util/setup_common.h
sed -i -e 's|BITCOIN_TEST_LIB_|BITCOIN_TEST_UTIL_|g' $(git grep -l 'BITCOIN_TEST_LIB_')
-END VERIFY SCRIPT-
2019-10-30test: Add RegTestingSetup to setup_commonMarcoFalke
2019-05-13[refactor] interfaces: Add missing LockAnnotation for cs_mainMarcoFalke
2019-05-03scripted-diff: replace chainActive -> ::ChainActive()James O'Beirne
Though at the moment ChainActive() simply references `g_chainstate.m_chain`, doing this change now clears the way for multiple chainstate usage and allows us to script the diff.

-BEGIN VERIFY SCRIPT-
git grep -l "chainActive" | grep -E '(h|cpp)$' | xargs sed -i '/chainActive =/b; /extern CChain& chainActive/b; s/\(::\)\{0,1\}chainActive/::ChainActive()/g'
-END VERIFY SCRIPT-
2019-04-11scripted-diff: Bump copyright headers in test, benchMarcoFalke
-BEGIN VERIFY SCRIPT-
./contrib/devtools/copyright_header.py update ./src/bench/
./contrib/devtools/copyright_header.py update ./src/test/
-END VERIFY SCRIPT-
2019-04-11scripted-diff: Rename test_bitcoin to test/setup_commonMarcoFalke
-BEGIN VERIFY SCRIPT-
sed -i --regexp-extended -e 's/test_bitcoin\.(h|cpp)/setup_common.\1/g' $(git grep -l test_bitcoin)
git mv ./src/test/test_bitcoin.h ./src/test/setup_common.h
git mv ./src/test/test_bitcoin.cpp ./src/test/setup_common.cpp
sed -i -e 's/BITCOIN_TEST_TEST_BITCOIN_H/BITCOIN_TEST_SETUP_COMMON_H/g' ./src/test/setup_common.h
-END VERIFY SCRIPT-
2019-04-10test: Use test_bitcoin setup in bench, Add test utilsMarcoFalke
2018-07-27Update copyright headers to 2018DrahtBot
2018-04-18benchmark: Removed bench/perf.cppThomas Snider
2018-01-16Log debug build status and warn when running benchmarksWladimir J. van der Laan
Log whether the starting instance of bitcoin core is a debug or release build (--enable-debug). Also warn when running the benchmarks with a debug build, to prevent mistakes comparing debug to non-debug results.
2018-01-03Increment MIT Licence copyright header year on files modified in 2017Akira Takizawa
2017-12-23Improved microbenchmarking with multiple features.Martin Ankerl
* inline performance critical code
* Average runtime is specified and used to calculate iterations.
* Console: show median of multiple runs
* plot: show box plot
* filter benchmarks
* specify scaling factor
* ignore src/test and src/bench in command line check script
* number of iterations instead of time
* Replaced runtime in BENCHMARK macro with number of iterations.
* Added -? to bench_bitcoin
* Benchmark plotly.js URL, width, height can be customized
* Fixed incorrect precision warning
2017-11-16scripted-diff: Replace #include "" with #include <> (ryanofsky)MeshCollider
-BEGIN VERIFY SCRIPT-
for f in \
  src/*.cpp \
  src/*.h \
  src/bench/*.cpp \
  src/bench/*.h \
  src/compat/*.cpp \
  src/compat/*.h \
  src/consensus/*.cpp \
  src/consensus/*.h \
  src/crypto/*.cpp \
  src/crypto/*.h \
  src/crypto/ctaes/*.h \
  src/policy/*.cpp \
  src/policy/*.h \
  src/primitives/*.cpp \
  src/primitives/*.h \
  src/qt/*.cpp \
  src/qt/*.h \
  src/qt/test/*.cpp \
  src/qt/test/*.h \
  src/rpc/*.cpp \
  src/rpc/*.h \
  src/script/*.cpp \
  src/script/*.h \
  src/support/*.cpp \
  src/support/*.h \
  src/support/allocators/*.h \
  src/test/*.cpp \
  src/test/*.h \
  src/wallet/*.cpp \
  src/wallet/*.h \
  src/wallet/test/*.cpp \
  src/wallet/test/*.h \
  src/zmq/*.cpp \
  src/zmq/*.h
do
  base=${f%/*}/ relbase=${base#src/} sed -i "s:#include \"\(.*\)\"\(.*\):if test -e \$base'\\1'; then echo \"#include <\"\$relbase\"\\1>\\2\"; else echo \"#include <\\1>\\2\"; fi:e" $f
done
-END VERIFY SCRIPT-
2017-11-09Require a steady clock for bench with at least micro precisionMatt Corallo
2017-11-07bench: switch to std::chrono for time measurementsCory Fields
std::chrono removes portability issues. Rather than storing doubles, store the untouched time_points. Then convert to nanoseconds for display. This allows for maximum precision, while keeping results comparable between differing hardware/operating systems. Also, display full nanosecond counts rather than sub-second floats.
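The approach described here, keeping raw `time_point` values and converting to nanoseconds only for display, can be sketched as follows. This is a minimal illustration, not the code from the commit: the `Recorder` class and its member names are hypothetical, and `std::chrono::steady_clock` is assumed as the clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

using Clock = std::chrono::steady_clock; // assumption: a monotonic clock

// Hypothetical recorder: stores untouched time_points, converts on demand.
class Recorder
{
public:
    // Record the current instant without any conversion.
    void Mark() { m_marks.push_back(Clock::now()); }

    // Convert a clock duration to whole nanoseconds for display.
    static int64_t ToNanos(Clock::duration d)
    {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(d).count();
    }

    // Nanoseconds between the first and the last recorded instant.
    int64_t ElapsedNanos() const
    {
        assert(m_marks.size() >= 2);
        return ToNanos(m_marks.back() - m_marks.front());
    }

private:
    std::vector<Clock::time_point> m_marks;
};
```

Because the conversion happens only at the output boundary, no precision is lost while measuring, and the displayed unit stays the same regardless of the clock's native tick period.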
2017-09-11Remove countMaskInv caching in bench frameworkMatt Corallo
We were saving a div by caching the inverse as a float, but this ended up requiring an int -> float -> int conversion, which takes almost as much time as the difference between float mul and div. There are lots of other more pressing issues with the bench framework which probably require simply removing the adaptive iteration count stuff anyway.
2017-08-14Merge #10483: scripted-diff: Use the C++11 keyword nullptr to denote the ↵Wladimir J. van der Laan
pointer literal instead of the macro NULL

90d4d89 scripted-diff: Use the C++11 keyword nullptr to denote the pointer literal instead of the macro NULL (practicalswift)

Pull request description:

Since C++11 the macro `NULL` may be:
* an integer literal with value zero, or
* a prvalue of type `std::nullptr_t`

By using the C++11 keyword `nullptr` we are guaranteed a prvalue of type `std::nullptr_t`.

For a more thorough discussion, see "A name for the null pointer: nullptr" (Sutter & Stroustrup), http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2431.pdf

With this patch applied there are no `NULL` macro usages left in the repo:

```
$ git grep NULL -- "*.cpp" "*.h" | egrep -v '(/univalue/|/secp256k1/|/leveldb/|_NULL|NULLDUMMY|torcontrol.*NULL|NULL cert)' | wc -l
0
```

The road towards `nullptr` (C++11) is split into two PRs:
* `NULL` → `nullptr` is handled in PR #10483 (scripted, this PR)
* `0` → `nullptr` is handled in PR #10645 (manual)

Tree-SHA512: 3c395d66f2ad724a8e6fed74b93634de8bfc0c0eafac94e64e5194c939499fefd6e68f047de3083ad0b4eff37df9a8a3a76349aa17d55eabbd8e0412f140a297
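The practical difference shows up in overload resolution. The sketch below is an illustration only and is not taken from the pull request; the `Describe` overloads and the wrapper functions are hypothetical. It deliberately avoids calling `Describe(NULL)`, because the result of that call depends on how the implementation defines the macro.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Two hypothetical overloads: one for integers, one for pointers.
std::string Describe(int) { return "int"; }
std::string Describe(const char*) { return "pointer"; }

// nullptr always has type std::nullptr_t, independent of the implementation.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr is a prvalue of type std::nullptr_t");

// std::nullptr_t converts to any pointer type but not to int,
// so only the pointer overload is viable.
std::string DescribeNullptr() { return Describe(nullptr); }

// A literal 0 is an exact match for int, so the integer overload wins.
std::string DescribeZero() { return Describe(0); }
```

`DescribeNullptr()` returns "pointer" and `DescribeZero()` returns "int", which is why replacing an integer-like `NULL` with `nullptr` removes a source of surprises.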
2017-08-07scripted-diff: Use the C++11 keyword nullptr to denote the pointer literal ↵practicalswift
instead of the macro NULL
-BEGIN VERIFY SCRIPT-
sed -i 's/\<NULL\>/nullptr/g' src/*.cpp src/*.h src/*/*.cpp src/*/*.h src/qt/*/*.cpp src/qt/*/*.h src/wallet/*/*.cpp src/wallet/*/*.h src/support/allocators/*.h
sed -i 's/Prefer nullptr, otherwise SAFECOOKIE./Prefer NULL, otherwise SAFECOOKIE./g' src/torcontrol.cpp
sed -i 's/tor: Using nullptr authentication/tor: Using NULL authentication/g' src/torcontrol.cpp
sed -i 's/METHODS=nullptr/METHODS=NULL/g' src/test/torcontrol_tests.cpp src/torcontrol.cpp
sed -i 's/nullptr certificates/NULL certificates/g' src/qt/paymentserver.cpp
sed -i 's/"nullptr"/"NULL"/g' src/torcontrol.cpp src/test/torcontrol_tests.cpp
-END VERIFY SCRIPT-
2017-07-31Restore default format state of cout after printing with std::fixed/setprecisionpracticalswift
2017-05-13Replace boost::function with std::function (C++11)practicalswift
2017-03-06Merge #9547: bench: Assert that division by zero is unreachableWladimir J. van der Laan
db07f91 Assert that what might look like a possible division by zero is actually unreachable (practicalswift) Tree-SHA512: f1652eb37196a5b72f356503a1fbb44fb98aa8a94954ad1765f86d81ebf41a2337d4eb58c4f19937fda3752f5d2d642756e44afdbd438015b87ac20801246bff
2017-02-07bench: Fix initialization order in registrationWladimir J. van der Laan
The initialization order of global data structures in different implementation units is undefined. Making use of this is essentially gambling on what the linker does, the so-called [Static initialization order fiasco](https://isocpp.org/wiki/faq/ctors#static-init-order). In this case it apparently worked on Linux but failed on OpenBSD and FreeBSD. To create it on first use, make the registration structure local to a function. Fixes #8910.
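The "create it on first use" idea can be sketched as below. This is a minimal illustration rather than the code from the commit: `Benchmarks`, `Registrar`, `RunAll`, `g_calls` and `g_example` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

using BenchFunction = std::function<void()>;
using BenchmarkMap = std::map<std::string, BenchFunction>;

// Function-local static: constructed the first time this function is called,
// so it exists before any registration touches it, whatever the link order.
BenchmarkMap& Benchmarks()
{
    static BenchmarkMap benchmarks;
    return benchmarks;
}

// Hypothetical registration helper. A global of this type in any
// translation unit can register itself safely during static initialization.
struct Registrar {
    Registrar(const std::string& name, BenchFunction func)
    {
        Benchmarks().emplace(name, std::move(func));
    }
};

static int g_calls = 0; // counts how often the example benchmark ran
static Registrar g_example("Example", [] { ++g_calls; });

// Run every registered benchmark once and return how many were run.
int RunAll()
{
    int ran = 0;
    for (auto& entry : Benchmarks()) {
        entry.second();
        ++ran;
    }
    return ran;
}
```

Had the map been a namespace-scope global instead, a `Registrar` in another translation unit could run its constructor before the map was constructed, which is the failure mode described above.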
2017-02-02Assert that what might look like a possible division by zero is actually ↵practicalswift
unreachable
2017-01-05Merge #9281: Refactor: Remove using namespace <xxx> from bench/ & test/ sourcesMarcoFalke
73f4119 Refactoring: Removed using namespace <xxx> from bench/ and test/ source files. (Karl-Johan Alm)
2017-01-02Refactoring: Removed using namespace <xxx> from bench/ and test/ source files.Karl-Johan Alm
2016-12-31Increment MIT Licence copyright header year on files modified in 2016isle2983
Edited via: $ contrib/devtools/copyright_header.py update .
2016-11-22bench: Add support for measuring CPU cyclesWladimir J. van der Laan
This adds cycle min/max/avg to the statistics. Supported on x86 and x86_64 (natively through rdtsc), as well as Linux (perf syscall).
2016-11-22bench: Fix subtle counting issue when rescaling iteration countWladimir J. van der Laan
Make sure that the count is zero modulo the new mask before scaling, otherwise the time until the next measurement triggers will be only 1/2 as long as accounted for. This caused the 'min time' to be potentially off by as much as 100%.
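The counting issue can be made concrete with a small sketch. It assumes a measurement is taken whenever `(count & mask) == 0` and that rescaling widens the mask by one bit; the function names are hypothetical and the code is an illustration, not the framework's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Number of iterations from `count` until the next measurement,
// assuming a measurement is taken whenever (count & mask) == 0.
uint64_t IterationsUntilNextCheck(uint64_t count, uint64_t mask)
{
    uint64_t steps = 0;
    do {
        ++count;
        ++steps;
    } while ((count & mask) != 0);
    return steps;
}

// Widen the mask by one bit (doubling the interval), but only when the
// count is already zero modulo the new interval.
uint64_t MaybeWidenMask(uint64_t count, uint64_t mask)
{
    const uint64_t wider = (mask << 1) | 1;
    return (count & wider) == 0 ? wider : mask;
}
```

With a mask of 3 (interval 4), starting at count 2 the next measurement arrives after only 2 iterations, half of what the interval suggests, while starting at count 4 it arrives after the full 4. Deferring the rescale until the count is aligned avoids the short interval.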
2016-05-30Avoid integer division in the benchmark inner-most loop.Gregory Maxwell
Previously the benchmark code used an integer division (%) with a non-constant in the inner-loop. This is quite slow on many processors, especially ones like ARM that lack a hardware divide. Even on fairly recent x86_64 like haswell an integer division can take something like 100 cycles, making it comparable to the runtime of siphash.

This change avoids the division by using bitmasking instead. This was especially easy since the count was only increased by doubling.

This change also restarts the timing when the execution time was very low; this avoids mintimes of zero in cases where one execution ends up below the timer resolution. It also reduces the impact of the overhead on the final result.

The formatting of the prints is changed to not use scientific notation, to make it more machine readable (in particular, gnuplot croaks on the non-fixedpoint, and it doesn't sort correctly).

This also hoists all the floating point divisions out of the semi-hot path because it was easy to do so.

It might be prudent to break out the critical test into a macro just to guarantee that it gets inlined. It might also make sense to just save out the intermediate counts and times and get the floating point completely out of the timing loop (because e.g. on hardware without a fast hardware FPU like some ARM it will still be slow enough to distort the results). I haven't done either of these in this commit.
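The equivalence that makes the bitmask trick work can be sketched as follows. The function names are hypothetical and the code only illustrates the idea: when the interval is a power of two, `count % interval == 0` gives the same answer as `(count & (interval - 1)) == 0`, and doubling the interval corresponds to shifting one more set bit into the mask.

```cpp
#include <cassert>
#include <cstdint>

// True when `count` is a multiple of `interval`, using an integer division.
bool DueByModulo(uint64_t count, uint64_t interval)
{
    return count % interval == 0;
}

// Same test without a division. Valid only when the interval is a power
// of two and mask == interval - 1.
bool DueByMask(uint64_t count, uint64_t mask)
{
    return (count & mask) == 0;
}

// Doubling the interval shifts one more set bit into the mask.
uint64_t DoubleInterval(uint64_t mask)
{
    return (mask << 1) | 1;
}

// Compare both tests for every count up to max_count and for the
// intervals 1, 2, 4, ..., 512.
bool MaskMatchesModulo(uint64_t max_count)
{
    uint64_t mask = 0; // interval 1
    for (int doublings = 0; doublings < 10; ++doublings) {
        for (uint64_t count = 0; count <= max_count; ++count) {
            if (DueByMask(count, mask) != DueByModulo(count, mask + 1)) return false;
        }
        mask = DoubleInterval(mask);
    }
    return true;
}
```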
2015-10-27[Trivial] ensure minimal header conventionsPhilip Kaufmann
- ensure header namespaces and end comments are correct
- add missing header end comments
- ensure minimal formatting (add newlines etc.)
2015-09-30Support very-fast-running benchmarksGavin Andresen
Avoid calling gettimeofday every time through the benchmarking loop, by keeping track of how long each loop takes and doubling the number of iterations done between time checks when they take less than 1/16'th of the total elapsed time.
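A loop of this kind can be sketched as below. This is an illustration under assumptions, not the original implementation: the clock is injected as a callable returning integer nanoseconds so that the example is deterministic, "total elapsed time" is read as the overall time budget, and the mask is only widened when the count is aligned to the new interval. The names `State`, `RunStats` and `SimulateRun` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical benchmark state that reads the clock only every
// (mask + 1) iterations and doubles that interval while a batch is short.
class State
{
public:
    State(int64_t budget_ns, std::function<int64_t()> now)
        : m_budget(budget_ns), m_now(std::move(now)) {}

    bool KeepRunning()
    {
        if (m_count & m_mask) { // between time checks: no clock read
            ++m_count;
            return true;
        }
        const int64_t now = m_now();
        ++m_clock_reads;
        if (m_count == 0) {
            m_begin = now;
        } else {
            const int64_t since_last = now - m_last;
            const uint64_t wider = (m_mask << 1) | 1;
            // Batch took less than 1/16 of the budget: double the interval,
            // but only when the count is aligned to the new interval.
            if (since_last * 16 < m_budget && (m_count & wider) == 0) m_mask = wider;
        }
        m_last = now;
        ++m_count;
        return now - m_begin < m_budget;
    }

    uint64_t ClockReads() const { return m_clock_reads; }

private:
    int64_t m_budget;
    std::function<int64_t()> m_now;
    int64_t m_begin = 0, m_last = 0;
    uint64_t m_count = 0, m_mask = 0, m_clock_reads = 0;
};

struct RunStats {
    uint64_t iterations;
    uint64_t clock_reads;
};

// Drive the loop with a fake clock where every iteration costs cost_ns.
RunStats SimulateRun(int64_t budget_ns, int64_t cost_ns)
{
    int64_t fake_now = 0;
    State state(budget_ns, [&fake_now] { return fake_now; });
    uint64_t iterations = 0;
    while (state.KeepRunning()) {
        fake_now += cost_ns; // stands in for the benchmarked work
        ++iterations;
    }
    return {iterations, state.ClockReads()};
}
```

In a simulated run the number of clock reads stays far below the number of iterations, which is the point of the change: the cost of reading the clock no longer dominates very fast benchmarks.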
2015-09-30Simple benchmarking frameworkGavin Andresen
Benchmarking framework, loosely based on google's micro-benchmarking library (https://github.com/google/benchmark)

Why not use the Google Benchmark framework? Because adding Even More Dependencies isn't worth it. If we get a dozen or three benchmarks and need nanosecond-accurate timings of threaded code then switching to the full-blown Google Benchmark library should be considered.

The benchmark framework is hard-coded to run each benchmark for one wall-clock second, and then spits out .csv-format timing information to stdout. It is left as an exercise for later (or maybe never) to add command-line arguments to specify which benchmark(s) to run, how long to run them for, how to format results, etc etc etc. Again, see the Google Benchmark framework for where that might end up.

See src/bench/MilliSleep.cpp for a sanity-test benchmark that just benchmarks 'sleep 100 milliseconds.'

To compile and run benchmarks:
  cd src; make bench

Sample output:

Benchmark,count,min,max,average
Sleep100ms,10,0.101854,0.105059,0.103881