diff options
author | Wladimir J. van der Laan <laanwj@gmail.com> | 2018-07-09 20:33:41 +0200 |
---|---|---|
committer | Wladimir J. van der Laan <laanwj@gmail.com> | 2018-07-09 21:17:18 +0200 |
commit | 3a3eabef40979b5b136b8bd81a65c228c8b8895d (patch) | |
tree | f4afb0a04507acf4b330965b8f65f27b39eea77e /src/core_write.cpp | |
parent | 7e74c54fed364a2974b6033da12de65abc07df93 (diff) | |
parent | 66b2cf1ccfad545a8ec3f2a854e23f647322bf30 (diff) |
Merge #13386: SHA256 implementations based on Intel SHA Extensions
66b2cf1ccfad545a8ec3f2a854e23f647322bf30 Use immintrin.h everywhere for intrinsics (Pieter Wuille)
4c935e2eee456ff66cdfb908b0edffdd1e8a6c04 Add SHA256 implementation using using Intel SHA intrinsics (Pieter Wuille)
268400d3188200c9e3dcd3482c4853354388a721 [Refactor] CPU feature detection logic for SHA256 (Pieter Wuille)
Pull request description:
Based on #13191.
This adds SHA256 implementations that use Intel's SHA Extension instructions (using intrinsics). This needs GCC 4.9 or Clang 3.4.
In addition to #13191, two extra implementations are provided:
* (a) A variable-length SHA256 implementation using SHA extensions.
* (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.
Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:
* Using generic C++ code (pre-#10821): 6.1ms
* Using SSE4 (master, #10821): 4.6ms
* Using 4-way SSE4 specialized for 64-byte inputs (#13191): 2.8ms
* Using 8-way AVX2 specialized for 64-byte inputs (#13191): 2.1ms
* Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms
Benchmarks for 32-byte SHA256 on the same system:
* Using SSE4 (master, #10821): 190ns
* Using SHA-NI (this PR): 53ns
Benchmarks for 1000000-byte SHA256 on the same system:
* Using SSE4 (master, #10821): 2.5ms
* Using SHA-NI (this PR): 0.51ms
Tree-SHA512: 2b319e33b22579f815d91f9daf7994a5e1e799c4f73c13e15070dd54ba71f3f6438ccf77ae9cbd1ce76f972d9cbeb5f0edfea3d86f101bbc1055db70e42743b7
Diffstat (limited to 'src/core_write.cpp')
0 files changed, 0 insertions, 0 deletions