diff options
author | Wladimir J. van der Laan <laanwj@gmail.com> | 2017-06-24 11:27:59 +0200 |
---|---|---|
committer | Wladimir J. van der Laan <laanwj@gmail.com> | 2017-06-24 11:28:04 +0200 |
commit | 232508fe0fcb9ab33739a18cde8b0d5bfe4b4676 (patch) | |
tree | 255224922c6d6e1255436543c8fdf75bb3568451 /src | |
parent | e0a7801223fd573863939e76cb633f1dcc2d22c4 (diff) | |
parent | dd869c60ca069fa3eea3dd1aab977b8a10e05f2f (diff) |
Merge #10577: Add an explanation of quickly hashing onto a non-power of two range.
dd869c6 Add an explanation of quickly hashing onto a non-power of two range. (Gregory Maxwell)
Tree-SHA512: 8b362e396206a4ee2e825908dcff6fe4525c12b9c85a6e6ed809d75f03d42edcfba5e460a002e5d17cc70c103792f84d99693563b638057e4e97946dd1d800b2
Diffstat (limited to 'src')
-rw-r--r-- | src/cuckoocache.h | 31 |
1 files changed, 31 insertions, 0 deletions
diff --git a/src/cuckoocache.h b/src/cuckoocache.h index 2e66901b35..fd24d05ee7 100644 --- a/src/cuckoocache.h +++ b/src/cuckoocache.h @@ -206,6 +206,37 @@ private: /** compute_hashes is convenience for not having to write out this * expression everywhere we use the hash values of an Element. * + * We need to map the 32-bit input hash onto a hash bucket in a range [0, size) in a + * manner which preserves as much of the hash's uniformity as possible. Ideally + * this would be done by bitmasking but the size is usually not a power of two. + * + * The naive approach would be to use a mod -- which isn't perfectly uniform but so + * long as the hash is much larger than size it is not that bad. Unfortunately, + * mod/division is fairly slow on ordinary microprocessors (e.g. 90-ish cycles on + * haswell, ARM doesn't even have an instruction for it.); when the divisor is a + * constant the compiler will do clever tricks to turn it into a multiply+add+shift, + * but size is a run-time value so the compiler can't do that here. + * + * One option would be to implement the same trick the compiler uses and compute the + * constants for exact division based on the size, as described in "{N}-bit Unsigned + * Division via {N}-bit Multiply-Add" by Arch D. Robison in 2005. But that code is + * somewhat complicated and the result is still slower than other options: + * + * Instead we treat the 32-bit random number as a Q32 fixed-point number in the range + * [0,1) and simply multiply it by the size. Then we just shift the result down by + * 32-bits to get our bucket number. The results has non-uniformity the same as a + * mod, but it is much faster to compute. More about this technique can be found at + * http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/ + * + * The resulting non-uniformity is also more equally distributed which would be + * advantageous for something like linear probing, though it shouldn't matter + * one way or the other for a cuckoo table. + * + * The primary disadvantage of this approach is increased intermediate precision is + * required but for a 32-bit random number we only need the high 32 bits of a + * 32*32->64 multiply, which means the operation is reasonably fast even on a + * typical 32-bit processor. + * * @param e the element whose hashes will be returned * @returns std::array<uint32_t, 8> of deterministic hashes derived from e */ |