Why does using a power of 2 as the hash size make a hash table considerably worse than using a prime?
I am implementing a hash table that is supposed to store pairs of 32-bit values; its size is fixed in advance. Keeping the elements in mind, I am using a very simple hashing function:

    hash(a, b) = asUint64(a) + (asUint64(b) << 32)
With it, I get the index of an element in the hash table (that is, its associated bucket):

    index(a, b) = hash(a, b) % hash_size

where hash_size is the number of entries/buckets in my table. I have realized, however, that I can speed this implementation up a little by fixing hash_size to be a power of 2, which turns the modulo into a bitwise AND. Except, when I do this, most of my pairs end up in the first few buckets! Why is this happening?
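A minimal C++ sketch of this setup (the names hash_pair, index_mod, and index_pow2 are my own illustration, not from the original post):

    #include <cstddef>
    #include <cstdint>

    // hash(a, b) = asUint64(a) + (asUint64(b) << 32):
    // a occupies the low 32 bits of the hash, b the high 32 bits.
    std::uint64_t hash_pair(std::uint32_t a, std::uint32_t b) {
        return static_cast<std::uint64_t>(a)
             + (static_cast<std::uint64_t>(b) << 32);
    }

    // index(a, b) = hash(a, b) % hash_size, for an arbitrary table size.
    std::size_t index_mod(std::uint32_t a, std::uint32_t b,
                          std::size_t hash_size) {
        return hash_pair(a, b) % hash_size;
    }

    // The speed-up in question: when hash_size is a power of 2,
    // the modulo can be replaced by a bitwise AND with hash_size - 1.
    std::size_t index_pow2(std::uint32_t a, std::uint32_t b,
                           std::size_t pow2_size) {
        return hash_pair(a, b) & (pow2_size - 1);
    }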
My guess is that your data is not evenly distributed. Consider the concatenation of a and b as your hash code:

    b31 b30 ... b1 b0 a31 a30 ... a1 a0

where the ai and bi are the bits of a and b respectively.
Suppose you have a table with one million entries (hash_size = 2^20); then your hash index is just

    a19 a18 ... a1 a0 (as an integer)

i.e. the low 20 bits of a; every bit of b is discarded.
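To make that concrete, here is a hypothetical check (reusing the hash_pair sketch above): no matter what value b takes, the masked index never changes.

    #include <cassert>

    // hash_pair() as defined in the sketch above. With hash_size = 1 << 20,
    // the mask keeps only bits 0..19 of the hash, all of which come from a.
    const std::size_t hash_size = 1u << 20;
    const std::uint32_t a = 42;
    for (std::uint32_t b : {0u, 1u, 123456u, 0xFFFFFFFFu}) {
        assert((hash_pair(a, b) & (hash_size - 1)) == 42);  // b never matters
    }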
Worse, it looks like a only ranges from 1 to 100, so you also lose any dependency on a's higher-order bits: every pair lands in one of the first 100 buckets.
As you can see, unless your hash table has at least 4 billion (2^32) entries, your hash index does not depend on b at all, and the values hash(x, a) will all collide for any fixed x, no matter what a is.
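To see the effect numerically, here is a small self-contained demo (my own construction, not from the original answer): it draws random pairs with a confined to 1..100, as in the example above, and counts how many distinct buckets are hit with a power-of-2 table size versus a nearby prime.

    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <unordered_set>

    // Same hash as in the question: a in the low 32 bits, b in the high 32.
    std::uint64_t hash_pair(std::uint32_t a, std::uint32_t b) {
        return static_cast<std::uint64_t>(a)
             + (static_cast<std::uint64_t>(b) << 32);
    }

    // Count how many distinct buckets 100000 random pairs occupy,
    // with a confined to 1..100 and b drawn from the full 32-bit range.
    std::size_t distinct_buckets(std::size_t hash_size) {
        std::mt19937 rng(12345);
        std::uniform_int_distribution<std::uint32_t> small_a(1, 100);
        std::uniform_int_distribution<std::uint32_t> any_b;  // full range
        std::unordered_set<std::size_t> used;
        for (int i = 0; i < 100000; ++i)
            used.insert(hash_pair(small_a(rng), any_b(rng)) % hash_size);
        return used.size();
    }

    int main() {
        // Power of 2: the modulo keeps only a's low bits, so at most
        // 100 buckets are ever used.
        std::printf("2^20 size: %zu distinct buckets\n",
                    distinct_buckets(1u << 20));
        // Prime: 2^32 % p != 0, so b's bits feed back into the index.
        std::printf("prime (1048573) size: %zu distinct buckets\n",
                    distinct_buckets(1048573));
    }

The first line is guaranteed to report at most 100 distinct buckets, while the prime size should land near 100000 (less a few birthday collisions), since the modulo by a prime mixes b's high bits back into the index.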