math - Why does using powers of 2 as the hash size make a hash table considerably worse than using primes? -


I am implementing a hash table that is supposed to store pairs of 32-bit values. Its size is fixed and was chosen with the number of elements in mind. I am using a very simple hashing function:

  hash(a, b) = asUint64(a) + (asUint64(b) << 32)

With it, I compute the index of an element in the hash table (which is its associated bucket):

  index(a, b) = hash(a, b) % hash_size

where hash_size is the number of entries/buckets in my table. I have realized, however, that I can make this implementation a little faster if I fix hash_size to a power of 2, which lets me replace the "modulus" operator with a bitwise AND (see the sketch below). Except, when I do this, most of my pairs end up in the first few buckets! Why is this happening?
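Simplified, the two indexing variants I am comparing look roughly like this (a C++ sketch; asUint64 is assumed to be just a widening cast, and the function names are illustrative):

  #include <cstdint>

  // Assumed helper: widen a 32-bit value to 64 bits.
  inline uint64_t asUint64(uint32_t x) { return static_cast<uint64_t>(x); }

  // Hash as described above: a in the low 32 bits, b in the high 32 bits.
  inline uint64_t hash(uint32_t a, uint32_t b) {
      return asUint64(a) + (asUint64(b) << 32);
  }

  // General case: works for any hash_size.
  inline uint64_t indexMod(uint64_t h, uint64_t hash_size) {
      return h % hash_size;
  }

  // Faster variant: only valid when hash_size is a power of two,
  // because then h % hash_size == (h & (hash_size - 1)).
  inline uint64_t indexPow2(uint64_t h, uint64_t hash_size) {
      return h & (hash_size - 1);
  }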

My guess is that your data is not evenly distributed in a. Consider a and b concatenated, with b in the high bits and a in the low bits, as your hash code:

  b31 b30 ... b1 b0 a31 a30 ... a1 a0, where ai, bi are the bits of a and b respectively

Suppose you have a table with about one million entries (hash_size = 2^20). Then your hash index is just

  a19 a18 ... a1 a0 (as an integer)
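To check this concretely, here is a small sketch (assuming hash_size = 2^20, i.e. roughly the million-entry table above): since 2^32 is a multiple of 2^20, the b term vanishes under the modulus and only the low 20 bits of a survive.

  #include <cassert>
  #include <cstdint>
  #include <initializer_list>

  int main() {
      const uint64_t hash_size = uint64_t(1) << 20;   // ~one million buckets
      const uint32_t a = 0xDEADBEEF;
      for (uint32_t b : {0u, 1u, 42u, 0xFFFFFFFFu}) {
          uint64_t h = uint64_t(a) + (uint64_t(b) << 32);
          // The bucket is always the low 20 bits of a, no matter what b is.
          assert(h % hash_size == (a & (hash_size - 1)));
      }
      return 0;
  }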

Worse, it looks like a only ranges from 1 to 100, so there is little to no dependency on the higher-order bits of a either.

As you can see, unless your hash table has more than about 4 billion (2^32) buckets, the index does not depend on b at all, and every pair (a, x) lands in the same bucket regardless of x.
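To illustrate, here is a sketch of the collision and one possible workaround (not from your question: fold the high half back into the low half before masking, so a power-of-two table still sees b; a prime hash_size or a stronger mixing function are the other usual fixes):

  #include <cstdint>
  #include <cstdio>
  #include <initializer_list>

  // Hash from the question: b only influences bits 32..63.
  uint64_t hashPlain(uint32_t a, uint32_t b) {
      return uint64_t(a) + (uint64_t(b) << 32);
  }

  // Illustrative workaround: XOR the high half into the low half, so the low
  // bits (the only ones a power-of-two modulus keeps) depend on b as well.
  uint64_t hashMixed(uint32_t a, uint32_t b) {
      uint64_t h = uint64_t(a) + (uint64_t(b) << 32);
      return h ^ (h >> 32);
  }

  int main() {
      const uint64_t hash_size = uint64_t(1) << 20;
      const uint32_t a = 7;
      for (uint32_t x : {1u, 2u, 3u, 1000u}) {
          // Plain: every (7, x) pair lands in bucket 7.  Mixed: bucket varies with x.
          std::printf("b=%u  plain=%llu  mixed=%llu\n", x,
                      (unsigned long long)(hashPlain(a, x) % hash_size),
                      (unsigned long long)(hashMixed(a, x) % hash_size));
      }
      return 0;
  }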

