Data Structures, Algorithms, & Applications in Java
Hash Functions
Copyright 1999 Sartaj Sahni



Multiplication
Middle Of Square
Folding
Digit Analysis

As remarked in the book, the most popular hash function is division. Some of the other hash functions are described below.

Multiplication
The home bucket for an element e with key k is computed using the function

f(k) = floor(m * FractionalPart(k*A))


Here m is an integer constant and A is a constant real number such that 0 < A < 1. The range of the hash function f is the integers 0, 1, 2, ... m-1.

Donald Knuth's book The art of computer programming::Sorting and searching, volume 3, Addison-Wesley, 1973 suggests using A = (sqrt(5) - 1)/2 = 0.6180339887.

Suppose that the key k is 100, A = 0.6180339887 , and m = 20 . The home bucket is f(k) = floor(20 * FractionalPart(100 * 0.6180339887)) = floor(20 * FractionalPart(61.80339887)) = floor(20 * 0.80339887) = floor(16.0679774) = 16 .

Middle Of Square
In this the key k is squared and the middle p bits used as the home address. Here p is a constant and the hash function range is the integers 0, 1, 2, ..., 2p-1.

Folding
In this method the key is interpreted as an integer using some radix (say 10). The integer is divided into segments, each segment except possibly the last having the same number of digits. These segments are then added to obtain the home address.

As an example, consider the key 76123451001214. Assume we are dividing keys into segments of size 3 digits. The segments for our key are 761, 234, 510, 012, and 14. The home bucket is 761 + 234 + 510 + 012 + 14 = 1531.

In a variant of this scheme, the digits in alternate segments are reversed before adding. This variant is called folding at the boundaries and the original version is called shift folding. Applying the folding at the boundaries method to the above example, the segments after digit reversal are 761, 432, 510, 210, and 14; the home bucket is 761 + 432 + 510 + 210 + 14 = 1927.

Digit Analysis
When the elements that are going to be in the hash table are known in advance, we can analyze the keys and select a subset of the digits to form the home address. The subset is obtained by eliminating those digits whose values are most skewed and hence are less useful in spreading the elements uniformly across the space of bucket addresses.