Data Structures, Algorithms, & Applications in Java
Hash Functions
Copyright 1999 Sartaj Sahni
Multiplication
Middle Of Square
Folding
Digit Analysis
As remarked in the book, the most popular hash function is division.
Some of the other hash functions are described below.
Multiplication
The home bucket for an element e
with key k
is computed using the function
f(k) = floor(m * FractionalPart(k*A))
Here m
is an integer constant
and A
is a constant real number such that
0 < A < 1
.
The range of the hash function f
is
the integers 0, 1, 2, ...
m-1
.
Donald Knuth's book The art of computer programming::Sorting
and searching
, volume 3, Addison-Wesley, 1973 suggests using
A = (sqrt(5) - 1)/2 = 0.6180339887
.
Suppose that the key k
is 100
,
A =
0.6180339887
, and m = 20
. The home bucket is
f(k) = floor(20 * FractionalPart(100 * 0.6180339887))
= floor(20 * FractionalPart(61.80339887)) = floor(20 * 0.80339887)
= floor(16.0679774) = 16
.
Middle Of Square
In this the key k
is squared and the
middle p
bits used as the home address.
Here p
is a constant and the hash function range
is the integers 0, 1, 2, ..., 2p-1
.
Folding
In this method the key is interpreted as an integer
using some radix (say 10
).
The integer is divided into segments,
each segment except possibly the last having the same number of digits.
These segments are then added to obtain the home address.
As an example, consider the key 76123451001214
.
Assume we are
dividing keys into segments of size 3
digits.
The segments for our
key are 761, 234, 510, 012
, and
14
. The home bucket is
761 + 234 + 510 + 012 + 14 = 1531
.
In a variant of this scheme, the digits in alternate segments
are reversed before adding. This variant is called folding
at the boundaries and the original version is called
shift folding.
Applying the folding at the boundaries method to the above example,
the segments after digit reversal are
761, 432, 510, 210
, and 14
;
the home bucket is 761 + 432 + 510 + 210 + 14 = 1927
.
Digit Analysis
When the elements that are going to be in the hash table are known
in advance, we can analyze the keys and select a subset of the digits
to form the home address. The subset is obtained by eliminating those digits
whose values are most skewed and hence are less useful in spreading
the elements uniformly across the space of bucket addresses.