Our first numerical experiments focused on the evaluation of Lyapunov exponents for a variety of iterators and initial hash functions, including variations of linear and quadratic probing, and double hashing. The results of these experiments were, not surprisingly, inconclusive. The relationship between iterator and Lyapunov exponent appeared to be a complex one. All commonly used hash iterators had an integer Lyapunov exponent which depended on table size, yet in many cases the exponent was not directly related to the actual performance of the function for random key values. In some cases a higher integer Lyapunov exponent was associated with a poorer performing hash function. It was concluded that the evaluation of integer Lyapunov exponent alone was not a sufficient measure of hash function performance.
An analytical evaluation of double hashing provides some insight. We can rewrite equations (5) and (6) as
where N and N-2 are prime numbers. As discussed in Section 2.2,
this function produces unique probe sequences for each unique
value of
, and all probe sequences will be of full length.
An estimate of the integer Lyapunov exponent can be determined analytically. If
one starts with an initial key
, one can analytically
perform the summation in equation (10). First, observe that
the expressions
and
are both equal to
the original key k for k<N-3. In this case, hash
function (12) reduces to
Evaluating the individual terms of the sum for equation (10) yields
Clearly it is difficult to bound this expression because of the
modular reduction operations. However,
rough bounds can be established by noting that only when
and
are in different epochs modulo N will the difference
between the values be greater than (i+2). Furthermore, for many values
of i > N/2 the distance
will be substantially less than
i because the distance is measured modulo N. Overall, a very rough
expected value for the integer error distance
is of order i
for the i-th iteration of this hash function:
where
denotes the expected value of the Lyapunov
exponent for m iterations, starting with key
.
This expectation is easily verified empirically. For example, for N=1823
the measured Lyapunov exponent is 6.00 versus 6.50 predicted as an
upper bound above, measured over the entire problem space.
This rough bound is sufficient for the analysis.
An important observation can be drawn from this analysis. Recall that the
Lyapunov exponent is a measure of sensitivity to initial conditions. It tells us
how quickly data initially close together
will be distributed widely in the hash table. From the analysis of the
double hashing function, it can be seen that values that start close in the
table will differ by no more than about
slots in the table after
iteration i. This is a significant result if one would like
collisions resolved after only a few probe iterations. Consider that
even for large tables with high load factors, the average number of
probes rarely exceeds 30. Therefore data elements which are
clustered initially will likely remain clustered for the first
several iterations. This has significant implications for the performance
of double hashing of nonuniformly distributed data. A hash function
which distributes the data uniformly in the hash table from the very first
iteration more closely approximates the uniform hashing property.
The problem described above
is inherent because of the linearity modulo N of the most
popular choices for hash functions. Because of the linear relationship
modulo N between i and
in double hashing (see equation (4))
close key values are separated little during the first probe iterations,
unless they happen to cross a modulo N boundary. Correcting this
problem requires either the use of nonlinear hash functions modulo
N for
and
or a nonlinear probe function modulo N.