Hashing (?) 8-)
Stacey Campbell
stacey at hcr.UUCP
Sun Sep 11 04:09:16 AEST 1988
Michael Steiner writes:
>I don't know what hashing is. Could someone tell me a little about what it is.
I remember when hashing was first explained to me at university...
Lecturer: As you are all aware searching for an item in a linear list
is an order N operation.
Students: Groan. Kids stuff.
Lecturer: I am going to discuss today a method of storing and retrieving
information that is much faster than this. Storing and retrieving an
item from a binary tree can be done as a order log N operation.
Captive Audience: Not trees again!
Lecturer: The method I will discuss can be achieved with order one
operations.
Audience (agog): What? In your dreams perhaps!
Lecturer: First we select a field from the item to be stored, it
doesn't really matter what the data type of the field is, but for
the purposes of this discussion it will be an integer...
Audience: (cries of 'Rigged!')
Lecturer: ...then we use it as a seed for a pseudo uniform random
number generator. The random number that is generated is used
as an index into a table where the item is then stored. Providing
the index fields are different for each item then the items will be
stored evenly throughout the table.
Students: What's the use of that? The data is now stored at some random
spot in the table and it will need an order N search to get to it
again.
Lecturer: To access an item in order 1 you merely give its index
field to the random number generator, which returns the same index
returned when you originally stored the item. You then look up the
table using the index.
Students: (getting very excitable) What happens if the table fills up?
What if the random number generator returns the same value for two
different index fields? Where does hashing come into this?
What a stupid idea.
Lecturer: This whole process is called 'hashing'. The table is called
a 'hash table' and the random number generator is called a 'hash
function', other types of strategies can be used to generate hash
numbers. Obviously it is going to be hard to design a hash function
that will generate completely uniform hash numbers across the hash
table. At some stage an item will be hashed to a spot that is already
used. This is called a 'collision' and a number of strategies can
be used for 'collision resolution'.
If you are sure that there are more empty entries in the table than
items to be stored you can keep re-hashing the index field until you
reach an empty spot in the table.
If it is possible to completely fill up the table then you can
store items in a linear list at the table entry they hash to. So
looking up an entry would involve hashing the index field, then
doing a linear search for the correct item in the list stored at
that table entry. You are not limited to using a linear list either,
you can implement the storage for items that have collided as a
binary tree, or even as another hash table.
Students: Unbelievable. What will they think of next?
--
{lsuc,utzoo,utcsri}!hcr!hcrvax!stacey Stacey Campbell, HCR Corporation,
130 Bloor St W, Toronto, Ontario, Canada. +1 416 922 1937 X48
More information about the Comp.lang.c
mailing list