If we are next searching for "BP" we would hash to INDEX 11
and find "BC" there instead. Are we to assume that "BP" was never
put into the table because it is not there? We cannot process
our information directly if a collision results : we must resolve
it.
When building the table with a key of "BC" we would insert
it at index 11. Later we would hash "BP" and try to insert it
into 11 as well.
The easiest method of collision resolution is to use the next
free location after our hashed location. If hash to I and it
is full, linearly search for next free slot perhaps at I + 1.
We must use in this case index = 12.
When we now search for an item we must check the hash location plus subsequent elements in a circular manner until we find the item or a free slot. If we find a free slot then the item was not put into the table, else we find it. Unfortunately we are now back to a linear search!!!
As with any normal search, we must ensure that we do not search endlessly! What if no free slots are available?
Linear collision resolution has an important flaw: table entries tend to cluster together as the table fills up. This causes a long linear search (which is what we don't want). This is called a cluster. Since a bigger cluster is a bigger target, the odds are that if you hit a cluster, you will hit a big one. This adds to the building time due to collisions and to the search time due to linear search.
One solution is to not use linear resolution but a
quadratic resolution solution :
build/search for keys at H,H+1,H+4,H+9, ....H+(I-squared)
Because of this a symbol table is normally constructed
using hashing but the hash index produced IS NOT used to store
the data but a pointer to a linked list containing the data.
An array is limited in size to what was set at compile time but a linked
list can grow : this is also our collision resolution.
If not null then another key has hashed to the same index and a linked list is already present. We will create a new node and add it to the linked list. Should we add it to the end of the linked list or to the beginning?
Create a new node, set its info field, set its link field
to point to the first node and the pointer in the table to point
to this new node. Add to the start of the linked list. Why?
procedure fish
var i;
procedure fish2
var i;
begin
i![]()
end;
NOTE: (1) Our symbol table can grow very large
(2) No entry is special case (as always)
(3) Collision is resolved but at the
expense of linear search of linked list.
(4) Could use binary search tree structure
instead of Linked List to speed up searching.
Suppose there are N symbols and the table has T slots. Then to find a symbol we need to perform N/(2*T) comparisons on average. Note that to reduce this number, simply increase T. (Trade storage space for speed).
1000 symbols : table size T = 337 initial pointers (337 is prime) -> 2-3 comparisons on average.
How many comparisons for linear search of 1000 items?
How many comparisons for binary search of 1000 (sorted) items?
Linear Search :
Binary Search :
Hashing Search: