Oct 11, 2010

Hash Function Algorithms and Java hashCode() method and HashMap.

Hash functions are by definition and implementation, pseudo random number generators. In general there is a theoretical hash function known as the perfect hash function for any group of data. The perfect hash function by definition states that no collisions will occur meaning no repeating hash values will arise from different elements of the group. In reality its very difficult to find a perfect hash function and the practical applications of perfect hashing and its variant minimal perfect hashing are quite limited. In practice it is generally recognized that a perfect hash function is the hash function that produces the least amount of collisions for a particular set of data. 

Fast table lookup can be implemented using a hash function and a hash table. Elements are found in the hash table by calculating the hash of the element's key and using the hash value as the index into the table. This is clearly faster than other methods, such as examining each element of the table sequentially to find a match. And also, other than this, there are some other usages of the hash function generally, such as, encrypting a message to unreadable by third parties in Cryptography, etc.

In Java, A common hashing scheme uses an array where each element is a list of items. The array elements are called buckets. Operations in a hashing scheme involve computing an arrayindex from an item. Converting an item to its array index is done by a hash function The array index returned by the hash function is called the hash value of the item. The hash value identifies a particular bucket. Storing an item involves the following steps:
1. Hashing the item to determine the bucket.
2. If the item does not match one already in the bucket, it is stored in the bucket.
Note that no duplicate items are stored. Retrieving an item is based on using a key. The key represents the identity of the item. Item retrieval is also a two-step process:
1. Hashing the key to determine the bucket.
2. If the key matches an item in the bucket, this item is retrieved from the bucket.
Different items can hash to the same bucket, meaning that the hash function returns the same hash value for these items. This condition is called a collision. The list maintained by a bucket contains the items that hash to the bucket.
The hash value only identifies the bucket. Finding an item in the bucket entails a search and requires an equality function to compare items. The items maintained in a hash-based storage scheme must, therefore, provide two essential functions: a hash function and an equality function. The performance of a hashing scheme is largely affected by how well the hash function distributes a collection of items over the available buckets. A hash function should not be biased toward any particular hash values. An ideal hash function produces a uniform distribution of hash values for a collection of items across all possible hash values. Such a hash function is not an easy task to design. Fortunately, heuristics exist for constructing adequate hash functions.



No comments: