Atlanta Custom Software Development 

 
   Search        Code/Page
 

User Login
Email

Password

 

Forgot the Password?
Services
» Web Development
» Maintenance
» Data Integration/BI
» Information Management
Programming
  Database
Automation
OS/Networking
Graphics
Links
Tools
» Regular Expr Tester
» Free Tools

Speed up searches with hash tables
[ All Languages » VB »  Arrays]

Total Hit ( 3045)

Rate this article:     Poor     Excellent 

 Submit Your Question/Comment about this article

Rating


 


You probably know that there are basically two methods to search a value in an array: the brute force approach (i.e. linear searching) and the binary search. Both of them have disadvantages: when the array counts N items, linear searching requires N/2 comparisons on the average for successful searches, and N+1 comparisons for unsuccessful searches; binary search is much more efficient, requiring no more than Log2(N)+1 comparisons, but adds the overhead of keeping the array sorted.

Hash search is a third method that is about as efficient as binary sort, without requiring the added overhead of sorting. You generally build a hash table that counts M items, with M > N, then start adding the original values into this table. To know where each item should go in the hash table, you have to evaluate the so-called hash function, which accepts as an argument the value of the key and returns an Integer value in the range [1,M]. When the key is an integer value, a simple hash function is

Click here to copy the following block
hashCode = (key Mod M) + 1

when the key is a string, we can always convert it to a number, for instance by summing the ASCII codes of individual characters. The following routine evaluates the checksum of a string and works better for our purposes:

Click here to copy the following block
Function Checksum(text As String) As Long
  Dim sum As Long, i As Long
  Dim bArray() As Byte
  ' move to a byte array
  bArray() = StrConv(text, vbFromUnicode)

  ' evaluate the sum of array items
  For i = 0 To UBound(bArray)
    sum = sum + bArray(i) * i
  Next
  Checksum = sum
End Function

Of course, no one can guarantee that all keys deliver different hash codes, therefore we must account for collisions. In order to minimize the number of collisions we can choose M (the size of the hash table) to be at least twice as large as N (the number of array items). Better yet, according to sorting theory M should be a prime number. But whatever value we pick for M, we still have to account for collisions. The simplest method to solve them is the so-called linear probing: when a slot in the hash table is already taken, we try with the following slot, until one free slot is found (since M > N, it is impossible that all slots are occupied). When we reach the last item in the hash table, we wrap-around to the first one, and continue the search.
Say we wish to perform hash searches on a dictionary of ten thousand words, so we start with declaring a hash table of exactly twenty thousand items. While many manuals suggest to store the key data directly in the hash table, we will use that table to store the index to the original data:

Click here to copy the following block
numEls = 10000
ReDim words(numEls) As String
ReDim hashTable(2 * numEls) As Integer
' read words from a file
Open "words.dat" For Input As #1
For index = 1 To numEls
  Line Input#1, words(index)
Next
Close #1

Next, find the slot corresponding to each word in the original array, and store the corresponding index in the hash table

Click here to copy the following block
For index = 1 To numEls
  ' search the correct hash index
  hashIndex = Checksum(words(index))
  ' loop until we find an empty slot
  Do
    hashIndex = (hashIndex Mod numEls) + 1
  Loop While hashTable(hashIndex)
  ' store the index in the hash table
  hashTable(hashIndex) = index
Next

Now we are finally ready to search any string. The search procedure is very similar to the insert procedure:

Click here to copy the following block
search = Text1.Text
hashIndex = Checksum(search)
Do
  hashIndex = (hashIndex Mod numEls) + 1
Loop While hashTable(hashIndex) And words(hashTable(hashIndex)) <> search
If hashTable(hashIndex) Then
  Print "Item found at index " & hashTable(hashIndex)
Else
  Print "Item not found"
End If

Note that these routines rely on the assumption that empty slots in the hashTable() have a null value; this works correctly because the words() array includes the zero-th element, even if it is not used for storing any word, therefore the words(hashTable(hashIndex)) subexpression does not raise any error.
It is easy to confirm that well-designed hash searches are more efficient that any other kinds of searches. Researches show that you can even reduce the number of collisions by incrementing the hashIndex variable by a value K different from one, but which be relative prime to M (this condition is automatically enforced if K < M and M is prime). In the previous example we could set K = 17 (which is relatively prime to 20,000) and replace the statement within both Do...Loop blocks as follows:

Click here to copy the following block
hashIndex = ((hashIndex + 16) Mod numEls) + 1


Submitted By : Nayan Patel  (Member Since : 5/26/2004 12:23:06 PM)

Job Description : He is the moderator of this site and currently working as an independent consultant. He works with VB.net/ASP.net, SQL Server and other MS technologies. He is MCSD.net, MCDBA and MCSE. In his free time he likes to watch funny movies and doing oil painting.
View all (893) submissions by this author  (Birth Date : 7/14/1981 )


Home   |  Comment   |  Contact Us   |  Privacy Policy   |  Terms & Conditions   |  BlogsZappySys

© 2008 BinaryWorld LLC. All rights reserved.