Hash Tables

  • Recall the map ADT
  • Intuitively, a map M supports the abstraction of using keys as indices such as M[k]
  • A map with n keys that are known to be integers in a fixed range is just an array
  • A hash function can map general keys (ie not integers) to corresponding indices in a table/array

Hash Functions

A hash function maps keys of a given type to integers in a fixed interval .

  • A very simple hash function is the mod function:

    • Works for integer keys
    • The integer is the hash value of the key
  • The goal of a hash function is to store an entry at index

  • Function usually has two components:

    • Hash code
      • keys -> integers
    • Compression function
      • integers -> integers in range
    • Hash code applied first, then compression - Some example hash functions:
  • Memory address

    • Use the memory address of the object as it's hash code
  • Integer cast

    • Interpret the bits of the key as an integer
    • Only suitable with 64 bits
  • Component sum

    • Partition they key into bitwise components of fixed length and sum the components
  • Polynomial accumulation

    • Partition the bits of the key into a sequence of components of fixed length , , ... ,
    • Evaluate the polynomial for some fixed value
    • Especially suitable for strings
    • Polynomial can be evaluated in time as

Some example compression functions:

  • Division
    • The size is usually chosen to be a prime to increase performance
  • Multiply, Add, and Divide (MAD)
    • and are nonnegative integers such that

Collision Handling

Collisions occur when different keys hash to the same cell. There are several strategies for resolving collisions.

Separate Chaining

With separate chaining, each cell in the map points to another map containing all the entries for that cell.

Linear Probing

  • Open addressing
    • The colliding item is placed in a different cell of the table
  • Linear probing handles collisions by placing the colliding item at the next available table cell
  • Each table cell inspected is referred to as a "probe"
  • Colliding items can lump together, causing future collisions to cause a longer sequence of probes

Consider a hash table that uses linear probing.

  • get(k)
    • Start at cell
    • Prove consecutive locations until either
      • Key is found
      • Empty cell is found
      • All cells have been unsuccessfully probed
  • To handle insertions and deletions, need to introduce a special marker object defunct which replaces deleted elements
  • remove(k)
    • Search for an entry with key k
    • If an entry (k, v) is found, replace it with defunct and return v
    • Else, return null

Double Hashing

  • Double hashing uses two hash functions h() and f()
  • If cell h(k) already occupied, tries sequentially the cell for
  • f(k) cannot return zero
  • Table size must be a prime to allow probing of all cells
  • Common choice of second hash func is where q is a prime
  • if then we have linear probing

Performance

  • In the worst case, operations on hash tables take time when the table is full and all keys collide into a single cell
  • The load factor affects the performance of a hash table
    • = number of entries
    • = number of cells
  • When is large, collision is likely
  • Assuming hash values are true random numbers, the "expected number" of probes for an insertion with open addressing is
  • However, in practice, hashing is very fast and operations have performance, provided is not close to 1