ConcurrentHashMap implementation

As suggested earlier, the data structure used by ConcurrentHashMap is similar in implementation to that of Hashtable or HashMap, with a resizable array of hash buckets, each consisting of a chain of Map.Entry elements, as shown in Listing 1. Instead of a single collection lock, ConcurrentHashMap uses a fixed pool of locks that form a partition over the collection of buckets.
Listing 1. Map.Entry elements used by ConcurrentHashMap
protected static class Entry implements Map.Entry {
    protected final Object key;
    protected volatile Object value;
    protected final int hash;
    protected final Entry next;
    ...
}

Traversing data structures without locking

Unlike Hashtable or a typical lock-pool Map implementation, ConcurrentHashMap.get() operations do not necessarily entail acquiring the lock associated with the relevant bucket. In the absence of locking, the implementation must be prepared to deal with stale or inconsistent values of any variables it uses, such as the list head pointer and the fields of the Map.Entry elements (including the link pointers that comprise the linked list of entries for each hash bucket).
Most concurrent classes use synchronization to ensure exclusive access to -- and a consistent view of -- a data structure. Instead of assuming exclusivity and consistency, the linked list used by ConcurrentHashMap is designed carefully so that the implementation can detectthat its view of the list is inconsistent or stale. If it detects that its view is inconsistent or stale, or simply does not find the entry it is looking for, it then synchronizes on the appropriate bucket lock and searches the chain again. This optimizes lookup in the common case -- where most retrievals are successful and retrievals outnumber insertions and removals.

Exploiting immutability

One significant source of inconsistency is avoided by making the Entry elements nearly immutable -- all fields are final, except for the value field, which is volatile. This means that elements cannot be added to or removed from the middle or end of the hash chain -- elements can only be added at the beginning, and removal involves cloning all or part of the chain and updating the list head pointer. So once you have a reference into a hash chain, while you may not know whether you have a reference to the head of the list, you do know that the rest of the list will not change its structure. Also, since the value field is volatile, you will be able to see updates to the value field immediately, greatly simplifying the process of writing a Map implementation that can deal with a potentially stale view of memory.
While the new JMM provides initialization safety for final variables, the old JMM does not, which means that it is possible for another thread to see the default value for a final field, rather than the value placed there by the object's constructor. The implementation must be prepared to detect this as well, which it does by ensuring that the default value for each field of Entry is not a valid value. The list is constructed such that if any of the Entry fields appear to have their default value (zero or null), the search will fail, prompting the get() implementation to synchronize and traverse the chain again.

Retrieval operations

Retrieval operations proceed by first finding the head pointer for the desired bucket (which is done without locking, so it could be stale), and traversing the bucket chain without acquiring the lock for that bucket. If it doesn't find the value it is looking for, it synchronizes and tries to find the entry again, as shown in Listing 2:
Listing 2. ConcurrentHashMap.get() implementation
  public Object get(Object key) {
    int hash = hash(key);     // throws null pointer exception if key is null

    // Try first without locking...
    Entry[] tab = table;
    int index = hash & (tab.length - 1);
    Entry first = tab[index];
    Entry e;

    for (e = first; e != null; e = e.next) {
      if (e.hash == hash && eq(key, e.key)) {
        Object value = e.value;
        // null values means that the element has been removed
        if (value != null) 
          return value;
        else
          break;
      }
    }

    // Recheck under synch if key apparently not there or interference
    Segment seg = segments[hash & SEGMENT_MASK];
    synchronized(seg) { 
      tab = table;
      index = hash & (tab.length - 1);
      Entry newFirst = tab[index];
      if (e != null || first != newFirst) {
        for (e = newFirst; e != null; e = e.next) {
          if (e.hash == hash && eq(key, e.key)) 
            return e.value;
        }
      }
      return null;
    }
  }

Removal operations

Because a thread could see stale values for the link pointers in a hash chain, simply removing an element from the chain would not be sufficient to ensure that other threads will not continue to see the removed value when performing a lookup. Instead, as Listing 3 illustrates, removal is a two-step process -- first the appropriate Entry object is found and its value field is set to null, and then the portion of the chain from the head to the removed element is cloned and joined to the remainder of the chain following the removed element. Since the value field is volatile, if another thread is traversing the stale chain looking for the removed element, it will see the null value field immediately, and know to retry the retrieval with synchronization. Eventually, the initial part of the hash chain ending in the removed element will be garbage collected.
Listing 3. ConcurrentHashMap.remove() implementation
  protected Object remove(Object key, Object value) {
    /*
      Find the entry, then 
        1. Set value field to null, to force get() to retry
        2. Rebuild the list without this entry.
           All entries following removed node can stay in list, but
           all preceding ones need to be cloned.  Traversals rely
           on this strategy to ensure that elements will not be
          repeated during iteration.
    */

    int hash = hash(key);
    Segment seg = segments[hash & SEGMENT_MASK];

    synchronized(seg) {
      Entry[] tab = table;
      int index = hash & (tab.length-1);
      Entry first = tab[index];
      Entry e = first;

      for (;;) {
        if (e == null)
          return null;
        if (e.hash == hash && eq(key, e.key)) 
          break;
        e = e.next;
      }

      Object oldValue = e.value;
      if (value != null && !value.equals(oldValue))
        return null;
     
      e.value = null;

      Entry head = e.next;
      for (Entry p = first; p != e; p = p.next) 
        head = new Entry(p.hash, p.key, p.value, head);
      tab[index] = head;
      seg.count--;
      return oldValue;
    }
  }
Figure 1 illustrates a hash chain before an element is removed:
Figure 1. Hash chain
Figure 1. Hash chain
Figure 2 illustrates the chain with element 3 removed:
Figure 2. Removal of an element
Figure 2. Removal of an element

Insertion and update operations

The implementation of put() is straightforward. Like remove()put() holds the bucket lock for the duration of its execution, but because get() doesn't always need to acquire the lock, this doesn't necessarily block readers from executing (nor does it block other writers from accessing other buckets). First, put() searches the appropriate hash chain for the desired key. If it is found, then the value field (which is volatile) is simply updated. If it is not found, a new Entry object is created to describe the new mapping and is inserted at the head of the list for that bucket.

Weakly consistent iterators

The semantics of iterators returned by ConcurrentHashMap differ from those of java.util collections; rather than being fail-fast (throwing an exception if the underlying collection is modified while an iterator is being used), they are instead weakly consistent. When a user calls keySet().iterator() to retrieve an iterator for the set of hash keys, the implementation briefly synchronizes to make sure the head pointers for each chain are current. The next() and hasNext() operations are defined in the obvious way, traversing each hash chain and then moving on to the next chain until all the chains have been traversed. Weakly consistent iterators may or may not reflect insertions made during iteration, but they will definitely reflect updates or removals for keys that have not yet been reached by the iterator, and will not return any value more than once. The iterators returned by ConcurrentHashMap will not throw ConcurrentModificationException.

Dynamic resizing

As the number of elements in the map grows, the hash chains will grow longer and retrieval time will increase. At some point, it makes sense to increase the number of buckets and rehash the values. In classes like Hashtable, this is easy because it is possible to hold an exclusive lock on the entire map. In ConcurrentHashMap, each time an entry is inserted, if the length of that chain exceeds some threshold, that chain is marked as needing to be resized. When enough chains vote that resizing is needed, ConcurrentHashMap will use recursion to acquire the lock on each bucket and rehash the elements from each bucket into a new, larger hash table. Most of the time, this will occur automatically and transparently to the caller.

No locking?

To say that successful get() operations proceed without locking is a bit of an overstatement, as the value field of Entry is volatile, and this is used to detect updates and removals. At the machine level, volatile and synchronization often end up getting translated into the same cache coherency primitives, so there effectively is some locking going on here, albeit at a finer granularity and without the scheduling or JVM overhead of acquiring and releasing monitors. But, semantics aside, the concurrency achieved by ConcurrentHashMap in many common situations, where retrievals outnumber insertions and removals, is quite impressive.

0 comments: