Monday, July 23, 2007

Implementing a Copy-on-Write HashMap in Java

Under Unix, when a process is forked, the process that is forked is cloned (almost). This involves writing a large amount of data to create a new copy for the newly created process. This allows the parent and the child to have their own copies of the data they require to execute. After the fork occurs, either process can modify their copy of this data without affecting the other process.

To copy large amounts of data has an obvious impact on performance, so developers quickly realised that they could optimise this in the following manner:

1. When the fork occurs the data is not copied, instead it is marked as read only.
2. If either process attempts to modify the data, then, and only then, a copy is made for that process.

This approach works on the basis that most data will not be modified.

Java 1.5 introduced a couple of approaches for doing this with Collections: CopyOnWriteArrayList, CopyOnWriteArraySet. These work by creating a new copy of the underlying Collection only when data is modified.

I recently had the following requirement that was not met by this new class:

1. Each Thread is initialised with an identical HashMap containing configuration information.
2. If a Thread changes any of this information, the change is local to them.

This is an example where Copy-on-Write works well. If we assume that most Threads will not change most of the values in the HashMap, there is a large performance benefit associated with implementing a copy-on-write implementation.

In order to allow each Thread to have its own local values, the ThreadLocal class is invaluable. This allows each Thread to have an Object (in this case a HashMap) that is local to themselves, and cannot be seen by other threads.

My CopyOnWriteCache class below is initialised with the Global HashMap representing the default configuration, but is also given a ThreadLocal. Any put() operation updates the ThreadLocal HashMap rather than the global one.

Converseley, any get() method first looks in the ThreadLocal HashMap, and then looks in the global HashMap if no value is present.

By running the CopyOnWriteRunner class below you can see an example. The HashMap is initialised with a key called "key1" having a value of "value". Three threads are then created and started. Each of these prints out the value of this key, modifies the value of the key, and then prints the value again. Each thread will initially see the default value, even if it has already been changed by another thread.


++++++++++++++CopyOnWriteCache+++++++++++++++++


import java.util.HashMap;
import java.util.Map;

public class CopyOnWriteCache {

private Map theMap;

/**
* The ThreadLocal has a HashMap that can be used for storing local
* values.
*/
ThreadLocal> localMap =
new ThreadLocal>() {
protected Map initialValue() {
return new HashMap();
}
};

/**
* The object is initialised with a Map that contains default values.
* These can be overriden in a manner specific to this Object.
*/
CopyOnWriteCache(Map initialisedMap) {
theMap = initialisedMap;
}

/**
* get() first looks for a ThreadLocal value - if none is available
* it returns the global value.
*/
public V get(K theKey) {
if (localMap.get().containsKey(theKey)) {
return localMap.get().get(theKey);
}
return theMap.get(theKey);
}

/**
* Put does a copy-on-write, and stores the value in
* its ThreadLocal map.
*/
public void put(K theKey, V theValue) {
localMap.get().put(theKey, theValue);
}

}

++++++++++++++CopyOnWriteRunner+++++++++++++++++

import java.util.HashMap;
import java.util.Map;


public class CopyOnWriteRunner {

public static void main(String[] args) {
Map m = new HashMap();
m.put("Key1", "Value");
m.put("Key2", "Value");
CopyOnWriteCache copyOnWriteCache = new CopyOnWriteCache(m);
CacheRunner cr1 = new CacheRunner(copyOnWriteCache, "Key1", "New Value 1");
CacheRunner cr2 = new CacheRunner(copyOnWriteCache, "Key1", "New Value 2");
CacheRunner cr3 = new CacheRunner(copyOnWriteCache, "Key1", "New Value 3");
Thread t1 = new Thread(cr1);
Thread t2 = new Thread(cr2);
Thread t3 = new Thread(cr3);
t1.start();
t2.start();
t3.start();
}

private static class CacheRunner implements Runnable {

CopyOnWriteCache cache;
String key;
String value;

public CacheRunner(CopyOnWriteCache theCache, String theKey, String theValue) {
cache = theCache;
key = theKey;
value = theValue;
}

public void run() {
System.out.println("Initial Value = " + cache.get(key));
cache.put(key, value);
System.out.println("New Value = " + cache.get(key));
}

}

}

No comments: