Why a lock should be a constant

March 24, 2008

Last week I read a post on the concurrency mailing list about a piece of code from Apache Tomcat. The code was analyzed with FindBugs by Bill Pugh (one of its creators), and it is riddled with concurrency errors.

private InstanceListener[] listeners = new InstanceListener[0];

public void addInstanceListener(InstanceListener listener) {
    synchronized (listeners) {
        InstanceListener[] results = new InstanceListener[listeners.length + 1];
        for (int i = 0; i < listeners.length; i++)
            results[i] = listeners[i];
        results[listeners.length] = listener;
        listeners = results;
    }
}

Apart from the missing happens-before relation between the write and the read of the listeners field, which causes visibility problems (the same goes for the contents of the array; piggybacking on synchronization won't work here), there are serious 'old school' concurrency problems as well.

Imagine what happens when one thread obtains the lock on the monitor of the listeners array. When another thread wants to obtain the lock, it blocks because the lock is unavailable. In the meantime the first thread continues and replaces the listeners array with a new one (containing all previous InstanceListeners plus the new one). When the first thread completes, it releases the lock. The blocked thread then obtains the lock on the monitor of the stale listeners array and starts working on the new listeners array without holding a lock on it.

These problems could have been prevented by:

  1. using a constant Object reference as the lock
  2. using CopyOnWriteArrayList (CopyOnWriteArraySet might be a better-performing alternative if the order of the listeners doesn't matter)
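A minimal sketch of the first remedy, using a dedicated final lock object (the class name, getter, and volatile field are my own additions, not Tomcat's code; volatile is added to repair the visibility problem mentioned above):

```java
import java.util.Arrays;

interface InstanceListener {}

class InstanceSupport {
    // constant lock reference: every thread always synchronizes on the same object
    private final Object listenersLock = new Object();

    // volatile, so readers always see the most recently published array
    private volatile InstanceListener[] listeners = new InstanceListener[0];

    public void addInstanceListener(InstanceListener listener) {
        synchronized (listenersLock) {
            InstanceListener[] results = Arrays.copyOf(listeners, listeners.length + 1);
            results[listeners.length] = listener;
            listeners = results; // volatile write publishes the new array
        }
    }

    public InstanceListener[] getListeners() {
        return listeners.clone(); // volatile read plus defensive copy
    }
}
```

Because the lock reference never changes, two threads can no longer end up synchronizing on different (stale versus fresh) array objects.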

Do you want a sequentially consistent Java?

March 19, 2008

A lot of developers still think that Java is sequentially consistent, but what does that mean? When code is executed by multiple threads, only the results of interleavings that follow the order of instructions in the source code are allowed, and a read always sees the most recently written value of a variable.

Example:

class Foo {
	int a, b;

	void init() {
		a = 1;
		b = 1;
	}

	void print() {
		System.out.printf("b=%s a=%s%n", b, a);
	}
}

If init and print are called by different threads, sequential consistency allows only the following outputs:

b=0 a=0
b=0 a=1
b=1 a=1

The problem is that 'b=1 a=0' is also possible, because Java is not sequentially consistent: even though print has seen the write of b, the earlier write of a may not be visible yet. This strange behaviour can be attributed to compiler and hardware optimizations:

  1. reordering: a=1 and b=1 could be reordered (maybe the compiler thinks it improves the chance of a cache hit, or the CPU stalls the execution of a=1 and starts with b=1, or executes them in parallel but the b=1 instruction finishes first)
  2. visibility: the write of a=1 could become visible to the thread that calls print later than the write of b=1 (a visibility problem can also be seen as a reordering problem)

When I explain this to developers, at best they look at me funny; at worst they shout that I'm nuts and that Java is broken. Well... although sequential consistency is very intuitive, it is also very expensive because it prevents a lot of optimizations. So it would be a nice experiment to make a JVM sequentially consistent and compare its performance to that of a standard JVM.

I think this can largely be done by changing all normal reads/writes into volatile reads/writes, because volatile reads and writes prevent reordering and visibility problems. Changing them to volatile reads/writes also doesn't influence the behavior of the application (apart from reducing performance). Modifying the normal reads/writes would not be that difficult: it can be done by writing a custom JVMTI agent that modifies the bytecode. If I have time, I'm going to give it a try this weekend. I'll keep you posted.
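As a hand-written illustration of what the agent would do mechanically, here is a sketch of my own (not the agent itself) where declaring the fields volatile forbids the non-sequentially-consistent outcome: a reader that sees b == 1 is then guaranteed to also see a == 1.

```java
class Foo {
    // volatile: the writes in init() cannot be reordered with each other
    volatile int a, b;

    void init() {
        a = 1;
        b = 1; // volatile write: publishes everything written before it
    }

    // reads b first, then a; returns {b, a}
    int[] read() {
        int rb = b; // volatile read: acquires the writer's preceding writes
        int ra = a; // if rb == 1, the JMM guarantees ra == 1
        return new int[] { rb, ra };
    }
}
```

With plain fields the outcome {1, 0} is allowed; with volatile fields the volatile write of b happens-before the volatile read that observes it, so the write of a is visible as well.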


Terracotta greedy locks and Java biased locking

March 18, 2008

With each release of Java, the performance of the virtual machine increases. With the introduction of the new Java Memory Model based on JSR 133, it is completely clear which compiler optimizations are and are not allowed. But a lot of improvements have also been made to the performance of locking itself. To name a few:

  1. adaptive spinning: instead of immediately calling expensive atomic operations, just spin a few times and maybe the lock becomes available. Based on prior experience the JVM can adapt and change the number of tries
  2. lock coarsening: replacing multiple acquire/release pairs with a single one, thereby reducing the number of atomic operations
  3. lock elision through escape analysis: if an object is confined to a single thread, a lock on it has no additional value (not even for the JMM), so it can be removed (along with its atomic operations)
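A minimal sketch of my own showing code that is a candidate for lock coarsening: the two back-to-back synchronized blocks on the same monitor may be merged by the JIT into a single acquire/release.

```java
class Counter {
    private final Object lock = new Object();
    private int value;

    int incrementTwice() {
        // two adjacent critical sections on the same lock;
        // the JIT may coarsen them into one acquire/release pair
        synchronized (lock) {
            value++;
        }
        synchronized (lock) {
            value++;
            return value;
        }
    }
}
```

The optimization is invisible to the program: the result is the same, only the number of atomic operations drops.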

But there is another optimization technique that is very useful for uncontended locks (just like lock coarsening and lock elision): biased locking. Biased locking avoids executing expensive atomic calls by letting a lock be biased towards a specific thread. As long as that thread acquires (and releases) the lock, no atomic operations are needed. Only when another thread acquires the lock is an atomic call made.

I was messaging with Alex Miller of Terracotta about how locking works in Terracotta. One of my worries was that the single active Terracotta server would be a contention point, because each and every lock acquire/release would need to communicate with it. Alex explained that Terracotta uses a special optimization technique called greedy locks: a lock can be checked out by a specific node, and as long as that node is the one acquiring/releasing the lock, no communication with the Terracotta server is needed, reducing the amount of communication (and latency) required.

It is fun to see that a similar solution can be applied on different levels.


Quickstart project: Terracotta/Spring/Ant

March 14, 2008

Terracotta is a great way to create a distributed heap, but setting up a complete working example using Spring/Ant from scratch takes some time. In such situations I usually create some kind of template project I can reuse, so I'm up and running faster the next time. I decided to share this template with the Java community. The current template project contains a producer of messages (strings) and a consumer of messages. The messages are distributed by using a BlockingQueue that is shared through Terracotta.

The only things that need to be changed are the settings in the tc.properties file.
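For readers who want the gist without downloading anything, here is a plain-JVM sketch of my own of what the template's producer and consumer do (class and method names are hypothetical; in the Terracotta template the queue instance would be the clustered root):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueDemo {
    // in the Terracotta template this queue would be declared a shared root
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();

    // producer: puts messages on the queue
    static void produce(int count) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            queue.put("message-" + i);
        }
    }

    // consumer: takes messages, blocking until each one arrives
    static List<String> consume(int count) throws InterruptedException {
        List<String> consumed = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            consumed.add(queue.take());
        }
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    produce(3);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        System.out.println(consume(3));
        producer.join();
    }
}
```

Locally this is just java.util.concurrent; Terracotta's trick is that the same BlockingQueue calls work unchanged when producer and consumer run in different JVMs.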

You can download the project here: