Synchronized as volatile replacement: part I

November 3, 2008

I was looking at the source code of Quartz (the SimpleThreadPool to be specific) and I noticed that synchronized blocks are used to access a single variable (for both reads and writes) instead of declaring the variable volatile.

//write: runnable is an object field.
synchronized(this) {
	runnable = null;
}

//read: run is an object field, shouldRun is a local variable.              
synchronized(this) {
	shouldRun = run;
}

The main purpose of these synchronized blocks, next to preventing reordering problems, is to prevent visibility problems: a reading thread that acquires a lock will see the most recently written value that was written under that same lock. If the synchronized blocks are missing, one thread isn’t guaranteed to see the changes made by another thread, because values can be stuck in a CPU cache, for example. So synchronized blocks are a legitimate way to let threads exchange information.

But this can just as easily be achieved by declaring the variable volatile, because a lock release/lock acquire (the monitor lock rule) has the same JMM properties as a volatile write and a volatile read (the volatile variable rule). So if there are no invariants that can be broken by early publication of some state value, I prefer volatiles over synchronized blocks for publication, for the following reasons:

  1. once a variable is declared volatile, every access to it is protected. With synchronized blocks it is easy to forget one, because you need to rely on a convention (that often isn’t documented or well understood).
  2. volatiles are lightweight and, unlike locks, won’t lead to operating system calls. JVM optimizations like biased locking and adaptive spin locking can reduce the actual number of operating system calls, though. For more information about lock optimizations you could have a look at the InfoQ article written by Jeroen Borders, a colleague.

In some cases you can even combine volatiles with locks: when early publication of the variable doesn’t break invariants, but a change to that variable needs to be made atomically in combination with other changes.
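As a minimal sketch of the preference described above (the class and field names are mine, not Quartz code), the synchronized read/write pair from the beginning of this post collapses to a single volatile field:

```java
// Minimal sketch (hypothetical class, not Quartz code): publishing a task
// between threads through a volatile field instead of synchronized blocks.
// The volatile write/read pair gives the same visibility guarantee as the
// lock release/acquire pair of the original code.
public class TaskSlot {

    // every read and write of this field is automatically protected
    private volatile Runnable runnable;

    public void publish(Runnable task) {
        runnable = task; // volatile write: happens-before any later volatile read
    }

    public Runnable take() {
        return runnable; // volatile read: sees the most recent volatile write
    }

    public static void main(String[] args) {
        TaskSlot slot = new TaskSlot();
        slot.publish(() -> System.out.println("running"));
        System.out.println(slot.take() != null);
    }
}
```

Note that this only works because there is no invariant spanning multiple fields here; as soon as the write has to be atomic together with other state changes, you are back to locks.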

In the next blogpost I will explain why synchronized blocks are a little stricter than volatile variables.


Speaking at JFall about the JMM

November 3, 2008

On 12 November I’ll be speaking at JFall about the Java Memory Model. This is my first presentation for a large crowd, but this past year I have been speaking at least once a month at my employer, mostly about concurrency related subjects (low level concurrency, the Java Memory Model, java.util.concurrent, databases, architecture and concurrency, the fork/join framework, MVCC, STM). A few days later I’ll be giving a similar presentation at the IB-groep. Eventually my goal is to give presentations at large international (Java) conferences and become a concurrency authority, and this should help me get the cool assignments.

So if you want to know more about the JMM, go to my presentation and don’t hesitate to ask questions.

[edit]
If you want to see my presentation (Dutch), you can check it here:
http://www.bachelor-ict.nl/peter-veentjer


Volatile arrays

October 30, 2008

The volatile variable rule from the Java Memory Model makes sure that there is a happens-before relation between a write and a read of a volatile variable. This makes it possible to exchange not just values of primitive types between threads, but also complete object graphs, without adding the burden of safe publication to each object in the graph. So volatile variables are a nice tool in the concurrency toolbox.

But it is not possible to create an array of volatile elements. So if an array element is written by one thread and read by another, some happens-before relation needs to be added. Declaring the array reference volatile has no effect, because a happens-before relation only exists between a write and a read of a volatile variable, not between two reads.

You could add a synchronized block to introduce the happens before relation (monitor lock rule):

final int[] array = ...;

void write(int index, int value){
	array[index] = value;
	synchronized(this){} //does the monitor unlock (release)
}

int read(int index){
	synchronized(this){} //does the monitor lock (acquire)
	return array[index];
}

But it isn’t the prettiest solution, because it could lead to expensive operating system calls. Doing a dummy write and read of a volatile variable could also introduce the desired happens-before relation:

final int[] array = ...;
volatile boolean barrier;

void write(int index, int value){
	array[index]=value;
	barrier = true;//the volatile write
}

int read(int index){
	boolean b = barrier;//the volatile read
	return array[index];
}

Volatile variables have less overhead than locks, but this still isn’t perfect, because a happens-before relation is created between writes and reads of different elements. This unneeded happens-before relation could hurt the performance of the cache. So it would be better if happens-before relations were only added where needed. To be honest, I don’t have numbers to back this up, and I also don’t have access to serious hardware (with a lot of cores) to create benchmarks. What works well on my dual core Intel machine could behave terribly in a serious multi-processor environment.

Another solution would be to create an array of volatile wrapper objects:

class VolatileInt{
   volatile int value;
}


final VolatileInt[] array = create(10);

VolatileInt[] create(int size){
	VolatileInt[] array = new VolatileInt[size];
	for(int k=0;k<size;k++)
		array[k]=new VolatileInt();
	return array;
}

void write(int index, int value){
	array[index].value = value;
}

int read(int index){
	return array[index].value;
}

It increases the size of the array in memory, and it adds some overhead because an extra field access needs to be done, but you have complete control over the number of happens-before relations.
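Since Java 5 the standard library also ships java.util.concurrent.atomic.AtomicIntegerArray (with Long and Reference variants), which gives exactly this per-element volatile behavior without a wrapper object per slot. A minimal sketch:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch: AtomicIntegerArray stores the ints in a plain int[] internally,
// but every get/set has volatile semantics, so a happens-before relation
// exists per element, without one wrapper object per slot.
public class AtomicArrayDemo {

    private final AtomicIntegerArray array = new AtomicIntegerArray(10);

    void write(int index, int value) {
        array.set(index, value); // volatile write of that element
    }

    int read(int index) {
        return array.get(index); // volatile read of that element
    }

    public static void main(String[] args) {
        AtomicArrayDemo demo = new AtomicArrayDemo();
        demo.write(3, 42);
        System.out.println(demo.read(3)); // prints 42
    }
}
```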

Luckily volatile arrays are not a problem people encounter frequently.

ps:
even though the introduced volatile write/read or the empty synchronized(this){} block adds no visible value for the executing thread, it can’t be removed by the compiler, because it does add a happens-before relation.


JMM: single-processor can also suffer from visibility problems

October 27, 2008

From time to time I get the question whether single-processor (uni-processor) machines also suffer from visibility and reordering issues. Well, yes, they can. First of all, instruction reordering is not limited to multi-processor machines. Single-processor machines also benefit from optimizations based on instruction reordering, and single processors can also use low level parallelization techniques like superscalar execution that lead to out-of-order execution of instructions.

But what about visibility issues? Since there is only a single processor, writes made by one thread, even though stuck in cache, will be visible to another thread, because the cache is shared between the threads. So it sounds plausible that no visibility problems can happen. But there is more local memory in a CPU than just caches, namely registers. If a compiler decides to keep an expensive non-volatile variable in a register (an optimization called ‘register allocation’), this register is swapped in and out when a thread context switch takes place. This means that one thread may never see the changes made by another thread, because the register content is always thread specific. You could see the register as a thread local variable. So even on a single core machine, reordering and visibility problems can still occur.
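The classic symptom of this is a reader thread spinning on a non-volatile flag: the compiler may hoist the flag into a register and the loop never sees the writer’s update, on any number of cores. A minimal sketch (the volatile keyword is what makes this terminate reliably; drop it and the reader is allowed to spin forever):

```java
// Sketch: a writer thread sets a flag and a reader spins on it. With
// 'volatile' the write is guaranteed to become visible; without it the
// JIT may keep 'stop' in a register and the reader may never observe
// the change, even on a single-processor machine.
public class VisibilityDemo {

    static volatile boolean stop; // remove 'volatile' and the reader may spin forever

    static boolean runDemo() throws InterruptedException {
        stop = false;
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait: without volatile, this read could be hoisted
                // out of the loop (effectively a register read)
            }
        });
        reader.start();
        Thread.sleep(50);
        stop = true;        // volatile write: guaranteed to become visible
        reader.join(2000);  // with volatile, this returns almost immediately
        return !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader stopped: " + runDemo());
    }
}
```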


STM and MVCC Considerations II

October 8, 2008

In my previous post I mentioned an isolation anomaly that can happen in MVCC systems (for example in Oracle). The cause of this issue is that a transaction in an MVCC system doesn’t care whether the data it only reads is updated by a different transaction. MVCC provides a stable view of the system by maintaining multiple versions of the same data, so a transaction doesn’t need to worry about other transactions updating data it has read, because it won’t see those updates. The only thing it needs to worry about is detecting whether another transaction has updated the same data it wants to update itself. If such a write conflict is found, the transaction can’t commit. If better serialization behavior is needed, all data that was touched by the transaction (the reads and the writes) needs to be checked for conflicts.

If we look at the example again:

StmStack stack1 = new StmStack();
StmStack stack2 = new StmStack();

void foo(){
	transaction{
		stack1.push(stack2.size());	
	}
}

void bar(){
	transaction{
		stack2.push(stack1.size());	
	}
}

If transaction1 (calling foo) executes concurrently with transaction2 (calling bar), one of the following two scenarios happens (the commit is atomic):

  1. transaction1 commits before transaction2: transaction2 is aborted because stack1, which it has read, has been updated by transaction1
  2. transaction2 commits before transaction1: transaction1 is aborted because stack2, which it has read, has been updated by transaction2

Of course this improved isolation is not free, because there is more data to check. And it can lead to more transaction retries, which increases the chance of a livelock. But it is nice to see that this problem can be solved.

With the STM implementation I’m currently working on, this is quite easy to realize. When an object is read in a transaction and it has not been read before, the dehydrated version of that object (stored in the STM) is asked to provide a hydrated version. This hydrated version is logged inside the transaction, to make sure that the same hydrated version is returned every time. So since the transaction already tracks all reads and writes, enhancing the conflict detection would be easy.
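The idea of validating the read set at commit time can be sketched in a few lines. This is a toy model (the Ref/Tx names and the global commit lock are mine, not my actual STM implementation): each transaction records the version of every ref it reads, and at commit it aborts when any of those refs has been updated since, which rules out the anomaly from the stack example.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy sketch of commit-time read validation. A transaction records the
// version of every ref it reads; at commit it aborts if any of those
// refs was updated in the meantime.
public class ReadValidation {

    static final class Ref {
        Object value;
        long version;
        Ref(Object v) { value = v; }
    }

    static final class Tx {
        private final Map<Ref, Long> readVersions = new IdentityHashMap<>();
        private final Map<Ref, Object> writes = new IdentityHashMap<>();

        Object read(Ref r) {
            if (writes.containsKey(r)) return writes.get(r); // own writes are visible
            readVersions.putIfAbsent(r, r.version);          // remember version we saw
            return r.value;
        }

        void write(Ref r, Object v) { writes.put(r, v); }

        // a single global lock keeps the commit atomic in this sketch
        private static final Object COMMIT_LOCK = new Object();

        boolean commit() {
            synchronized (COMMIT_LOCK) {
                for (Map.Entry<Ref, Long> e : readVersions.entrySet())
                    if (e.getKey().version != e.getValue())
                        return false; // a ref we read was updated: abort
                for (Map.Entry<Ref, Object> e : writes.entrySet()) {
                    e.getKey().value = e.getValue();
                    e.getKey().version++;
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        Ref stack1 = new Ref(0), stack2 = new Ref(0);
        Tx t1 = new Tx(), t2 = new Tx();
        t1.write(stack1, t1.read(stack2)); // foo: stack1.push(stack2.size())
        t2.write(stack2, t2.read(stack1)); // bar: stack2.push(stack1.size())
        System.out.println("t1 commits: " + t1.commit()); // prints true
        System.out.println("t2 commits: " + t2.commit()); // prints false: stack1 changed after t2 read it
    }
}
```

With write-only conflict detection both transactions would commit; with read validation the second one is forced to retry, giving the serial behavior described above.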


STM and MVCC considerations

October 5, 2008

I’m working on a Software Transactional Memory (STM) implementation for experimentation purposes, and it uses Multi Version Concurrency Control (MVCC), just like Oracle. The big advantage of MVCC is that it is highly concurrent, because it doesn’t rely on locks to prevent other transactions from interfering when you only need to read data. When a transaction begins, it gets a snapshot of the world, and this snapshot remains constant even while other transactions are making changes. Changes made by the transaction itself are visible to it, of course.

The problem however is that MVCC can give subtle isolation problems. E.g.:

StmStack stack1 = new StmStack();
StmStack stack2 = new StmStack();

void foo(){
	transaction{
		stack1.push(stack2.size());	
	}
}

void bar(){
	transaction{
		stack2.push(stack1.size());	
	}
}

If foo and bar are called concurrently, you would expect one of the following states:

  1. stack1 contains 0 and stack2 contains 1 (if transaction1 is executed first)
  2. stack1 contains 1 and stack2 contains 0 (if transaction2 is executed first)

But with MVCC, stack1 and stack2 can both contain the value 0! This is because the transactions don’t see the changes made by the other transaction, and since there is no write conflict (both transactions write to a different stack), both transactions can commit. Whether this is an issue in practice, I don’t know, but it is something to keep in mind. I have written a similar post about this issue in Oracle.
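The anomaly is easy to reproduce in a toy model (the Ref/Tx names and global commit lock are mine, not my actual STM): commit-time conflict detection only validates the refs a transaction has *written*, so two transactions that read each other’s data but write disjoint data both commit.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy sketch of MVCC-style commit: only write-write conflicts are
// detected, so the stack1/stack2 example ends with both stacks holding 0.
public class WriteSkew {

    static final class Ref {
        Object value;
        long version;
        Ref(Object v) { value = v; }
    }

    static final class Tx {
        private final Map<Ref, Long> writeVersions = new IdentityHashMap<>();
        private final Map<Ref, Object> writes = new IdentityHashMap<>();

        Object read(Ref r) {
            // snapshot read; in this single-threaded sketch simply the committed value
            return writes.containsKey(r) ? writes.get(r) : r.value;
        }

        void write(Ref r, Object v) {
            writeVersions.putIfAbsent(r, r.version); // version seen when first written
            writes.put(r, v);
        }

        private static final Object COMMIT_LOCK = new Object();

        boolean commit() {
            synchronized (COMMIT_LOCK) {
                // MVCC-style check: only the *written* refs are validated
                for (Map.Entry<Ref, Long> e : writeVersions.entrySet())
                    if (e.getKey().version != e.getValue())
                        return false;
                for (Map.Entry<Ref, Object> e : writes.entrySet()) {
                    e.getKey().value = e.getValue();
                    e.getKey().version++;
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        Ref stack1 = new Ref(0), stack2 = new Ref(0);
        Tx t1 = new Tx(), t2 = new Tx();
        t1.write(stack1, t1.read(stack2)); // foo reads size 0, pushes 0
        t2.write(stack2, t2.read(stack1)); // bar reads size 0, pushes 0
        System.out.println(t1.commit() + " " + t2.commit());      // prints true true
        System.out.println("stack1=" + stack1.value + " stack2=" + stack2.value); // both 0
    }
}
```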


Breaking Oracle SERIALIZABLE

October 4, 2008

The SERIALIZABLE isolation level is the highest isolation level for transactions. When transactions are executed under this isolation level, they should behave as if they were executed serially (one after another). However, under Oracle and other databases that implement SERIALIZABLE on top of Multi Version Concurrency Control (MVCC) snapshots, like PostgreSQL, it is possible to break this behavior. At the moment I’m working on an STM implementation that also uses MVCC, so it suffers from the same problem; I’ll get back to this in another blogpost.

The problem

Suppose we have two tables: table A (with column a) and table B (with column b). Two transactions, both with the SERIALIZABLE isolation level, execute the following statements concurrently:

transaction1 INSERT INTO B (SELECT COUNT(*) FROM A)
transaction2 INSERT INTO A (SELECT COUNT(*) FROM B)

What do you think the result is going to be? The SERIALIZABLE isolation level mandates a serial execution of transactions, so either transaction1 is executed first or transaction2 is executed first. If transaction1 executes first, the result would be A=1 and B=0. If transaction2 executes first, the result would be A=0 and B=1. With MVCC databases the result can also be A=0 and B=0. How is this possible?

The cause

Transactions with the SERIALIZABLE isolation level use a snapshot of the world for the duration of the transaction (the same goes for the READ ONLY isolation level), so during the execution of the transaction it will not see changes made by other transactions (it will see its own changes, of course). When the transaction wants to commit, the database checks if there are conflicts, and if there are, the transaction is aborted and you get the feared ORA-08177: can’t serialize access for this transaction.

Time  Transaction 1                           Transaction 2
T1    start
T2                                            start
T3    INSERT INTO B (SELECT COUNT(*) FROM A)
T4                                            INSERT INTO A (SELECT COUNT(*) FROM B)
T5    commit
T6                                            commit

At T4, transaction2 won’t see the record that transaction1 has inserted into B, so 0 is inserted into table A. When transaction1 wants to commit at T5, it can commit because there are no conflicting writes. When transaction2 wants to commit at T6, it can also commit because it has no conflicting writes either. So this is why the result is A=0 and B=0. In most cases it won’t be an issue, but it is interesting to know that MVCC is not perfect.

If you want to know more about this problem, I advise the excellent books by Thomas Kyte.


Blocking Stacks and Queues in an STM

October 1, 2008

I’m currently working on a Java based STM implementation with deferred writes and object granularity; it relies a lot on multiversioning (just like Oracle) and less on pessimistic locking. This weekend I completed the STM version of condition variables, and I’m glad I got my Stack and Queue up and running inside a multithreaded producer/consumer example. Writing ‘concurrent’ code for STMs is a lot different from the traditional approach with locks/conditions and the like.

This is an example of the Stack that can be used safely inside an STM.

public class Stack<E> {

    private Node<E> head;

    public void push(E item) {
        if (item == null) throw new NullPointerException();
        head = new Node(item, head);
    }

    public E peek() {
        if (head == null)
            return null;

        return head.value;
    }

    public E pop() {
        if (head == null)
            retry();

        return removeTopItem();
    }

    private E removeTopItem() {
        Node<E> oldHead = head;
        head = head.parent;
        return oldHead.value;
    }

    public int size() {
        return head == null ? 0 : head.size();
    }

    public boolean isEmpty() {
        return size() == 0;
    }

    public static class Node<E> {
        final E value;
        final Node parent;

        Node(E value, Node prev) {
            this.value = value;
            this.parent = prev;
        }

        int size() {
            if (parent == null)
                return 1;
            else
                return parent.size() + 1;
        }
    }
}

As you can see, there is no concurrency logic here apart from the retry functionality (which works like a condition variable). At the moment I need to instrument the classes manually so they can be part of the STM; after instrumentation it looks like this:

public class Stack<E> implements Citizen {

    private Node<E> head;

    public void push(E item) {
        if (item == null) throw new NullPointerException();
        head = new Node(item, head);
    }

    public E peek() {
        if (head == null)
            return null;

        return head.value;
    }

    public E pop() {
        if (head == null)
            retry();

        return removeTopItem();
    }

    private E removeTopItem() {
        Node<E> oldHead = head;
        head = head.parent;
        return oldHead.value;
    }

    public int size() {
        return head == null ? 0 : head.size();
    }

    public boolean isEmpty() {
        return size() == 0;
    }

    public static class Node<E> {
        final E value;
        final Node parent;

        Node(E value, Node prev) {
            this.value = value;
            this.parent = prev;
        }

        int size() {
            if (parent == null)
                return 1;
            else
                return parent.size() + 1;
        }
    }

    //================== generated  ======================

    private Node head_initial;
    private long ptr;
    private MultiversionedStm.MultiversionedTransaction transaction;

    public Iterator<Citizen> ___findNewlyborns() {
        return EmptyIterator.INSTANCE;
    }

    public void ___onAttach(MultiversionedStm.MultiversionedTransaction transaction) {
        this.transaction = transaction;
    }

    public MultiversionedStm.MultiversionedTransaction ___getTransaction() {
        return transaction;
    }

    public long ___getPointer() {
        return ptr;
    }

    public void ___setPointer(long ptr) {
        this.ptr = ptr;
    }

    public HydratedStack ___hydrate() {
        return new HydratedStack(head);
    }

    public boolean ___isDirty() {
        return head != head_initial;
    }

    public static class HydratedStack implements HydratedCitizen {
        private final Node head;

        public HydratedStack(Node head) {
            this.head = head;
        }

        public Stack dehydrate(long ptr, MultiversionedStm.MultiversionedTransaction transaction) {
            Stack stack = new Stack();
            stack.head = head;
            stack.head_initial = head;
            stack.transaction = transaction;
            stack.ptr = ptr;
            return stack;
        }
    }
}

And based on this Stack it was easy to create a Queue that is quite concurrent, because pushes and pops can be executed concurrently in most situations (as long as the popping threads see a non-empty readyToPopStack).

public class Queue<E> {

    private Stack<E> readyToPopStack = new Stack<E>();
    private Stack<E> pushedStack = new Stack<E>();

    public E pop() {
        if (!readyToPopStack.isEmpty())
            return readyToPopStack.pop();

        while (!pushedStack.isEmpty()) {
            E item = pushedStack.pop();
            readyToPopStack.push(item);
        }

        return readyToPopStack.pop();
    }

    public void push(E value) {
        pushedStack.push(value);
    }

    public boolean isEmpty() {
        return size() == 0;
    }

    public int size() {
        return readyToPopStack.size() + pushedStack.size();
    }
}

As you can see, with STMs there is a lot less worrying about thread safety, and we can focus on the core logic again. I think that STMs could mean for enterprise Java what OR-mappers meant for Java many years ago.


STMs and waitsets

September 21, 2008

At the moment I’m working on an experimental Software Transactional Memory (STM) implementation to gain better insight into this new but very promising way to deal with concurrency. One of the classic problems with waitsets is that it is not possible to let a thread sleep on more than one waitset. If there are two blockingqueues, for example (each instance has its own waitset), a taking thread can only block on one of them. So when a taking thread has blocked on one of the blockingqueues, it won’t wake up when an item is placed on the other.

With STMs this issue can be solved. Each transaction logs which cells it has read from the STM. And when a transaction can’t make progress, because the blockingqueues are empty, it signals that it should be retried as soon as an update on one of these addresses has happened.

The pseudo code for the transaction would be something like this:

if(queue1.isEmpty() && queue2.isEmpty())
	retry(); 

In this case the internals of queue1 and queue2 have been read from the STM, so as soon as there is a change in those internals (an item is placed, for example), the transaction is retried. Isn’t that cool?
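The retry mechanism can be sketched with plain monitors (this is a simplification, not my actual implementation: one global monitor, version numbers per cell): a blocked transaction records the versions of everything it read and waits until any of those versions changes.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch of retry(): a transaction that can't make progress blocks on its
// whole read set at once. One global monitor keeps the sketch simple:
// every committing write notifies all blocked transactions, which then
// re-check whether any ref in their read set has a newer version.
public class RetryDemo {

    static final Object MONITOR = new Object();

    static final class Ref {
        private int value;
        private long version;
        int read()     { synchronized (MONITOR) { return value; } }
        long version() { synchronized (MONITOR) { return version; } }
        void write(int v) {
            synchronized (MONITOR) { value = v; version++; MONITOR.notifyAll(); }
        }
    }

    // block until any ref in the read set has moved past the version we saw
    static void retry(Map<Ref, Long> readSet) throws InterruptedException {
        synchronized (MONITOR) {
            while (unchanged(readSet))
                MONITOR.wait();
        }
    }

    static boolean unchanged(Map<Ref, Long> readSet) {
        for (Map.Entry<Ref, Long> e : readSet.entrySet())
            if (e.getKey().version() != e.getValue())
                return false;
        return true;
    }

    // a consumer "sleeps on two queues at once": it records both sizes in
    // its read set and is woken when either one changes
    static boolean demo() throws InterruptedException {
        Ref queue1Size = new Ref(), queue2Size = new Ref();
        Thread consumer = new Thread(() -> {
            try {
                Map<Ref, Long> readSet = new IdentityHashMap<>();
                readSet.put(queue1Size, queue1Size.version());
                readSet.put(queue2Size, queue2Size.version());
                if (queue1Size.read() == 0 && queue2Size.read() == 0)
                    retry(readSet);
            } catch (InterruptedException ignored) { }
        });
        consumer.start();
        Thread.sleep(50);
        queue2Size.write(1); // an update on either read ref wakes the consumer
        consumer.join(2000);
        return !consumer.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("consumer woke up: " + demo());
    }
}
```

With ordinary waitsets the consumer would have had to pick one of the two queues to block on; here the read set itself defines what it is waiting for.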


IOC without an IOC-container

August 12, 2008

Spring is my framework of choice when I need to write enterprise applications. But on some occasions using Spring (or a more dressed down IOC container) is not an option, because:

  1. it is too heavy/big. In some (rare) cases every byte counts
  2. you can’t introduce new dependencies.
  3. all kinds of political reasons

A practical example of such a strange application is a javaagent. I’m working on one for the ‘Concurrency Detector’, and javaagents typically are small. Introducing extra dependencies can also cause unforeseen side effects; e.g. I had a problem using AspectJ in my javaagent in combination with AspectJ in a webapplication. The side effect was that my class transformations were bypassed completely. That is one of the main reasons why I dropped AspectJ for the concurrency detector and decided to use ASM instead.

So what if you can’t use an IOC container, but do want to use IOC? Simple: just create one class (perhaps more) that glues all components together, just like the Spring container does. This class can know about all the specific implementations, policies/strategies, initializations, etc. All the stuff you normally do in a Spring container you can do in this class. The components of your system don’t need to be aware of the IOC container they are running in, and that is how I like my OO design.
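A minimal sketch of such a glue class (the component names are made up for illustration): the components depend only on interfaces and take their dependencies through constructors, and one hand-written context plays the role the Spring container would otherwise play.

```java
// Sketch of IOC without a container: OrderService knows only the Notifier
// interface; the ApplicationContext class is the single place that knows
// the concrete implementations and wires everything together.
public class Wiring {

    interface Notifier {
        String notifyUser(String msg);
    }

    static final class ConsoleNotifier implements Notifier {
        public String notifyUser(String msg) { return "console: " + msg; }
    }

    static final class OrderService {
        private final Notifier notifier; // injected, not looked up

        OrderService(Notifier notifier) { this.notifier = notifier; }

        String placeOrder(String item) { return notifier.notifyUser("ordered " + item); }
    }

    // the glue class: all construction and policy decisions live here,
    // just like in a Spring configuration
    static final class ApplicationContext {
        final Notifier notifier = new ConsoleNotifier();
        final OrderService orderService = new OrderService(notifier);
    }

    public static void main(String[] args) {
        ApplicationContext ctx = new ApplicationContext();
        System.out.println(ctx.orderService.placeOrder("book")); // prints "console: ordered book"
    }
}
```

Swapping ConsoleNotifier for another implementation is a one-line change in ApplicationContext, and OrderService never notices: that is exactly the property the container normally gives you.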