Volatile arrays

October 30, 2008

The volatile variable rule from the Java Memory Model guarantees a happens-before relation between a write and a subsequent read of a volatile variable. This makes it possible to exchange not just values of primitive types between threads, but also complete object graphs, without adding the burden of safe publication to each object in the graph. So volatile variables are a nice tool in the concurrency toolbox.
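For example, here is a small sketch of publishing a whole object graph through a volatile field (the class and field names are my own, just for illustration): the volatile write to current happens-before any read that observes the new reference, so a reading thread also sees the fully initialized map behind it.

```java
import java.util.HashMap;
import java.util.Map;

class Settings {
    final Map<String, String> map = new HashMap<>();
}

class Publisher {
    private volatile Settings current; // the volatile publication point

    void publish() {
        Settings s = new Settings();
        s.map.put("timeout", "30"); // plain writes, published by the volatile write below
        current = s;                // volatile write: publishes the whole graph
    }

    Settings get() {
        return current;             // volatile read
    }
}
```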

But it is not possible to create an array of volatile elements. So if an array element is written by one thread and read by another, some happens-before relation needs to be added. Declaring the array reference itself volatile has no effect, because a happens-before relation only exists between a write and a read of a volatile variable, not between two reads.

You could add a synchronized block to introduce the happens-before relation (the monitor lock rule):

final int[] array = ...;

void write(int index, int value){
	array[index]=value;
	synchronized(this){}//the release (unlock) side of the happens-before edge
}

int read(int index){
	synchronized(this){}//the acquire (lock) side of the happens-before edge
	return array[index];
}

But it isn’t the prettiest solution, because it could lead to expensive operating system calls. A dummy write and read of a volatile variable can also introduce the desired happens-before relation:

final int[] array = ...;
volatile boolean barrier;

void write(int index, int value){
	array[index]=value;
	barrier = true;//the volatile write
}

int read(int index){
	boolean b = barrier;//the volatile read
	return array[index];
}

Volatile variables have less overhead than locks, but this still isn’t perfect, because a happens-before relation is created between writes and reads of different elements. This unneeded happens-before relation could hurt cache performance, so it would be better to add happens-before relations only where needed. To be honest, I don’t have numbers to back this up, and I also don’t have access to serious hardware (with a lot of cores) to create benchmarks. What works well on my dual-core Intel machine could behave terribly in a serious multi-processor environment.

Another solution would be to create an array of volatile wrapper objects:

class VolatileInt{
   volatile int value;
}


final VolatileInt[] array = create(10);

VolatileInt[] create(int size){
	VolatileInt[] array = new VolatileInt[size];
	for(int k=0;k<size;k++)
		array[k]=new VolatileInt();
	return array;
}

void write(int index, int value){
	array[index].value = value;
}

int read(int index){
	return array[index].value;
}

It increases the size of the array in memory, and it adds some overhead because an extra field access needs to be done, but you have complete control over the number of happens-before relations.
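As an aside, the JDK already ships a class for exactly this use case: java.util.concurrent.atomic.AtomicIntegerArray (since Java 5) gives volatile access semantics per element, without the wrapper objects. A minimal sketch (the wrapper class name here is mine):

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class VolatileIntArray {
    private final AtomicIntegerArray array;

    VolatileIntArray(int size) {
        array = new AtomicIntegerArray(size);
    }

    void write(int index, int value) {
        array.set(index, value); // behaves as a volatile write of that element
    }

    int read(int index) {
        return array.get(index); // behaves as a volatile read of that element
    }
}
```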

Luckily volatile arrays are not a problem people encounter frequently.

ps:
even though the introduced volatile write/read or synchronized(this){} adds no visible value for the executing thread, it can’t be removed by the compiler, because it does add a happens-before relation.


JMM: single-processor can also suffer from visibility problems

October 27, 2008

From time to time I get the question whether single-processor (uni-processor) machines also suffer from visibility and reordering issues. Well, yes, they can. First of all, instruction reordering is not limited to multi-processor machines: single-processor machines also benefit from optimizations based on instruction reordering, and single processors can use low-level parallelization techniques like superscalar execution that lead to out-of-order execution of instructions.

But what about visibility issues? Since there is only a single processor, writes made by one thread, even though stuck in a cache, will be visible to another thread, because the cache is shared between threads. So it sounds plausible that no visibility problems can happen. But a CPU has more local memory than just caches, namely registers. If a compiler decides to keep an expensive non-volatile variable in a register (an optimization called ‘register allocation’), this register is swapped in and out when a thread context switch takes place. This means that one thread may never see the changes made by another thread, because the register content is always thread specific; you could see the register as a thread-local variable. So even on a single-core machine, reordering and visibility problems can still occur.
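The classic demonstration of this hazard is a loop on a flag variable (a sketch of my own, not from any particular benchmark): without volatile, the JIT may keep the flag in a register and the loop may never terminate, even on a single core; declaring it volatile forbids that optimization and guarantees the update becomes visible.

```java
class StopFlag {
    // Try removing 'volatile': the worker may then spin forever,
    // because the flag can be cached in a register.
    static volatile boolean running = true;

    static boolean demo() {
        Thread worker = new Thread(() -> {
            while (running) { /* spin */ }
        });
        worker.start();
        try {
            Thread.sleep(50);
            running = false;      // volatile write: guaranteed to become visible
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive(); // true: the worker saw the update and exited
    }

    public static void main(String[] args) {
        System.out.println(demo() ? "stopped" : "still spinning");
    }
}
```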


STM and MVCC Considerations II

October 8, 2008

In my previous post I mentioned an isolation anomaly that can happen with MVCC systems (for example Oracle). The cause of this issue is that a transaction in an MVCC system doesn’t care whether the data it only reads is updated by a different transaction. MVCC provides a stable view of the system by maintaining multiple versions of the same data, so a transaction doesn’t need to worry about other transactions updating data it has read: it simply won’t see these updates. The only thing it needs to worry about is detecting whether another transaction has updated the same data it wants to update. If such a write conflict is found, the transaction can’t commit. If better serialization behavior is needed, all data touched by the transaction (the reads and the writes) needs to be checked for conflicts.

If we look at the example again:

StmStack stack1 = new StmStack();
StmStack stack2 = new StmStack();

void foo(){
	transaction{
		stack1.push(stack2.size());	
	}
}

void bar(){
	transaction{
		stack2.push(stack1.size());	
	}
}

If transaction1 (calling foo) executes concurrently with transaction2 (calling bar), one of the following two scenarios happens (the commit is atomic):

  1. transaction1 commits before transaction2: transaction2 is aborted because stack1, which it has read, has been updated by transaction1
  2. transaction2 commits before transaction1: transaction1 is aborted because stack2, which it has read, has been updated by transaction2

Of course this improved isolation is not free, because there is more data to check. It can also lead to more retries of transactions, and this increases the chance of a livelock. But it is nice to see that this problem can be solved.

With the STM implementation I’m currently working on, this is quite easy to realize. When an object is read in a transaction, and it has not been read before, the dehydrated version (stored in the STM) of that object is asked to provide a hydrated version. This hydrated version is logged inside the transaction, to make sure that the same hydrated version is returned every time. So since the transaction tracks all reads and writes, enhancing the conflict detection would be easy.
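To sketch this read-set validation (with hypothetical classes of my own, not the actual STM implementation): a transaction remembers the version of every cell it reads, and commit fails when any of those versions has changed in the meantime.

```java
import java.util.HashMap;
import java.util.Map;

// VersionedCell and Transaction are made-up names for illustration only.
class VersionedCell {
    long version;
    Object value;

    VersionedCell(Object initial) { value = initial; }
}

class Transaction {
    private final Map<VersionedCell, Long> readSet = new HashMap<>();
    private final Map<VersionedCell, Object> writeSet = new HashMap<>();

    Object read(VersionedCell cell) {
        if (writeSet.containsKey(cell))
            return writeSet.get(cell);           // read-your-own-writes
        readSet.putIfAbsent(cell, cell.version); // remember the version first seen
        return cell.value;
    }

    void write(VersionedCell cell, Object newValue) {
        read(cell);                              // a write also joins the read set
        writeSet.put(cell, newValue);
    }

    // Commit only succeeds if nothing in the read set was updated meanwhile.
    boolean commit() {
        for (Map.Entry<VersionedCell, Long> e : readSet.entrySet())
            if (e.getKey().version != e.getValue())
                return false;                    // read conflict: abort
        for (Map.Entry<VersionedCell, Object> e : writeSet.entrySet()) {
            e.getKey().value = e.getValue();
            e.getKey().version++;
        }
        return true;
    }
}
```

With this stricter check, the stack1/stack2 example aborts one of the two transactions instead of letting both commit.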


STM and MVCC considerations

October 5, 2008

I’m working on a Software Transactional Memory (STM) implementation for experimentation purposes, and it uses Multi Version Concurrency Control (MVCC), just like Oracle. The big advantage of MVCC is that it is highly concurrent, because it doesn’t rely on locks to prevent other transactions from interfering when you only need to read data. When a transaction begins, it gets a snapshot of the world, and this snapshot remains constant even if other transactions are making changes. Changes made by the transaction are, of course, visible to the transaction itself.

The problem, however, is that MVCC can give subtle isolation problems. For example:

StmStack stack1 = new StmStack();
StmStack stack2 = new StmStack();

void foo(){
	transaction{
		stack1.push(stack2.size());	
	}
}

void bar(){
	transaction{
		stack2.push(stack1.size());	
	}
}

If foo and bar are called concurrently, you would expect one of the following states:

  1. stack1 contains 0 and stack2 contains 1 (if transaction1 is executed first)
  2. stack1 contains 1 and stack2 contains 0 (if transaction2 is executed first)

But with MVCC, stack1 and stack2 can both contain the value 0! This is because the transactions don’t see the changes made by the other transaction, and since there is no write conflict (both transactions write to a different stack), both transactions can commit. Whether this is an issue in practice, I don’t know, but it is something to keep in mind. I have written a similar post about this issue in Oracle.


Breaking Oracle SERIALIZABLE

October 4, 2008

The SERIALIZABLE isolation level is the highest isolation level for transactions. When transactions are executed under this isolation level, they should behave as if they were executed serially (one after another). However, under Oracle, and other databases that use Multi Version Concurrency Control (MVCC), like PostgreSQL and MySQL + InnoDB, it is possible to break this behavior. At the moment I’m working on an STM implementation that also uses MVCC, so it suffers from the same problem; I’ll get back to this in another blog post.

The problem

Suppose we have two tables, A (with column a) and B (with column b), and two transactions with the SERIALIZABLE isolation level that execute the following statements concurrently:

transaction1 INSERT INTO B (SELECT COUNT(*) FROM A)
transaction2 INSERT INTO A (SELECT COUNT(*) FROM B)

What do you think the result is going to be? The SERIALIZABLE isolation level mandates a serial execution of transactions, so either transaction1 is executed first or transaction2 is. If transaction1 executes first, the result would be A=1 and B=0. If transaction2 executes first, the result would be A=0 and B=1. With MVCC databases the result can also be A=0 and B=0. How is this possible?

The cause

Transactions with the SERIALIZABLE isolation level use a snapshot of the world for the duration of the transaction (the same goes for the READ_ONLY isolation level), so during the execution of the transaction it will not see changes made by other transactions (it will see its own changes, of course). When the transaction wants to commit, the database checks if there are conflicts, and if there are, the transaction is aborted and you get the feared ORA-08177: can’t serialize access for this transaction.

Time  Transaction 1                            Transaction 2
T1    start
T2                                             start
T3    INSERT INTO B (SELECT COUNT(*) FROM A)
T4                                             INSERT INTO A (SELECT COUNT(*) FROM B)
T5    commit
T6                                             commit

At T4, transaction2 won’t see the record that has been inserted into B by transaction1, so 0 is inserted into table A. When transaction1 wants to commit at T5, it can commit because there are no conflicting writes. When transaction2 wants to commit at T6, it can also commit because it has no conflicting writes either. So this is why the result is A=0 and B=0. In most cases it won’t be an issue, but it is interesting to know that MVCC is not perfect.

If you want to know more about this problem, I recommend the excellent books by Thomas Kyte.


Blocking Stacks and Queues in an STM

October 1, 2008

I’m currently working on a Java-based STM implementation with deferred writes and object granularity; it relies a lot on multiversioning (just like Oracle) and less on pessimistic locking. This weekend I completed the STM version of condition variables, and I’m glad I got my Stack and Queue up and running inside a multithreaded producer/consumer example. Writing ‘concurrent’ code for STMs is a lot different from the traditional approach with locks, conditions and the like.

This is an example of the Stack that can be used safely inside an STM.

public class Stack<E> {

    private Node<E> head;

    public void push(E item) {
        if (item == null) throw new NullPointerException();
        head = new Node<E>(item, head);
    }

    public E peek() {
        if (head == null)
            return null;

        return head.value; // peek must not remove the top item
    }

    public E pop() {
        if (head == null)
            retry();

        return removeTopItem();
    }

    private E removeTopItem() {
        Node<E> oldHead = head;
        head = head.parent;
        return oldHead.value;
    }

    public int size() {
        return head == null ? 0 : head.size();
    }

    public boolean isEmpty() {
        return size() == 0;
    }

    public static class Node<E> {
        final E value;
        final Node<E> parent;

        Node(E value, Node<E> prev) {
            this.value = value;
            this.parent = prev;
        }

        int size() {
            if (parent == null)
                return 1;
            else
                return parent.size() + 1;
        }
    }
}

As you can see, there is no concurrency logic here apart from the retry functionality (which works like a condition variable). At the moment I need to instrument the classes manually so they can take part in the STM; after instrumentation the class looks like this:

public class Stack<E> implements Citizen {

    private Node<E> head;

    public void push(E item) {
        if (item == null) throw new NullPointerException();
        head = new Node<E>(item, head);
    }

    public E peek() {
        if (head == null)
            return null;

        return head.value; // peek must not remove the top item
    }

    public E pop() {
        if (head == null)
            retry();

        return removeTopItem();
    }

    private E removeTopItem() {
        Node<E> oldHead = head;
        head = head.parent;
        return oldHead.value;
    }

    public int size() {
        return head == null ? 0 : head.size();
    }

    public boolean isEmpty() {
        return size() == 0;
    }

    public static class Node<E> {
        final E value;
        final Node<E> parent;

        Node(E value, Node<E> prev) {
            this.value = value;
            this.parent = prev;
        }

        int size() {
            if (parent == null)
                return 1;
            else
                return parent.size() + 1;
        }
    }

    //================== generated  ======================

    private Node head_initial;
    private long ptr;
    private MultiversionedStm.MultiversionedTransaction transaction;

    public Iterator<Citizen> ___findNewlyborns() {
        return EmptyIterator.INSTANCE;
    }

    public void ___onAttach(MultiversionedStm.MultiversionedTransaction transaction) {
        this.transaction = transaction;
    }

    public MultiversionedStm.MultiversionedTransaction ___getTransaction() {
        return transaction;
    }

    public long ___getPointer() {
        return ptr;
    }

    public void ___setPointer(long ptr) {
        this.ptr = ptr;
    }

    public HydratedStack ___hydrate() {
        return new HydratedStack(head);
    }

    public boolean ___isDirty() {
        return head != head_initial;
    }

    public static class HydratedStack implements HydratedCitizen {
        private final Node head;

        public HydratedStack(Node head) {
            this.head = head;
        }

        public Stack dehydrate(long ptr, MultiversionedStm.MultiversionedTransaction transaction) {
            Stack stack = new Stack();
            stack.head = head;
            stack.head_initial = head;
            stack.transaction = transaction;
            stack.ptr = ptr;
            return stack;
        }
    }
}

And based on this stack it was easy to create a Queue that is quite concurrent, because puts and takes can be executed concurrently in most situations (as long as the poppers see a non-empty readyToPopStack).

public class Queue<E> {

    private Stack<E> readyToPopStack = new Stack<E>();
    private Stack<E> pushedStack = new Stack<E>();

    public E pop() {
        if (!readyToPopStack.isEmpty())
            return readyToPopStack.pop();

        while (!pushedStack.isEmpty()) {
            E item = pushedStack.pop();
            readyToPopStack.push(item);
        }

        return readyToPopStack.pop();
    }

    public void push(E value) {
        pushedStack.push(value);
    }

    public boolean isEmpty() {
        return size() == 0;
    }

    public int size() {
        return readyToPopStack.size() + pushedStack.size();
    }
}

As you can see, with STMs there is a lot less worrying about thread safety, and we can focus on the core logic again. I think that STMs could mean for enterprise Java what OR-mappers meant for Java many years ago.