Can your threads wait?

January 31, 2007

Introduction

After quite some time, I have decided to pick up my concurrency library again, and I hope to make a first release within a month. The library extends JSR-166 (the excellent concurrency library that is part of Java 5 and higher) and contains some goodies I have used in various server-side projects, like:

  1. Repeater: a structure that keeps repeating a task over and over again. The standard implementation is the ThreadPoolRepeater (it uses a pool of threads to run the task concurrently). Repeaters are great for setting up processes that need to block while waiting for input/output, like batch processes.
  2. BlockingExecutor: an Executor with more control on the blocking and timeout behavior.
  3. AwaitableReference: a synchronization structure that makes it easy to wait for a (non-null) reference.
  4. LendableReference: a synchronization structure that looks a lot like the AwaitableReference, but the value needs to be returned before a new one can be set (although this depends on the implementation).
  5. a configurable ThreadFactory implementation. JSR-166 contains a ThreadFactory interface, but not a configurable implementation. This is not very handy if you need a lot of control over threads (especially on the server side), e.g. when you want to run batch threads at a much lower priority than interactive threads.
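To give an idea (the class and parameter names below are just illustrative, not the final API of the library), a minimal configurable factory could look like this:

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of a configurable ThreadFactory: the name prefix,
// priority and daemon flag of the created threads can all be configured.
class ConfigurableThreadFactory implements ThreadFactory {

    private final String namePrefix;
    private final int priority;
    private final boolean daemon;
    private final AtomicInteger threadCounter = new AtomicInteger(1);

    ConfigurableThreadFactory(String namePrefix, int priority, boolean daemon) {
        if (priority < Thread.MIN_PRIORITY || priority > Thread.MAX_PRIORITY)
            throw new IllegalArgumentException("illegal priority: " + priority);
        this.namePrefix = namePrefix;
        this.priority = priority;
        this.daemon = daemon;
    }

    public Thread newThread(Runnable task) {
        // every created thread gets a recognizable name, e.g. "batch-1"
        Thread thread = new Thread(task, namePrefix + "-" + threadCounter.getAndIncrement());
        thread.setPriority(priority);
        thread.setDaemon(daemon);
        return thread;
    }
}
```

A pool for batch work could then use new ConfigurableThreadFactory("batch", Thread.MIN_PRIORITY, true), while interactive work keeps the normal priority.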

The library will be open sourced and released under the MIT/BSD license.

WaitPoint

One of the things that annoyed me while implementing this library is that I had to reimplement waiting logic over and over, for example to make the ThreadPoolRepeater and ThreadPoolBlockingExecutor pausable. Finally it occurred to me that I needed a structure threads can wait on: the WaitPoint. The WaitPoint is nothing more than a synchronization structure a thread needs to pass in order to continue:

example:

while(true){
	waitpoint.pass();
	System.out.println("hello");
}

As long as the waitpoint allows passage, hello is printed. As soon as it doesn’t allow passage, the thread blocks and nothing is printed. In essence, the WaitPoint is just the waiting part of a Condition.

CloseableWaitPoint

The CloseableWaitPoint is a WaitPoint implementation that can be opened and closed. While it is open, pass requests don’t block and threads can continue with what they were doing. While it is closed, all threads that want to pass block for as long as it remains closed. When it is opened again, the blocked threads can continue.
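Since the library isn’t released yet, here is a simplified sketch of the idea (the real implementation may differ in details), a WaitPoint interface plus a CloseableWaitPoint built on the JSR-166 lock primitives:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// The waiting part of a Condition: a structure a thread must pass to continue.
interface WaitPoint {
    void pass() throws InterruptedException;
}

// Sketch of a CloseableWaitPoint: pass() returns immediately while open,
// and blocks while closed, until the waitpoint is opened again.
class CloseableWaitPoint implements WaitPoint {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition isOpenCondition = lock.newCondition();
    private boolean open;

    CloseableWaitPoint() { this(true); }

    CloseableWaitPoint(boolean open) { this.open = open; }

    public void pass() throws InterruptedException {
        lock.lock();
        try {
            while (!open)          // guard against spurious wakeups
                isOpenCondition.await();
        } finally {
            lock.unlock();
        }
    }

    public void open() {
        lock.lock();
        try {
            open = true;
            isOpenCondition.signalAll(); // wake up all blocked passers
        } finally {
            lock.unlock();
        }
    }

    public void close() {
        lock.lock();
        try {
            open = false;
        } finally {
            lock.unlock();
        }
    }
}
```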

But the CloseableWaitPoint is still a low-level concurrency structure. A higher-level structure is the BlockingQueue: excellent for sharing data between threads. I also made a new BlockingQueue implementation (it is a decorator): one where all puts go through a front waitpoint and all takes go through a back waitpoint (the front of the queue is where the puts take place, the back is where the takes take place). Using these two waitpoints, you can control threads that want to put messages into the queue or take them from it.
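A simplified sketch of such a decorator (only put and take are shown; the real decorator implements the full BlockingQueue interface, and the hypothetical WaitPoint interface is repeated here so the snippet stands on its own):

```java
import java.util.concurrent.BlockingQueue;

// The waiting part of a Condition: a structure a thread must pass to continue.
interface WaitPoint {
    void pass() throws InterruptedException;
}

// Sketch of the decorator: puts pass the front waitpoint, takes pass the
// back waitpoint, and the actual storage is delegated to the target queue.
class BlockingQueueWithWaitingTakesAndPuts<E> {

    private final BlockingQueue<E> targetQueue;
    private final WaitPoint frontWaitPoint;
    private final WaitPoint backWaitPoint;

    BlockingQueueWithWaitingTakesAndPuts(BlockingQueue<E> targetQueue,
                                         WaitPoint frontWaitPoint,
                                         WaitPoint backWaitPoint) {
        this.targetQueue = targetQueue;
        this.frontWaitPoint = frontWaitPoint;
        this.backWaitPoint = backWaitPoint;
    }

    public void put(E item) throws InterruptedException {
        frontWaitPoint.pass(); // blocks while the front waitpoint denies passage
        targetQueue.put(item);
    }

    public E take() throws InterruptedException {
        backWaitPoint.pass(); // blocks while the back waitpoint denies passage
        return targetQueue.take();
    }
}
```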

Pausable Executors

The cool thing is that this technique can be used to make a ThreadPoolExecutor pausable:

BlockingQueue<Runnable> targetQueue = new LinkedBlockingQueue<Runnable>(10);
CloseableWaitPoint frontCloseableWaitPoint = new CloseableWaitPoint();
CloseableWaitPoint backCloseableWaitPoint = new CloseableWaitPoint();
BlockingQueue<Runnable> workQueue =
	new BlockingQueueWithWaitingTakesAndPuts<Runnable>(
		targetQueue,
		frontCloseableWaitPoint,
		backCloseableWaitPoint);
ThreadPoolExecutor executor = new ThreadPoolExecutor(
	1, 10, 60, TimeUnit.SECONDS, workQueue);

If you want to stop the acceptance of new tasks, do:

frontCloseableWaitPoint.close();

This gives the executor a chance to execute all outstanding tasks, but new tasks aren’t accepted.

If you want to stop execution of tasks, you can close the back-end of the queue:

backCloseableWaitPoint.close();

If you leave the frontCloseableWaitPoint open, you still accept new tasks, but they won’t be processed as long as the backCloseableWaitPoint is closed. I have used a similar approach with a LendableReference and the ThreadPoolRepeater to make that structure pausable.

Other usages

By creating different WaitPoints you can customize waiting behaviour to a high degree, without having to integrate it into the structure itself (Inversion of Control rocks). Because the waiting functionality is extracted into a separate object, that object can be shared: you can control a lot of structures with one WaitPoint. Another usage is throttling: you could create a ThrottlingWaitPoint that enforces a minimum period between passes, for example. By setting the minimum delay to 10 milliseconds, at most 100 passes per second are allowed. With such a waitpoint you can control the ‘speed’ of task execution of the Repeater, for example, but I guess it can be used for a lot of things.
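A throttling waitpoint could be sketched like this (again illustrative, not the library’s final API; pass() simply sleeps until the minimum delay since the previous pass has elapsed):

```java
// Sketch of a hypothetical ThrottlingWaitPoint: pass() guarantees a minimum
// delay between two consecutive passes, so with minimumDelayMs = 10 at most
// 100 passes per second are allowed.
class ThrottlingWaitPoint {
    private final long minimumDelayMs;
    private long lastPassMs = 0;

    ThrottlingWaitPoint(long minimumDelayMs) {
        this.minimumDelayMs = minimumDelayMs;
    }

    public synchronized void pass() throws InterruptedException {
        long now = System.currentTimeMillis();
        long earliestNextPass = lastPassMs + minimumDelayMs;
        if (now < earliestNextPass)
            Thread.sleep(earliestNextPass - now); // wait out the remaining delay
        lastPassMs = System.currentTimeMillis();
    }
}
```

A Repeater that passes this waitpoint before each task execution would then run at most 1000/minimumDelayMs tasks per second.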

ps:
I’m sitting in the bus to work, and another usage came to mind: a WaitPoint can also be used to regulate the capacity of structures that contain elements of some sort. There are already bounded implementations of the BlockingQueue, but moving this behavior into a separate structure makes it reusable and reduces complexity.


Oracle and locking thoughts

January 28, 2007

Introduction

I was talking with a colleague about a locking issue in Oracle. We have the following constraint:

An administrator (a user with the is_admin flag set to ‘T’) is allowed to change the is_admin flag of users. To prevent problems, like locking out all administrators, an administrator is not allowed to unset his own is_admin flag. This way there will always be at least one administrator.

But if there are concurrent administrators, a naive implementation could lead to data races:

user 1 (an administrator) sets the is_admin flag to 'F' of user 2.
user 2 (an administrator) sets the is_admin flag to 'F' of user 1.

That is why some concurrency control is needed. We were discussing different ways of solving this problem:

  1. using READ_COMMITTED isolation level
  2. using READ_COMMITTED isolation level and pessimistic locking
  3. using SERIALIZABLE isolation level

There are more alternatives, but for argument’s sake I will only discuss these.

Short introduction to MVCC

We are using Oracle, and it uses a special concurrency technology: Multi Version Concurrency Control (MVCC). I won’t go into the details of this technology (there is enough good material written about the subject), but I am going to explain how it can be used as part of the solution. MVCC uses a System Change Number (SCN): every time a commit is made, this number is increased. The cool thing about MVCC is that it is possible to reconstruct records as they were at some point in time, and this point in time can be identified with an SCN. This means that a transaction can have a consistent view over the data, whatever other transactions are doing. Oracle supports two levels of read-consistent views:

  1. statement level read consistency: when a statement is executed, the current SCN is stored and used for the duration of the execution of that statement. This means that during the execution of that statement, it won’t see changes made by other transactions. This is also why the READ_UNCOMMITTED isolation level isn’t supported under MVCC: it isn’t needed, because the database can always track back to a committed version of a record, so it never has to read uncommitted data. When the next statement is executed, the newest SCN is selected, which means that during the execution of a single transaction, multiple SCNs can be used.
  2. transaction level read consistency: when a transaction begins, the current SCN is stored and used for the duration of that transaction. This means that the transaction is not able to see changes made by other transactions: other inserts/deletes (phantom reads) and updates (non-repeatable reads) are not visible. Because transaction level read consistency prevents non-repeatable reads and phantom reads, it is used to implement the SERIALIZABLE isolation level. A separate REPEATABLE_READ isolation level isn’t needed, because MVCC already provides a consistent view for the duration of the transaction.

Using READ_COMMITTED

Let us bring this theory into practice with the READ_COMMITTED isolation level and no locking on the selected records. It is very important to realize that records that are updated are automatically locked (whatever the isolation level).

This is the initial database:

user(id=1, is_admin='T')
user(id=2, is_admin='T')
SCN = 0

Let us follow these pseudo code steps:

t1  trans1: start transaction
t2  trans2: start transaction
t3  trans1: CN = SCN
    select * from user where id = 1
    (user 1 is not locked)
t4  trans2: CN = SCN
    select * from user where id = 2
    (user 2 is not locked)
t5  trans1: CN = SCN
    update user set is_admin = 'F' where id = 2
    (user 2 is locked)
t6  trans2: CN = SCN
    update user set is_admin = 'F' where id = 1
    (user 1 is locked)
t7  trans1: commit
    SCN++ (SCN is now 1)
    release lock on user 2
t8  trans2: commit
    SCN++ (SCN is now 2)
    release lock on user 1

There is no reason for the Oracle database to complain: no conflicting locks are held, and the CNs all match. So both transactions are allowed to commit, and this will be the database:

user(id=1, is_admin='F')
user(id=2, is_admin='F')
SCN=2

As you can see, there are no administrators anymore. Using the READ_COMMITTED isolation level without locking on the selects doesn’t solve our problem.

Using READ_COMMITTED and pessimistic locking

One solution to prevent this race problem is to use the READ_COMMITTED isolation level in combination with pessimistic locking. Oracle supports pessimistic locking with the ‘select for update’ statement.
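In pseudo code (following the statement style used in the timelines), the pattern that creates the critical section looks like this:

```
start transaction
-- lock the row before reading it; other transactions that
-- want this lock now block until we commit or roll back
select * from user where id = :id for update
-- check the business rule on the locked row
-- (e.g. an administrator is not unsetting his own is_admin flag)
update user set is_admin = 'F' where id = :id
commit  -- releases the lock
```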

Let us follow the same example, but now with select for update:

t1  trans1: start transaction
t2  trans2: start transaction
t3  trans1: CN = SCN
    select * from user where id = 1 for update
    (user 1 is locked by trans1)
t4  trans2: CN = SCN
    select * from user where id = 2 for update
    (user 2 is locked by trans2)
t5  trans1: CN = SCN
    update user set is_admin = 'F' where id = 2
    (blocks: trans2 holds the lock on user 2)
t6  trans2: CN = SCN
    update user set is_admin = 'F' where id = 1
    (blocks: trans1 holds the lock on user 1, so we now have a deadlock;
    Oracle detects it and raises ORA-00060 in one of the sessions, say trans1,
    which is rolled back and releases the lock on user 1)
t7  trans2: the update of user 1 now proceeds
    (user 1 and user 2 are locked by trans2)
t8  trans2: commit
    SCN++ (SCN is now 1)
    release locks on user 1 and user 2

The result is that trans1 is rolled back, and trans2 is committed. The database will be:

user(id=1, is_admin='T')
user(id=2, is_admin='F')
SCN=1

The result is that one of the users will remain administrator. The select for update followed by the update provides a critical section around users 1 and 2: the check and the update are now atomic, and this prevents the race problem from occurring. Personally I think ‘select for update’ is a little bit misleading as a name, because you are not always going to update the record(s) you select.

SERIALIZABLE Isolation Level

A different alternative to pessimistic locking is optimistic locking. You could add optimistic locking yourself (by adding a version field), but the cool thing is that the SERIALIZABLE isolation level, under MVCC, also provides this functionality. It provides even more, because the SERIALIZABLE isolation level also prevents phantom reads. It is important to realize that preventing phantom reads (and throwing an optimistic locking failure) is not always desired behavior.

t1  trans1: start transaction
    trans1.CN = SCN
t2  trans2: start transaction
    trans2.CN = SCN
t3  trans1: select * from user where id = 1
t4  trans2: select * from user where id = 2
t5  trans1: update user set is_admin = 'F' where id = 2
t6  trans2: update user set is_admin = 'F' where id = 1
t7  trans1: assert trans1.CN = SCN
    commit trans1
    SCN++ (SCN is now 1)
t8  trans2: assert trans2.CN = SCN
    (because 0 != 1, the assert fails)
    rollback

Transaction 2 now gets an ‘ORA-08177: can’t serialize access for this transaction’, and the database will be:

user(id=1, is_admin='T')
user(id=2, is_admin='F')
SCN=1

So, at least in this simplified model, using the SERIALIZABLE isolation level also prevents the race problem from occurring. A caveat: the assert above compares against the global SCN, while Oracle only raises ORA-08177 when a transaction updates a row that was changed after its snapshot was taken; since trans1 and trans2 update different rows here, this is really a write-skew scenario, so it is worth verifying that your Oracle version actually rejects it.

Conclusion

My personal preference would be the second solution, because it locks exactly what needs to be locked (and nothing more, so no unnecessary errors).