STM | Blog of Peter Veentjer

Akka and the Java Memory Model

July 7, 2011

There has been a lot of change in my life (professional and personal). I started working for Typesafe, the company behind Scala and Akka, where I can work-with/talk-to people like Martin Odersky, Jonas Boner, James Gosling and last but certainly not least, my personal hero: Doug Lea. And I moved to a different country (Bulgaria), the weather here is just great and my apartment looks out over the Black Sea. And I’m busy getting my body back in shape.

This is my first blogpost on the company blog, but I also wanted to have a link from this blog. So have a nice read!

Akka and the Java Memory Model.

1 Comment | Akka, Concurrency, Scala, STM | Permalink
Posted by pveentjer

New stuff for Multiverse 0.7 STM

August 12, 2010

The last few months I have been working very hard on the 0.7 release of Multiverse. And it is going to gain a lot of new goodes:

new engine: a new STM engine has been added that is better scalable and better performing than the current TL2 (MVCC) based AlphaSTM. There are 2 reasons for this: much more object pooling (I pool almost everything) and removal of the central commit counter that needs to be increased when an updating transaction commits. The new BetaSTM engine is based on the SkySTM although I have taken a different approach for easing on concurrent arrives/departs.
It will also be the the basis for making a distributed stm possible since the central commit counter will not scale (and a conflict counter used in the new BetaSTM can be striped). The new engine also relies heavily on templating (using Velocity) to generate multple transaction implementations and methods optimized for certain behavior or types of references. This is done when the Multiverse jars are build and there will no runtime dependency on Velocity.
propagation levels: just like with databases you can specify requires, requiresnew, supported etc so you have fine grained control on the nesting of transactions.
pessimistic behavior: which makes the stm more like traditional lock based concurrency control without the standard problems (race problems, deadlocks) and where operations still benefit from the standard stm advantages (composability of transactions, failure atomicity, isolation). It can be configured on transaction level
for all reads or for all writes. But it also can be done on transactional object (reference) level for fine grained control
rollback on: so you can specify when a transaction should abort/commit when an exception happens. Atm a transaction will always be aborted and with this behavior your have more control.
commuting behavior: to make higher concurrency possible by allowing certain interleavings of transactions. This was already available in the 0.5 release, but user supplied functions were not possible and in the 0.7 release will be.
Clojure semantics to make Multiverse usable in any Java project without relying on Clojure:
1. commute. One of the differences of the Clojure approach is that the commuting function will not be reevaluated at the end if somehow a dependency was created on the value. This means that the resulting state always will be correct.
2. add-watch: to make it possible to listen to change on a single ref. This lifts on the same
  mechanism as the retry, but the big difference is that you can listen on a single ref, instead of all reads.
3. ensure: the prevent writeskew problems. One of the differences compared to the clojure approach is that you also can block on ref that has been ‘ensured’ not to change. This behavior lifts on the pessimistic support. Writeskew prevention in Multiverse already is available on the transaction level btw.
4. alter: to apply a function on a reference. Not very interesting itself. The big difference with Clojure is that there will not be any boxing.

The properties also are going to be exposed in Akka; so you can expect a lot of new functionality there. The focus for the 0.7 release will mainly be to provide a half fabricate that can be used in other Java projects without relying on instrumentation. This will probably not be completely up to date.

I’m also working on an experimental storage engine to make durable transactions possible. But I don’t think it will be complete for the 0.7 release.

4 Comments | Concurrency, STM | Permalink
Posted by pveentjer

STM: Read tracking vs visibile reads.

July 25, 2010

It appears there is some confusion about these terms for STM design and I want to clarify them.

Read tracking

If a transaction tracks (so stores them in a local buffer) all reads that have been done, this is called read tracking. Some of the advantages of readtracking are:

make certain forms of isolation possible: like causal consistency or protection against a write skew. It depends on the STM algorithm and settings what kind of support is needed. If the MVCC/TL2 algorithm is used, readtracking is needed less because the design always provides a read consistent view. Other algorithms like the one used in the SkySTM rely more on read tracking because the readset needs to be revalidated when the conflict counter changes (SkySTM specific). The big advantage of the SkySTM above the TL2 design is that there is less need for a central counter.
provide certain blocking operations: like the ‘retry’ and the ‘orelse’. Based on all reads that have been done, a transaction can register some form of listener and blocks. Once a write on one of the reads has happened, the listening transactions is notified and can retry the execution of the transaction.
reduce the number of read conflicts; once something is read and tracked (so stored in the buffer of the transaction), it doesn’t matter if another transaction does an update on that read, because the transaction can repeat the read from its local buffer. Especially for longer running transactions, or transaction that contend for certain datastructures, this could be an improvement.
improved performance: doing a read can be expensive since some synchronization actions are needed. Once it is stored in the local buffer, these expensive operations are not needed anymore. If a read happens multiple times, it can be an advantage to track it, so the expensive operation only happens once.

The disadvantage of read tracking is that it will consume more memory and processing time. This problem is magnified when field granularity is used instead of object granularity since each fields needs to be tracked individually.

Visible reads

For certain STM algorithms there is a need for a transaction to be notified when a write has been done on something the transaction has read. This mechanism of a transaction registering to addresses it has read, and gets notified when a change happens on the read, is called a visible read. For the TL2 design this isn’t needed, since the global clock will make sure that a transaction always gets a read consistent view. For other STM implementations like the SkySTM having some form of visible read is needed, to make sure that transactions get notified when a write one something they have read have happened.

The disadvantage of ‘plain’ visible reads, is that some registration needs to be done one the reads and this can cause a lot of contention. If you have a tree for example, a lot of transactions will contend for the root node, even though it doesn’t change very often. The SkySTM works with semi visible reads, where a transaction only needs to know if there are any transactions that have not committed, but have executed that read. So instead of registering transactions, each read has a counter. Once a transaction does a read, it increases the counter and once it commits/aborts, it decreases the counter. If another transaction wants to do an update on a read with a surplus of readers, it knows that it could be causing a conflict and increases the global conflict counter. Other transaction will revalidate their readset if the conflict counter has changed when they do another read. The SkySTM even improves the design of the counter by using a scalable non zero indicator to reduce contention on the counter even more. In the new design of the Multiverse BetaSTM (will be part of Multiverse 0.7) I have taken a different approach (I allow a transactional object/ref to become read biased once a certain number of reads (and no updates) have been executed. Once it become read biased, it doesn’t track readers anymore and will always cause the global conflict counter to be increased when an update happens.

3 Comments | Concurrency, STM | Permalink
Posted by pveentjer

STM and pessimistic behavior

July 23, 2010

Normally STM’s are very optimistic, conflicting modifying transactions are able to execute concurrently but only one of them is able commit in the end. In most cases this is desired behavior, but in some cases it can be quite annoying (e.g. when doing IO) and you want to prevent that a transaction is going to conflict and a lot of work is lost.

In Multiverse 0.7 multiple levels of pessimistic behavior are going to be added:

automatic lock all writes. This means that if you want update a transactional object, the locks on that object is either acquired or fails immediately. This looks a lot like the behavior MVCC databases like Oracle/Postgresql provide, although with these databases it still is possible to have concurrent reads and writes. With the new beta stm in Multiverse, this will be much harder to realize because the central clock, where MVCC/TL2 based STM rely on, has been removed because it limits scalability
automatic lock all reads (so also all writes because they need to be read before being written).
lock an individual object on reading (like the oracle select for update) or on writing for fine grained control

These properties will be accessible by setting the correct properties on the transaction or by doing a specific call to the API. So in essence one is now able to program using locks, but without the traditional problems:

Locks are still composable
race problems are not possible because locks are released to soon. Locks will not be released until the end of the transaction.
race problems are not possible because locks are acquired too late. The transaction just fails with a conflict when such a transaction wants to commit
locks can’t be forgotten to be released: locks will still be managed by the transaction and will automatically be released when the transaction aborts/commits
it is not possible to run into deadlocks since a transaction can’t wait indefinitely on acquiring a lock

Atm there will only be support for exclusive locks, but read/write locks are probably going to be added in the 0.8 release. Although I’m still thinking about a good mechanism to upgrade shared locks to exclusive locks.

Leave a Comment » | Concurrency, STM | Permalink
Posted by pveentjer

STM: Encryption and Security

June 28, 2010

Software Transactional Memory makes certain problems easier to solve since there is a additional layer between the data itself and using the data. The standard problems STM’s solve is to coordinate concurrency, provide consistency and failure atomicity. But adding more ‘enterprisy’ logic isn’t that hard.

One of the things that would be quite easy to add is security; in some cases you want to control if someone is allowed to access a certain piece of information (either for reading or for writing). Most stm’s already provide readonly vs update transactions, and once executed in readonly mode a user is now allowed to modify information (although one should be careful that a readonly transaction is not ignored because there is en enclosing update transaction). This can easily be extended to provide some form of security where certain information can only be accessed if certain conditions are met.

Another related piece of functionality is adding encryption; in some cases you don’t want the data to be available unprotected in memory (someone could access it with a debugger for example) or placed on some durable storage or being distributed. Adding some form of encryption when committing and decryption when loading also would not be that hard to realize, since the transaction already exactly knows what data is used. Personally I don’t like to work with encryption since it very often is completely in your face and gets intertwined with the design, but using some declarative annotations, would remove a lot of pain.

Leave a Comment » | Concurrency, STM | Permalink
Posted by pveentjer

STM: Object vs Field Granularity

June 28, 2010

Stm’s for object oriented languages like Java, c#, Scala etc can either be:

field granular: meaning that each field of an object is managed individually
object granular: meaning that all fields of an object are managed indivisible

Both approaches have some pro’s and con’s. The advantage of field granularity, is that independent fields of transactional object won’t cause conflicts. If you have a double linkedlist for example, you want the head and tail of the list to be accessed no matter what happens on the other side. This reduces the number of conflicts you get, so improves concurrency.

But having field granularity is not without its problems:

more expensive transactions: because each field needs to be tracked individually
more expensive loading; with a field granularity, each fields needs to be loaded individually.
more expensive conflict detection; since each fields needs to be checked individually
more expensive commit: each field needs to be locked/updated individually

This means that transactions require more cpu and memory. Another problem is that a transaction that uses field granularity takes longer and this means that it could suffer from more conflicts (causing reduced concurrency) and causing more conflicts (since they acquire locks longer). And last but not least, a field granular transaction puts a lot more pressure on your memory bus since more ‘synchronization’ actions are needed.

In Multiverse (when instrumentation is used), object granularity is the default. And to make field granularity possible, just place a @FieldGranular annotation on top of your field.

2 Comments | STM | Permalink
Posted by pveentjer

Java Extreme Performance: Part I – Baseline

June 24, 2010

I’m currently working on writing a new STM engine for the Multiverse project and this time the focus is going to be on scalability and performance. The goal is that for uncontended data and 1 ref per transaction, the update performance should be a least 10 million per second and readonly transactions should be 50/75 million per second (all per core and it should scale linearly). So I’m going to write some posts about what I discovered while improving performance and this is the first one.

The first big thing is that profilers become useless (even if you use stack sampling) because the call stack often changes so frequently, that no information can be derived from it (even if you use a sampling frequency of 1 ms which apparently is the lowest you can get).

What works best for me is to throw away everything you have and start from scratch with an (incomplete) baseline. This helps to create some kind of ‘wow.. I wish I could get this performance’ and by adding more logic, you can see how expensive something is. In a lot of cases you will be flabbergasted how expensive certain constructs are; e.g. a volatile read or a polymorphic method call, or that the JIT is not able to optimize something you would expect it to optimize.

This is a very time consuming process (especially since it also depends on the platform or the jdk you are using). But it will help to gain a deeper insight and help you to write better performing code.

2 Comments | Concurrency, Java, Performance, STM | Permalink
Posted by pveentjer

Groovy and STM

May 18, 2010

Yesterday evening I was playing with some initial support for Groovy in Multiverse; a software transactional memory implementation for Java. And with the help of Alex Tkachman is was easy to get up some initial support.

If you want to have a transactional reference (similar to a ref in Clojure), you can just create a Ref (or in this case a LongRef):

class Account{
   final LongRef balance = new LongRef();

   void long getBalance(){balance.get()}

   void setBalance(long newBalance){
       if(newBalance<0){ 
            throw new InsufficientFundsException();
       }
       this.balance.set(newBalance);
   }
}

And if you want to have an atomic block, you can execute the following:

    Account from = new Account(10)
    Account to = new Account(10)

    atomic(readonly:false, trackreads:true) {
      from.balance -= 5
      to.balance += 5
    }

    println "from $from.balance"
    println "to $to.balance"

The parameters in the atomic block are not needed (Multiverse is able to infer some settings and in the future more inference will be added), but I wanted to see if it was possible. Having closure support in a language, increases language complexity, but imho makes a language a lot easier to use and less painful on the eyes.

There is more clutter that can be removed, but I really need to polish my Groovy skills. Groovy support is going to be added to Multiverse 0.6 (expected in 6/8 weeks). I haven’t checked in the new groovy support on the snapshot, so contact me if you want to play with it.

2 Comments | Concurrency, Groovy, STM | Permalink
Posted by pveentjer

Multiverse: Timed blocking transactions

April 15, 2010

With traditional concurrency control you can do a timed wait on some resource (a lock for example) to come available. If the resources
doesn’t come available within a certain limit, the operation fails and this needs to be handled in the code. There are a few problems with
this approach:

no convenient way to pass the total timeout to each blocking call
you need to have several versions of blocking methods (with or without timeout, interruptible or not) and often there is no easy abstraction higher up. So you keep getting almost identical methods, that only differ in the type of blocking method they call
no easy way to roll back changes. If you decide to abort an operation because of a timeout, it could be hard to restore the system in the original state (so no atomicity)

With software transactional memory these problems can be solved. I have just added support for timed blocking methods, so you can say something like:

@TransactionalObject
class BankAccount{
    private int balance;

    public int balance(){
      return balance;
    }

    public void setBalance(int balance){
       this.balance = balance;
    }
}

  @TransactionalMethod(timeout = 1, timeoutTimeUnit = TimeUnit.SECONDS)
  public static void tryRemove(Account a, int amount){
      a.setBalance(a.getBalance()-amount);//is rolled back in case of a failure

       if(a.getBalance()<0{
              retry();
       }       
    }

When the tryRemove is called (and no transaction is running), and the balance is not sufficient, the call blocks until a timeout occurs or the balance is increased and the money withdrawn. And if you want the call to be interruptible, you just need to add the InterruptedException:

    @TransactionalMethod(timeout = 1)
    public void tryRemoveInterruptibly(int amount)throws InterruptedException{
       balance = balance - amount;//is rolled back in case of a failure

       if(balance<0{
              retry();
       }       
    }

And the timeunit defaults to SECONDS, so no need to configure that.

In Multiverse you only need to specify multiple blocking versions of a method if that method is exposed to non transactional methods. Transactional methods calling transactional methods, will always lift on the transaction of the outermost transactional call (Multiverse provides a flattened transaction model for nested transactions for now). This also means that it is very easy to create a timed blocking method on datastructures you don’t own.

Leave a Comment » | STM | Permalink
Posted by pveentjer

Multiverse STM: Looking for a Groovy Committer

April 10, 2010

One of the mission statements of Multiverse is to create a STM that can be used inside any JVM based language. We already have Scala integration available (although it needs some care) and since yesterday we also have someone responsible for integrating Multiverse with JRuby (welcome aboard Sai Venkat).

But it would be cool if we can add Groovy integration in the 0.5 release (in a week) so that it seamlessly integrates with Groovy. So we are looking for someone with Groovy experience, wants to experiment with STM technology and work on an Open Source project.

If you are interested, send a mail to alarmnummer at gmail dot com.

Leave a Comment » | Concurrency, STM | Permalink
Posted by pveentjer

	Peter Lawrey on val vs final
	fcr on Breaking Oracle SERIALIZABLE
	gilinachum on Are you dealing with the Rejec…
	pveentjer on Are you dealing with the Rejec…
	Gili Nachum on Are you dealing with the Rejec…

Blog of Peter Veentjer

Akka and the Java Memory Model

New stuff for Multiverse 0.7 STM

STM: Read tracking vs visibile reads.

Read tracking

Visible reads

STM and pessimistic behavior

STM: Encryption and Security

STM: Object vs Field Granularity

Java Extreme Performance: Part I – Baseline

Groovy and STM

Multiverse: Timed blocking transactions

Multiverse STM: Looking for a Groovy Committer

Categories

Archives

Recent Posts

Recent Comments

Top Clicks

Top Posts