July 7, 2011
There has been a lot of change in my life (professional and personal). I started working for Typesafe, the company behind Scala and Akka, where I can work-with/talk-to people like Martin Odersky, Jonas Boner, James Gosling and last but certainly not least, my personal hero: Doug Lea. And I moved to a different country (Bulgaria), the weather here is just great and my apartment looks out over the Black Sea. And I’m busy getting my body back in shape.
This is my first blogpost on the company blog, but I also wanted to have a link from this blog. So have a nice read!
Akka and the Java Memory Model.
August 12, 2010
The last few months I have been working very hard on the 0.7 release of Multiverse. And it is going to gain a lot of new goodes:
- new engine: a new STM engine has been added that is better scalable and better performing than the current TL2 (MVCC) based AlphaSTM. There are 2 reasons for this: much more object pooling (I pool almost everything) and removal of the central commit counter that needs to be increased when an updating transaction commits. The new BetaSTM engine is based on the SkySTM although I have taken a different approach for easing on concurrent arrives/departs.
It will also be the the basis for making a distributed stm possible since the central commit counter will not scale (and a conflict counter used in the new BetaSTM can be striped). The new engine also relies heavily on templating (using Velocity) to generate multple transaction implementations and methods optimized for certain behavior or types of references. This is done when the Multiverse jars are build and there will no runtime dependency on Velocity.
- propagation levels: just like with databases you can specify requires, requiresnew, supported etc so you have fine grained control on the nesting of transactions.
- pessimistic behavior: which makes the stm more like traditional lock based concurrency control without the standard problems (race problems, deadlocks) and where operations still benefit from the standard stm advantages (composability of transactions, failure atomicity, isolation). It can be configured on transaction level
for all reads or for all writes. But it also can be done on transactional object (reference) level for fine grained control
- rollback on: so you can specify when a transaction should abort/commit when an exception happens. Atm a transaction will always be aborted and with this behavior your have more control.
- commuting behavior: to make higher concurrency possible by allowing certain interleavings of transactions. This was already available in the 0.5 release, but user supplied functions were not possible and in the 0.7 release will be.
- Clojure semantics to make Multiverse usable in any Java project without relying on Clojure:
- commute. One of the differences of the Clojure approach is that the commuting function will not be reevaluated at the end if somehow a dependency was created on the value. This means that the resulting state always will be correct.
- add-watch: to make it possible to listen to change on a single ref. This lifts on the same
mechanism as the retry, but the big difference is that you can listen on a single ref, instead of all reads.
- ensure: the prevent writeskew problems. One of the differences compared to the clojure approach is that you also can block on ref that has been ‘ensured’ not to change. This behavior lifts on the pessimistic support. Writeskew prevention in Multiverse already is available on the transaction level btw.
- alter: to apply a function on a reference. Not very interesting itself. The big difference with Clojure is that there will not be any boxing.
The properties also are going to be exposed in Akka; so you can expect a lot of new functionality there. The focus for the 0.7 release will mainly be to provide a half fabricate that can be used in other Java projects without relying on instrumentation. This will probably not be completely up to date.
I’m also working on an experimental storage engine to make durable transactions possible. But I don’t think it will be complete for the 0.7 release.
July 25, 2010
It appears there is some confusion about these terms for STM design and I want to clarify them.
If a transaction tracks (so stores them in a local buffer) all reads that have been done, this is called read tracking. Some of the advantages of readtracking are:
- make certain forms of isolation possible: like causal consistency or protection against a write skew. It depends on the STM algorithm and settings what kind of support is needed. If the MVCC/TL2 algorithm is used, readtracking is needed less because the design always provides a read consistent view. Other algorithms like the one used in the SkySTM rely more on read tracking because the readset needs to be revalidated when the conflict counter changes (SkySTM specific). The big advantage of the SkySTM above the TL2 design is that there is less need for a central counter.
- provide certain blocking operations: like the ‘retry’ and the ‘orelse’. Based on all reads that have been done, a transaction can register some form of listener and blocks. Once a write on one of the reads has happened, the listening transactions is notified and can retry the execution of the transaction.
- reduce the number of read conflicts; once something is read and tracked (so stored in the buffer of the transaction), it doesn’t matter if another transaction does an update on that read, because the transaction can repeat the read from its local buffer. Especially for longer running transactions, or transaction that contend for certain datastructures, this could be an improvement.
- improved performance: doing a read can be expensive since some synchronization actions are needed. Once it is stored in the local buffer, these expensive operations are not needed anymore. If a read happens multiple times, it can be an advantage to track it, so the expensive operation only happens once.
The disadvantage of read tracking is that it will consume more memory and processing time. This problem is magnified when field granularity is used instead of object granularity since each fields needs to be tracked individually.
For certain STM algorithms there is a need for a transaction to be notified when a write has been done on something the transaction has read. This mechanism of a transaction registering to addresses it has read, and gets notified when a change happens on the read, is called a visible read. For the TL2 design this isn’t needed, since the global clock will make sure that a transaction always gets a read consistent view. For other STM implementations like the SkySTM having some form of visible read is needed, to make sure that transactions get notified when a write one something they have read have happened.
The disadvantage of ‘plain’ visible reads, is that some registration needs to be done one the reads and this can cause a lot of contention. If you have a tree for example, a lot of transactions will contend for the root node, even though it doesn’t change very often. The SkySTM works with semi visible reads, where a transaction only needs to know if there are any transactions that have not committed, but have executed that read. So instead of registering transactions, each read has a counter. Once a transaction does a read, it increases the counter and once it commits/aborts, it decreases the counter. If another transaction wants to do an update on a read with a surplus of readers, it knows that it could be causing a conflict and increases the global conflict counter. Other transaction will revalidate their readset if the conflict counter has changed when they do another read. The SkySTM even improves the design of the counter by using a scalable non zero indicator to reduce contention on the counter even more. In the new design of the Multiverse BetaSTM (will be part of Multiverse 0.7) I have taken a different approach (I allow a transactional object/ref to become read biased once a certain number of reads (and no updates) have been executed. Once it become read biased, it doesn’t track readers anymore and will always cause the global conflict counter to be increased when an update happens.
July 23, 2010
Normally STM’s are very optimistic, conflicting modifying transactions are able to execute concurrently but only one of them is able commit in the end. In most cases this is desired behavior, but in some cases it can be quite annoying (e.g. when doing IO) and you want to prevent that a transaction is going to conflict and a lot of work is lost.
In Multiverse 0.7 multiple levels of pessimistic behavior are going to be added:
- automatic lock all writes. This means that if you want update a transactional object, the locks on that object is either acquired or fails immediately. This looks a lot like the behavior MVCC databases like Oracle/Postgresql provide, although with these databases it still is possible to have concurrent reads and writes. With the new beta stm in Multiverse, this will be much harder to realize because the central clock, where MVCC/TL2 based STM rely on, has been removed because it limits scalability
- automatic lock all reads (so also all writes because they need to be read before being written).
- lock an individual object on reading (like the oracle select for update) or on writing for fine grained control
These properties will be accessible by setting the correct properties on the transaction or by doing a specific call to the API. So in essence one is now able to program using locks, but without the traditional problems:
- Locks are still composable
- race problems are not possible because locks are released to soon. Locks will not be released until the end of the transaction.
- race problems are not possible because locks are acquired too late. The transaction just fails with a conflict when such a transaction wants to commit
- locks can’t be forgotten to be released: locks will still be managed by the transaction and will automatically be released when the transaction aborts/commits
- it is not possible to run into deadlocks since a transaction can’t wait indefinitely on acquiring a lock
Atm there will only be support for exclusive locks, but read/write locks are probably going to be added in the 0.8 release. Although I’m still thinking about a good mechanism to upgrade shared locks to exclusive locks.
June 28, 2010
Software Transactional Memory makes certain problems easier to solve since there is a additional layer between the data itself and using the data. The standard problems STM’s solve is to coordinate concurrency, provide consistency and failure atomicity. But adding more ‘enterprisy’ logic isn’t that hard.
One of the things that would be quite easy to add is security; in some cases you want to control if someone is allowed to access a certain piece of information (either for reading or for writing). Most stm’s already provide readonly vs update transactions, and once executed in readonly mode a user is now allowed to modify information (although one should be careful that a readonly transaction is not ignored because there is en enclosing update transaction). This can easily be extended to provide some form of security where certain information can only be accessed if certain conditions are met.
Another related piece of functionality is adding encryption; in some cases you don’t want the data to be available unprotected in memory (someone could access it with a debugger for example) or placed on some durable storage or being distributed. Adding some form of encryption when committing and decryption when loading also would not be that hard to realize, since the transaction already exactly knows what data is used. Personally I don’t like to work with encryption since it very often is completely in your face and gets intertwined with the design, but using some declarative annotations, would remove a lot of pain.
June 28, 2010
Stm’s for object oriented languages like Java, c#, Scala etc can either be:
- field granular: meaning that each field of an object is managed individually
- object granular: meaning that all fields of an object are managed indivisible
Both approaches have some pro’s and con’s. The advantage of field granularity, is that independent fields of transactional object won’t cause conflicts. If you have a double linkedlist for example, you want the head and tail of the list to be accessed no matter what happens on the other side. This reduces the number of conflicts you get, so improves concurrency.
But having field granularity is not without its problems:
- more expensive transactions: because each field needs to be tracked individually
- more expensive loading; with a field granularity, each fields needs to be loaded individually.
- more expensive conflict detection; since each fields needs to be checked individually
- more expensive commit: each field needs to be locked/updated individually
This means that transactions require more cpu and memory. Another problem is that a transaction that uses field granularity takes longer and this means that it could suffer from more conflicts (causing reduced concurrency) and causing more conflicts (since they acquire locks longer). And last but not least, a field granular transaction puts a lot more pressure on your memory bus since more ‘synchronization’ actions are needed.
In Multiverse (when instrumentation is used), object granularity is the default. And to make field granularity possible, just place a @FieldGranular annotation on top of your field.
June 24, 2010
I’m currently working on writing a new STM engine for the Multiverse project and this time the focus is going to be on scalability and performance. The goal is that for uncontended data and 1 ref per transaction, the update performance should be a least 10 million per second and readonly transactions should be 50/75 million per second (all per core and it should scale linearly). So I’m going to write some posts about what I discovered while improving performance and this is the first one.
The first big thing is that profilers become useless (even if you use stack sampling) because the call stack often changes so frequently, that no information can be derived from it (even if you use a sampling frequency of 1 ms which apparently is the lowest you can get).
What works best for me is to throw away everything you have and start from scratch with an (incomplete) baseline. This helps to create some kind of ‘wow.. I wish I could get this performance’ and by adding more logic, you can see how expensive something is. In a lot of cases you will be flabbergasted how expensive certain constructs are; e.g. a volatile read or a polymorphic method call, or that the JIT is not able to optimize something you would expect it to optimize.
This is a very time consuming process (especially since it also depends on the platform or the jdk you are using). But it will help to gain a deeper insight and help you to write better performing code.