STM: Read tracking vs visibile reads.

July 25, 2010

It appears there is some confusion about these terms for STM design and I want to clarify them.

Read tracking

If a transaction tracks (so stores them in a local buffer) all reads that have been done, this is called read tracking. Some of the advantages of readtracking are:

  1. make certain forms of isolation possible: like causal consistency or protection against a write skew. It depends on the STM algorithm and settings what kind of support is needed. If the MVCC/TL2 algorithm is used, readtracking is needed less because the design always provides a read consistent view. Other algorithms like the one used in the SkySTM rely more on read tracking because the readset needs to be revalidated when the conflict counter changes (SkySTM specific). The big advantage of the SkySTM above the TL2 design is that there is less need for a central counter.
  2. provide certain blocking operations: like the ‘retry’ and the ‘orelse’. Based on all reads that have been done, a transaction can register some form of listener and blocks. Once a write on one of the reads has happened, the listening transactions is notified and can retry the execution of the transaction.
  3. reduce the number of read conflicts; once something is read and tracked (so stored in the buffer of the transaction), it doesn’t matter if another transaction does an update on that read, because the transaction can repeat the read from its local buffer. Especially for longer running transactions, or transaction that contend for certain datastructures, this could be an improvement.
  4. improved performance: doing a read can be expensive since some synchronization actions are needed. Once it is stored in the local buffer, these expensive operations are not needed anymore. If a read happens multiple times, it can be an advantage to track it, so the expensive operation only happens once.

The disadvantage of read tracking is that it will consume more memory and processing time. This problem is magnified when field granularity is used instead of object granularity since each fields needs to be tracked individually.

Visible reads

For certain STM algorithms there is a need for a transaction to be notified when a write has been done on something the transaction has read. This mechanism of a transaction registering to addresses it has read, and gets notified when a change happens on the read, is called a visible read. For the TL2 design this isn’t needed, since the global clock will make sure that a transaction always gets a read consistent view. For other STM implementations like the SkySTM having some form of visible read is needed, to make sure that transactions get notified when a write one something they have read have happened.

The disadvantage of ‘plain’ visible reads, is that some registration needs to be done one the reads and this can cause a lot of contention. If you have a tree for example, a lot of transactions will contend for the root node, even though it doesn’t change very often. The SkySTM works with semi visible reads, where a transaction only needs to know if there are any transactions that have not committed, but have executed that read. So instead of registering transactions, each read has a counter. Once a transaction does a read, it increases the counter and once it commits/aborts, it decreases the counter. If another transaction wants to do an update on a read with a surplus of readers, it knows that it could be causing a conflict and increases the global conflict counter. Other transaction will revalidate their readset if the conflict counter has changed when they do another read. The SkySTM even improves the design of the counter by using a scalable non zero indicator to reduce contention on the counter even more. In the new design of the Multiverse BetaSTM (will be part of Multiverse 0.7) I have taken a different approach (I allow a transactional object/ref to become read biased once a certain number of reads (and no updates) have been executed. Once it become read biased, it doesn’t track readers anymore and will always cause the global conflict counter to be increased when an update happens.

STM and pessimistic behavior

July 23, 2010

Normally STM’s are very optimistic, conflicting modifying transactions are able to execute concurrently but only one of them is able commit in the end. In most cases this is desired behavior, but in some cases it can be quite annoying (e.g. when doing IO) and you want to prevent that a transaction is going to conflict and a lot of work is lost.

In Multiverse 0.7 multiple levels of pessimistic behavior are going to be added:

  1. automatic lock all writes. This means that if you want update a transactional object, the locks on that object is either acquired or fails immediately. This looks a lot like the behavior MVCC databases like Oracle/Postgresql provide, although with these databases it still is possible to have concurrent reads and writes. With the new beta stm in Multiverse, this will be much harder to realize because the central clock, where MVCC/TL2 based STM rely on, has been removed because it limits scalability
  2. automatic lock all reads (so also all writes because they need to be read before being written).
  3. lock an individual object on reading (like the oracle select for update) or on writing for fine grained control

These properties will be accessible by setting the correct properties on the transaction or by doing a specific call to the API. So in essence one is now able to program using locks, but without the traditional problems:

  1. Locks are still composable
  2. race problems are not possible because locks are released to soon. Locks will not be released until the end of the transaction.
  3. race problems are not possible because locks are acquired too late. The transaction just fails with a conflict when such a transaction wants to commit
  4. locks can’t be forgotten to be released: locks will still be managed by the transaction and will automatically be released when the transaction aborts/commits
  5. it is not possible to run into deadlocks since a transaction can’t wait indefinitely on acquiring a lock

Atm there will only be support for exclusive locks, but read/write locks are probably going to be added in the 0.8 release. Although I’m still thinking about a good mechanism to upgrade shared locks to exclusive locks.