STM: Read tracking vs visibile reads.

It appears there is some confusion about these terms for STM design and I want to clarify them.

Read tracking

If a transaction tracks (so stores them in a local buffer) all reads that have been done, this is called read tracking. Some of the advantages of readtracking are:

  1. make certain forms of isolation possible: like causal consistency or protection against a write skew. It depends on the STM algorithm and settings what kind of support is needed. If the MVCC/TL2 algorithm is used, readtracking is needed less because the design always provides a read consistent view. Other algorithms like the one used in the SkySTM rely more on read tracking because the readset needs to be revalidated when the conflict counter changes (SkySTM specific). The big advantage of the SkySTM above the TL2 design is that there is less need for a central counter.
  2. provide certain blocking operations: like the ‘retry’ and the ‘orelse’. Based on all reads that have been done, a transaction can register some form of listener and blocks. Once a write on one of the reads has happened, the listening transactions is notified and can retry the execution of the transaction.
  3. reduce the number of read conflicts; once something is read and tracked (so stored in the buffer of the transaction), it doesn’t matter if another transaction does an update on that read, because the transaction can repeat the read from its local buffer. Especially for longer running transactions, or transaction that contend for certain datastructures, this could be an improvement.
  4. improved performance: doing a read can be expensive since some synchronization actions are needed. Once it is stored in the local buffer, these expensive operations are not needed anymore. If a read happens multiple times, it can be an advantage to track it, so the expensive operation only happens once.

The disadvantage of read tracking is that it will consume more memory and processing time. This problem is magnified when field granularity is used instead of object granularity since each fields needs to be tracked individually.

Visible reads

For certain STM algorithms there is a need for a transaction to be notified when a write has been done on something the transaction has read. This mechanism of a transaction registering to addresses it has read, and gets notified when a change happens on the read, is called a visible read. For the TL2 design this isn’t needed, since the global clock will make sure that a transaction always gets a read consistent view. For other STM implementations like the SkySTM having some form of visible read is needed, to make sure that transactions get notified when a write one something they have read have happened.

The disadvantage of ‘plain’ visible reads, is that some registration needs to be done one the reads and this can cause a lot of contention. If you have a tree for example, a lot of transactions will contend for the root node, even though it doesn’t change very often. The SkySTM works with semi visible reads, where a transaction only needs to know if there are any transactions that have not committed, but have executed that read. So instead of registering transactions, each read has a counter. Once a transaction does a read, it increases the counter and once it commits/aborts, it decreases the counter. If another transaction wants to do an update on a read with a surplus of readers, it knows that it could be causing a conflict and increases the global conflict counter. Other transaction will revalidate their readset if the conflict counter has changed when they do another read. The SkySTM even improves the design of the counter by using a scalable non zero indicator to reduce contention on the counter even more. In the new design of the Multiverse BetaSTM (will be part of Multiverse 0.7) I have taken a different approach (I allow a transactional object/ref to become read biased once a certain number of reads (and no updates) have been executed. Once it become read biased, it doesn’t track readers anymore and will always cause the global conflict counter to be increased when an update happens.

3 Responses to STM: Read tracking vs visibile reads.

  1. Andrew Phillips says:

    Does the ‘visible read’ requirement cause any design conflicts for read tracking, e.g. does it (in your experience) make read tracking uneconomical?

    At an abstract level they would seem to be somewhat opposed – read tracking attempts to keep reads local and reduce the amount of accesses to the actual object, whilst visible reads require a “link” between the transaction’s copy of the read varible and the actual object (so that it can track writers).

  2. pveentjer says:

    In most cases when you do visible reads, you also need some form of read tracking because when a transaction aborts/commits, it should also notify the reads from its departure.

    I don’t see a visible read as a direct requirement; it is a means to a goal. In my case I use it for object recycling, retry, writeskew protection and for providing a transaction level read consistency. If object granularity is used, read tracking becomes less expensive (since less state needs to be tracked), and with a good transaction design you can come a long way (in the new betastm I use an array to store a binary search tree for reads.. so very low garbage production.. even the arrays are pooled). But.. tracking reads can be more expensive than not tracking them 🙂

    It depends on the STM design and features needed, if it needed. I also have implemented a non read tracking transaction for the beta stm, but it will fail on conflicts even though there are no real conflicts because it has no means to revalidate its readset.

  3. Andrew Phillips says:

    > also have implemented a non read tracking transaction
    > for the beta stm, but it will fail on conflicts even
    > though there are no real conflicts because it has no
    > means to revalidate its readset.

    Could you give an example? A read-write conflict where the value has not changed? Or..?

Leave a comment