STM: 160k transactions/second

For the last 4 months I have been working on a Java based STM implementation. With 2 main priorities:

  1. predictability (especially the concurrency control part)
  2. performance

One of the performance test is a simple producer consumer problem with 2 threads and with a stack in between and with my dual core laptop I get roughly 160.000 transactions per second. If I execute a similar test with classic concurrency control and a Stack (a Deque to be honest because a java.util.Stack has no blocking support even though it is thread-safe) I get roughly 5.000.000 transactions per second. So there is quite some room for improvement.

6 Responses to STM: 160k transactions/second

  1. I wonder how that performance compares to the Clojure STM.

    • pveentjer says:

      It would be a nice comparison, but it can be quite hard to create benchmarks that can be compared πŸ™‚

      But it certainly would be nice to know. Especially since Clojure also uses MVCC as concurrency control mechanism.

      And a comparison between Multiverse and Deuc/XSTM (other Java STM implementations) would be nice.

  2. Fabien says:

    Is your test available in google code svn? If yes which file is it?

  3. Arjen van Schie says:

    Nice to hear you still work on that project, and 160K per second sounds like a lot. Is it on a perfect scenario (no collisions) or is there a certain % of operations that need a retry (caused by a collision)

  4. pveentjer says:

    Hi Arjan,

    it is not a perfect scenario πŸ™‚ The producers and consumers share a stack, so there is a lot of contention for the head node and this leads to a lot of write conflicts since the system only supports optimistic locking (for now).

    I have roughly between 35/40% write conflicts. These transactions need to be retried, so removal of these write conflicts could increase performance quite a bit.

    stm.transaction.activecount: 0
    stm.transaction.startedcount: 6422568
    stm.transaction.committedcount: 3999186
    stm.transaction.committed-percentage: 62.26770973853449
    stm.transaction.abortedcount: 2423382
    stm.transaction.aborted-percentage: 37.73229026146551
    stm.transaction.readonlycount: 0 6358590
    heap.commit.readonly 0
    heap.commit.readonly-percentage 0.0
    heap.commit.writeconflicts 2359404
    heap.commit.writeconflict-percentage 37.10577344977424
    heap.commit.succeeded 3999186
    heap.commit.nonblocking.enter.count 6358590
    heap.commit.nonblocking.failure.count 1521691
    heap.commit.nonblocking.failure.percentage 19.310111
    heap.listen.nonblocking.enter.count 63978
    heap.listen.nonblocking.failure.count 37169
    heap.listen.nonblocking.failure.percentage 36.747506
    heap.reads 6422567
    heap.stores 3999186

    79012 chain alivecount
    2000000 items took 11147 ms
    358767.9196196286 transactions/second

    And I made a mistake calculating the number of transactions. A put and a take are 2 transactions, and not 1. So my the performance is increased by a factor 2. The problem is that the performance of the classic stack approach also increases with a factory 2 πŸ™‚

    A queue (with a seperate head and tail), although you still have contention on the head/tail nodes would be a better solution. But the goal is not to create a fast producer/consumer solution, but a fast STM.

  5. pveentjer says:

    I think 1.000.000 transactions per second is possible. If I look at the system with a profiler, there is a lot of room for improvement on certain structures and there also are a lot of non blocking failures going on. So a lot of useless work (burning cpu cycles) here as well.

    But the performance really depends on what you are trying to do. Read only transaction by design have no impact on the performance of the STM since they work from an already available stable snapshot of the heap.

    The more writes a transaction needs to do, or the more non final reads it does in case of a retry (functionality needed for the stm version of condition variables), or the larger the number of new stm objects (not updates) written the the stm (because of rebalancing in the heap), really influence performance.

    So 1.000.000 transactions per second doesn’t say very much. And it also doesn’t say anything about scalability πŸ™‚ I’m going to get my hands on a Intel Core i7, so I can scale up to at least 8 threads.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: