I’m currently working on removing unnecessary method calls (there is no method as fast as a method that is not called). This has already boosted the performance of one of the transactional collection classes by a few hundred percent, and there is still a lot more to gain.
But there is more low-hanging fruit waiting to be picked. If a getter or setter is called on a transactional object, the transaction that is normally used to execute it can be optimized away completely and replaced by an extremely well-performing solution. Getters will cost one ThreadLocal read and one volatile read. Setters are a little more complicated because:
- they still need to acquire and release a lock (so at least one CAS and one volatile write);
- they also need one volatile write to store the content;
- and, with the biggest influence, they need to increase a shared clock (an AtomicLong).
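The cost breakdown above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not Multiverse's implementation: `FastRefSketch`, `FastRef`, `CURRENT_TX` and `GLOBAL_CLOCK` are hypothetical names, the lock is a simple CAS spin lock on an `AtomicReference`, and the path taken when a transaction is already running on the thread is not modeled.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

public class FastRefSketch {

    // Hypothetical stand-in for "the transaction running on this thread" (null = none).
    static final ThreadLocal<Object> CURRENT_TX = new ThreadLocal<>();

    // Hypothetical shared version clock, in the style of TL2.
    static final AtomicLong GLOBAL_CLOCK = new AtomicLong();

    static final class FastRef<E> {
        // Lock word: null when free, otherwise the owning thread.
        private final AtomicReference<Thread> lock = new AtomicReference<>();
        private volatile E value;
        // Kept separate for readability; a real STM would likely fold it into the lock word.
        private volatile long version;

        E get() {
            // One ThreadLocal read: is a transaction already running?
            if (CURRENT_TX.get() != null) {
                throw new UnsupportedOperationException("transactional path not modeled in this sketch");
            }
            // One volatile read: a single reference read is atomic, so no transaction is needed.
            return value;
        }

        void set(E newValue) {
            if (CURRENT_TX.get() != null) {
                throw new UnsupportedOperationException("transactional path not modeled in this sketch");
            }
            Thread me = Thread.currentThread();
            // Acquire the lock: at least one CAS.
            while (!lock.compareAndSet(null, me)) {
                Thread.onSpinWait();
            }
            try {
                // The expensive part: bump the shared clock to obtain a write version.
                long writeVersion = GLOBAL_CLOCK.incrementAndGet();
                value = newValue;       // volatile write storing the content
                version = writeVersion;
            } finally {
                lock.set(null);         // release the lock: a volatile write
            }
        }

        long version() {
            return version;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FastRef<Integer> ref = new FastRef<>();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            final int id = i;
            writers[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) {
                    ref.set(id);
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        System.out.println("value=" + ref.get() + " version=" + ref.version());
    }
}
```

The getter never touches the lock or the clock, which is why it can be so much cheaper than the setter; every setter, on the other hand, writes to the one shared `AtomicLong`, which is the scalability concern discussed next.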
I expect writes to reach 8+ million transactions/second (125 ns/write) and reads around 80 million transactions/second (12.5 ns/read) on a single core. Another thing that needs more extensive testing is how well this is going to scale, because that shared counter is going to put a lot of pressure on the CPU caches. The atomic writes can be made cheaper by relaxing the clock increment: if several threads try to increase the clock concurrently and one of them succeeds, the others accept that result instead of retrying.
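The relaxed increment can be sketched as a single CAS attempt without a retry loop. This is a minimal sketch, not Multiverse's code: `RelaxedClock`, `strictTick` and `relaxedTick` are hypothetical names, and it relies on `AtomicLong.compareAndExchange` (Java 9+), which returns the value it actually found. Because the clock only ever moves forward, a failed CAS means another thread has already advanced it, so the loser simply adopts that value.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RelaxedClock {

    private final AtomicLong clock = new AtomicLong();

    long get() {
        return clock.get();
    }

    // Strict increment: every caller gets its own, strictly increasing value.
    long strictTick() {
        return clock.incrementAndGet();
    }

    // Relaxed increment: one CAS attempt, no retry loop.
    long relaxedTick() {
        long current = clock.get();
        long next = current + 1;
        // compareAndExchange returns the witness value, i.e. what was in the clock.
        long witness = clock.compareAndExchange(current, next);
        if (witness == current) {
            return next;        // this thread advanced the clock itself
        }
        // Another thread already moved the clock past `current`; accept its value.
        return witness;
    }

    public static void main(String[] args) throws InterruptedException {
        RelaxedClock clock = new RelaxedClock();
        int threads = 4;
        int ticksPerThread = 100_000;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < ticksPerThread; j++) {
                    clock.relaxedTick();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // Under contention the clock can end lower than the number of calls,
        // because failed CAS attempts are not retried.
        System.out.println("calls=" + (threads * ticksPerThread) + " clock=" + clock.get());
    }
}
```

With the relaxed variant, two concurrent writers may end up with the same write version; whether that is safe depends on the rest of the commit protocol, which is what the TL2 paper discusses. How much this saves compared to `incrementAndGet` is also JVM- and hardware-dependent: where `incrementAndGet` compiles to a single atomic add rather than a CAS loop, the gain may be small.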
This functionality is also going to be integrated into the transactional reference support for the Akka project (for the programmatic support, the ThreadLocal read for the transaction can even be bypassed in get and set).
The relaxed clock optimization is explained in the TL2 paper by David Dice and was added to Multiverse quite some time ago (it is configurable).