Simplifying enterprise applications with durable STM

March 21, 2010

One of the things I’m currently playing with is creating a durable STM; the functionality is going to be integrated in the main STM implementation of Multiverse. I’m doing a proof of concept to figure out what kind of interfaces and changes the Multiverse STM is going to need.

The idea is that persistence in a lot of the enterprisy applications I have seen just didn’t feel very comfortable:

  1. dealing with proxies, which leads to problematic self calls in services and identity problems on entities
  2. dealing with OR mapping
  3. dealing with setting up the database and creating DDL
  4. not being able to do dependency injection on entities
  5. objects only getting an id when they commit, and not before

It takes up too much time, and since an STM already manages the state of your objects, adding a storage engine behind it is not that strange. To give you an idea of how simple it could be, let’s create a small Bank example:

@TransactionalObject
class Customer{
	private final String id;

	public Customer(String id){
		this.id = id;
	}

	public String getId(){
		return id;
	}
}

@TransactionalObject
class Account{
	private final Customer customer;
	private int balance;

	public Account(Customer customer){
		this.customer = customer;
	}

	public Customer getCustomer(){
		return customer;
	}

	public int getBalance(){
		return balance;
	}

	public void setBalance(int newBalance){
		if(newBalance<0){
			throw new NotEnoughMoneyException();
		}	

		this.balance = newBalance;
	}
}

@TransactionalObject
class Bank{
	private final TransactionalMap<String, Customer> customers =
		new TransactionalHashMap<String, Customer>();
	private final TransactionalMap<String, Account> accounts =
		new TransactionalHashMap<String, Account>();

	public Account find(String id){
		Account account = accounts.get(id);
		if(account == null){
			throw new AccountNotFoundException();
		}
		return account;
	}

	public void transfer(String fromId, String toId, int amount){
		if(amount<0){
			throw new IllegalArgumentException();
		}

		Account from = find(fromId);
		Account to = find(toId);
		to.setBalance(to.getBalance()+amount);
		from.setBalance(from.getBalance()-amount);
	}

	public Customer createCustomer(String id){
		Customer customer = new Customer(id);
		customers.put(customer.getId(), customer);
		return customer;
	}

	public Account createAccount(Customer customer){
		Account found = accounts.get(customer.getId());
		if(found!=null){
			throw new AccountAlreadyExistsException();
		}
		Account newAccount = new Account(customer);
		accounts.put(customer.getId(), newAccount);
		return newAccount;
	}
}

And you could wire it up in Spring like this:

<bean id="bank" class="DurableTransactionalObjectFactoryBean">
	<property name="id" value="bank"/>
	<property name="class" value="Bank"/>
</bean>

And use this bean in your controllers, for example. The DurableTransactionalObjectFactoryBean will create or load a durable transactional object with the given id from disk.
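
For example, a controller could look like this (a hypothetical class, just to show the usage; Bank is the durable transactional object configured above):

public class TransferController {

	private final Bank bank;

	public TransferController(Bank bank){
		this.bank = bank;
	}

	public void transfer(String fromId, String toId, int amount){
		//executes atomically; on failure nothing is changed,
		//including the durable state on disk
		bank.transfer(fromId, toId, amount);
	}
}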

As you can see there is:

  1. no OR mapping needed
  2. no DAOs needed
  3. no fuss with transaction configuration
  4. no need to roll back state on objects manually; check the transfer function, where the setBalance on the ‘from’ account can throw a NotEnoughMoneyException and normally could leave the ‘to’ account in an illegal state

All operations on the bank are executed with full ACID semantics.

I hope I’m able to add an experimental storage engine to Multiverse 0.6 (planned in 3 months), so I can learn more about what I’m going to need, what the exact semantics for durable/non-durable objects need to be, and about their lifecycles.


Multiverse STM: Inferring optimal transaction settings

March 15, 2010

In Multiverse there are a lot of parameters that can activate certain functionality on the transaction. Parameters like:

  1. readonly
  2. interruptible
  3. automaticReadTracking (needed for blocking operations)
  4. maxRetryCount
  5. allowWriteSkewProblem
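
To give an idea of how such settings surface in code, here is an illustrative example; the attribute names below are meant as an illustration and are not guaranteed to match the exact Multiverse annotation API:

@TransactionalObject
class Counter{
	private int value;

	//illustrative settings; the exact attribute names may differ
	//from the real Multiverse annotations
	@TransactionalMethod(
		readonly = false,
		interruptible = true,
		automaticReadTracking = true,
		maxRetryCount = 1000,
		allowWriteSkewProblem = false)
	public void increment(){
		value++;
	}
}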

Each additional feature causes overhead in the STM, because it takes longer to process a transaction, and it also leads to increased memory usage. In Multiverse 0.4 I already added some learning functionality to the STM runtime that makes speculative choices and adapts when a choice didn’t work out well. I use this for selecting a well performing transaction implementation: based on how many reads/writes the transaction encountered the previous time, the system is able to select a better fitting transaction the next time.

There is no reason not to infer optimal settings as well: just start with readonly and no automatic read tracking. If an update is done, the system knows that next time an update transaction is needed. The same goes for a retry: if a retry is done and automaticReadTracking is not enabled, the next transaction can run with this feature enabled. Perhaps similar techniques could be used to determine whether read tracking is useful because a transaction often runs into read conflicts.
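
A sketch of the idea (with hypothetical names, not the actual Multiverse internals): begin with the cheapest settings and upgrade them when a speculation fails, so the next attempt starts with the richer configuration.

//sketch with hypothetical names, not the actual Multiverse internals
class SpeculativeConfig{
	volatile boolean readonly = true;
	volatile boolean automaticReadTracking = false;
}

class SpeculationFailedException extends RuntimeException{}

class TransactionExecutor{

	public void execute(SpeculativeConfig config, Runnable logic){
		while(true){
			try{
				runWithTransaction(config, logic);
				return;
			}catch(SpeculationFailedException e){
				//an update was attempted in a readonly transaction, or a
				//retry needed read tracking; remember this for next time
				config.readonly = false;
				config.automaticReadTracking = true;
			}
		}
	}

	private void runWithTransaction(SpeculativeConfig config, Runnable logic){
		//start a transaction with the given settings, run the logic and
		//throw SpeculationFailedException when the settings turned out
		//to be insufficient (left out in this sketch)
	}
}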

David Dice already wrote about this in his famous TL2 paper, and I certainly think that it is going to save developers a lot of time. And if someone wants to fiddle with the settings themselves, they can always override them (or perhaps even deactivate the learning system). This is certainly something I want to integrate in one of the next Multiverse releases.


Multiverse performance improvements

March 14, 2010

Saturday I was talking to Jonas Boner of the Akka project, which uses Multiverse as its STM implementation. Since Multiverse 0.4 doesn’t support static instrumentation and the Akka project doesn’t want to rely on a Javaagent, a manually instrumented reference was added in Multiverse 0.3. I never spent any time optimizing it, so I decided to take a closer look, since the Akka project relies on it.

One of the biggest performance gains was for reading a value from a transactional reference without an existing transaction: this went up from 7.5M transactions/second to 160M transactions/second using a single thread. The main part of this improvement was the inlining of the transaction and the transaction template. This reduces object creation and makes a ref.get() almost as expensive as an AtomicReference.get(). So developers should not be worried about using it instead of an AtomicReference when they are also working with transactions.
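
The idea behind that fast path, sketched with hypothetical names (the real AlphaRef is more involved): when no transaction is running, a get() degenerates into a single atomic read, so no transaction or template object is allocated.

import java.util.concurrent.atomic.AtomicReference;

//sketch with hypothetical names; the real AlphaRef is more involved
class Ref<E>{

	interface Transaction{
		<T> T read(Ref<T> ref);
	}

	static final ThreadLocal<Transaction> currentTransaction =
		new ThreadLocal<Transaction>();

	private final AtomicReference<E> committed = new AtomicReference<E>();

	public E get(){
		Transaction tx = currentTransaction.get();
		if(tx == null){
			//fast path: no surrounding transaction, so read directly;
			//almost as cheap as an AtomicReference.get()
			return committed.get();
		}
		//slow path: participate in the full-blown transaction
		return tx.read(this);
	}
}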

Another performance improvement, although less spectacular, was on writing a value to a transactional reference using its own transaction: that went up from 5M to 12M transactions per second on a single thread. This was also mostly caused by inlining the transaction. I have already experimented with reducing the number of CAS operations, and pushing it to 15M is possible, so I’m looking at how to integrate that. The goal is that Multiverse can have the performance of CAS for CAS-like transactions, but can also participate in full-blown transactions (which are of course more expensive and therefore slower).

Inlining the transaction is certainly something that is going to be added to the other instrumentation optimizations in the future (every release I try to add a few). The improved manually instrumented reference (AlphaRef) is going to be added to the 0.4.2 release planned for this week (somehow I’m still struggling with the Maven release plugin). It is also going to gain some other goodies, like an optimistic lock that can be used spanning multiple transactions (comparable to what you normally do with optimistic locking using a database).

For the 0.4.3 release I’ll add a history of dynamic length to the manually instrumented reference (in Oracle this is done using the undo log, and in Clojure this functionality is called “adaptive history queues”) to give Jonas his persistent datastructures. This is a benefit of the Multiversion Concurrency Control design, which makes this possible (essentially TL2 is MVCC with a history of length one). For Multiverse I’m planning a persistent history queue, where you even have control over the number of old versions that need to be maintained, but sadly that is not planned for the 0.5 release (I also have a full-time job).
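
A sketch of what such a reference with a history could look like (hypothetical names, and without the locking and conflict detection a real commit needs): every commit prepends a new version, and a reader walks back to the newest version that was committed at or before its read version.

//sketch with hypothetical names; a real commit would need locking and
//conflict detection, and would trim versions beyond the history length
class VersionedRef<E>{

	static final class Version<E>{
		final long writeVersion;
		final E value;
		final Version<E> older;

		Version(long writeVersion, E value, Version<E> older){
			this.writeVersion = writeVersion;
			this.value = value;
			this.older = older;
		}
	}

	private volatile Version<E> head;

	public void commit(long writeVersion, E value){
		head = new Version<E>(writeVersion, value, head);
	}

	public E read(long readVersion){
		for(Version<E> v = head; v != null; v = v.older){
			if(v.writeVersion <= readVersion){
				return v.value;
			}
		}
		throw new IllegalStateException("history exhausted, version too old");
	}
}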


Multiverse STM as Java Database

March 2, 2010

Last week I released Multiverse 0.4, with a lot of new goodies and a completely new website. In 0.5 a lot of new goodies will also be added (compile-time instrumentation, more transactional datastructures, performance improvements). But I’m also thinking about longer-term goals, and one of the most interesting ones (I think) is persistence (next to distribution). Since an STM already deals with concurrency control, knows about the internals of your objects, and is built on transactions, persistence is a logical next step.

The goal is to make it as easy as possible for the Java developer (perhaps just setting a durability property to true on some annotation): no need to deal with a relational database, no OR mapping, no schema, and no need to deal with caching (it should all be done behind the scenes). I have seen too many projects waste lots and lots of resources because:
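
What I have in mind, roughly (the durable attribute below is purely illustrative, nothing about the API is final):

//purely illustrative; the durable attribute is not a final API
@TransactionalObject(durable = true)
class Customer{
	private String name;

	//state is persisted automatically behind the scenes:
	//no mapping, no schema, no dao
}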

  1. performance completely sucks (doing a few hundred operations per second with a lot of hardware and expensive software like Oracle RAC). Adding more hardware doesn’t solve the problem, and that is where the expensive consultants come in (like I used to do).
  2. there are a lot of concurrency issues because developers don’t understand the database. So you get your traditional race problems and deadlocks, but you also get a lot of application complexity caused by additional ‘fixing’ logic. Things get even more complicated if a different database is used in production than in development, meaning that you need to deal with problems twice.
  3. you need to deal with the mismatch between the relational and the object oriented datamodel
  4. you need to deal with limitations caused by traditional databases. One of the limitations that causes a lot of complexity is the lack of blocking, so some kind of polling mechanism usually is added (Quartz for example) that triggers the logic. This makes very simple things (like a thread executing work) much more cumbersome than they need to be; see the sketch after this list.
  5. you need to deal with imperfect solutions like proxy-based transactions (Spring for example). The big disadvantage of this approach is that the design of classes gets much more complicated when self calls are done. Another similar problem I see is that DDD (so objects with logic, and therefore needing dependencies) is nice on paper, but sucks in practice because dependencies can’t be injected into entities. I know that Spring has some support for it, but often it can’t be used or is not used. The whole dependency issue in entities (and other objects that are not created inside a container) could have been solved very easily by Hibernate by exposing a factory. So that is something I am going to do right.
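
To illustrate point 4: with STM, blocking comes for free through the retry primitive. The sketch below approximates Multiverse’s transactional collections and StmUtils.retry(); treat the exact names as assumptions, not a definitive API.

import static org.multiverse.api.StmUtils.retry;

//sketch: a blocking consumer using STM retry instead of polling; the
//exact collection and retry() names are assumptions based on Multiverse
@TransactionalObject
class WorkQueue{
	private final TransactionalLinkedList<Runnable> tasks =
		new TransactionalLinkedList<Runnable>();

	public void put(Runnable task){
		tasks.add(task);
	}

	public Runnable take(){
		if(tasks.isEmpty()){
			//no work: block until another transaction commits a change
			//to tasks; the STM wakes this thread up, no polling needed
			retry();
		}
		return tasks.removeFirst();
	}
}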

I wish Multiverse had been available all the times I saw these issues at customers, because it would have made programming more fun and saved lots of money; try to imagine what it costs when a team of developers struggles for days or weeks with these issues, or systematically wastes time over even longer periods because the road they have taken is just too complex. I’m not claiming that it will be a fit for every project, but if you just need an easy mechanism to persist Java state and deal with concurrency, Multiverse could be a good solution.