Simplifying enterprise applications with durable STM

March 21, 2010

One of the things I’m currently playing with is a durable STM; this functionality is going to be integrated into the main STM implementation of the Multiverse STM. I’m doing a proof of concept to figure out what kind of interfaces and changes the Multiverse STM is going to need.

The reason is that persistence in a lot of the enterprisy applications I have seen just didn’t feel very comfortable:

  1. dealing with proxies: problematic self-calls in services and identity problems with entities
  2. dealing with OR mapping
  3. dealing with setting up the database and creating DDL
  4. not being able to do dependency injection on entities
  5. objects only getting an id when they are committed, and not before

It takes up too much time, and since an STM already manages the state of your objects, adding a storage engine behind it is not that strange. To give you an idea of how simple it could be, let’s create a small Bank example:

@TransactionalObject
class Customer{
	private final String id;

	public Customer(String id){
		this.id = id;
	}

	public String getId(){
		return id;
	}
}

@TransactionalObject
class Account{
	private final Customer customer;
	private int balance;

	public Account(Customer customer){
		this.customer = customer;
	}

	public Customer getCustomer(){
		return customer;
	}

	public int getBalance(){
		return balance;
	}

	public void setBalance(int newBalance){
		if(newBalance<0){
			throw new NotEnoughMoneyException();
		}	

		this.balance = newBalance;
	}
}

@TransactionalObject
class Bank{
	private final TransactionalMap<String, Customer> customers = 
		new TransactionalHashMap<String, Customer>();
	private final TransactionalMap<String, Account> accounts = 
		new TransactionalHashMap<String, Account>();

	public Account find(String id){
		Account account = accounts.get(id);
		if(account == null){
			throw new AccountNotFoundException();
		}
		return account;
	}

	public void transfer(String fromId, String toId, int amount){
		if(amount<0){
			throw new IllegalArgumentException();
		}

		Account from = find(fromId);
		Account to = find(toId);
		to.setBalance(to.getBalance()+amount);
		from.setBalance(from.getBalance()-amount);
	}

	public Customer createCustomer(String id){
		Customer customer = new Customer(id);
		customers.put(customer.getId(),customer);
		return customer;
	}

	public Account createAccount(Customer customer){
		Account found = accounts.get(customer.getId());
		if(found!=null){
			throw new AccountAlreadyExists();
		}
		Account newAccount = new Account(customer);
		accounts.put(customer.getId(), newAccount);
		return newAccount;
	}
}

And you could wire it up in Spring like this:

<bean id="bank" class="DurableTransactionalObjectFactoryBean">
         <property name="id" value="bank"/>
         <property name="class" value="Bank"/>
</bean>

You can then use this bean in your controllers, for example. The DurableTransactionalObjectFactoryBean will create or load a durable transactional object with the given id from disk.

As you can see there is:

  1. no OR mapping needed
  2. no DAOs needed
  3. no fuss with transaction configuration
  4. no need to roll back state on objects manually: check the transfer function, where setBalance on the from account could throw a NotEnoughMoneyException, which would normally leave the to account in an illegal state

All operations on the bank are executed with full ACID semantics.
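To make the "no manual rollback" point concrete, here is a toy sketch of the underlying idea. It is not the Multiverse API: Ref, Txn, atomic and ToyBank are hypothetical names. Writes are buffered inside the transaction and only published on commit, so when the from account would go negative, the earlier write to the to account is simply discarded. A real STM also adds concurrency control (version checks, locking), which this sketch leaves out entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical cell whose committed value only changes when a transaction commits.
final class Ref<T> {
    private T committed;
    Ref(T initial) { committed = initial; }
    T committedValue() { return committed; }
    void publish(T value) { committed = value; }
}

// Hypothetical transaction: reads see their own writes, writes are buffered until commit.
final class Txn {
    private final Map<Ref<?>, Object> writes = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> T read(Ref<T> ref) {
        return writes.containsKey(ref) ? (T) writes.get(ref) : ref.committedValue();
    }

    <T> void write(Ref<T> ref, T value) { writes.put(ref, value); }

    @SuppressWarnings("unchecked")
    void commit() {
        for (Map.Entry<Ref<?>, Object> e : writes.entrySet()) {
            ((Ref<Object>) e.getKey()).publish(e.getValue());
        }
    }

    // Runs the body; the buffered writes are published only if no exception escapes.
    static void atomic(Consumer<Txn> body) {
        Txn txn = new Txn();
        body.accept(txn);   // an exception here skips commit(), so nothing is published
        txn.commit();
    }
}

final class ToyBank {
    // Same order as Bank.transfer: credit 'to' first, then debit 'from'.
    static void transfer(Ref<Integer> from, Ref<Integer> to, int amount) {
        Txn.atomic(txn -> {
            txn.write(to, txn.read(to) + amount);
            int newFrom = txn.read(from) - amount;
            if (newFrom < 0) {
                throw new IllegalStateException("not enough money");
            }
            txn.write(from, newFrom);
        });
    }
}
```

For example, with 10 in from and 0 in to, transfer(from, to, 4) leaves 6 and 4; a following transfer of 100 throws, and both accounts keep 6 and 4 because the buffered credit to the to account was never published.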

I hope I’m able to add an experimental storage engine to Multiverse 0.6 (planned in 3 months), so I can learn more about what I’m going to need, what the exact semantics for durable/non-durable objects need to be, and what their lifecycles look like.


Multiverse STM as Java Database

March 2, 2010

Last week I released Multiverse 0.4 with a lot of new goodies and a completely new website. Version 0.5 will also add a lot of new goodies (compile-time instrumentation, more transactional data structures, performance improvements). But I’m also thinking about longer-term goals, and one of the most interesting ones (I think) is persistence (next to distribution). Since an STM already deals with concurrency control, knows about the internals of your objects and is built on transactions, persistence is a logical next step.

The goal is to make it as easy as possible for the Java developer (perhaps just setting a durability property to true on some annotation): no need to deal with a relational database, no OR mapping, no schema, no need to deal with caching (that should all be done behind the scenes). I have seen too many projects waste lots and lots of resources because of:

  1. performance that completely sucks (doing a few hundred operations per second with a lot of hardware and expensive software like Oracle RAC). Adding more hardware doesn’t solve the problem, and that is where the expensive consultants come in (like I used to be).
  2. a lot of concurrency issues because developers don’t understand the database. So you get your traditional race problems and deadlocks, but you also get a lot of application complexity caused by additional ‘fixing’ logic. Things get even more complicated if a different database is used in production than in development, meaning that you need to deal with problems twice.
  3. needing to deal with the mismatch between the relational and the object-oriented data model
  4. needing to deal with limitations of traditional databases. One of the limitations that causes a lot of complexity is the lack of blocking, so some kind of polling mechanism is usually added (Quartz, for example) to trigger the logic. This makes very simple things (like a thread executing work) much more cumbersome than they need to be.
  5. needing to deal with imperfect solutions like proxy-based transactions (Spring, for example). The big disadvantage of this approach is that the design of classes gets much more complicated when self-calls are made, because a call on ‘this’ bypasses the proxy. A similar problem I see is that DDD (objects with logic, and therefore with dependencies) is nice on paper but sucks in practice, because dependencies can’t be injected into entities. I know that Spring has some support for it, but often it can’t be used or simply isn’t. The whole dependency issue for entities (and other objects that are not created inside a container) could easily have been solved by Hibernate by exposing a factory. So that is something I am going to do right.
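The self-call problem of proxy-based transactions can be reproduced with a plain JDK dynamic proxy. In this sketch the invocation handler only stands in for transaction advice (it records which calls it intercepted), and AccountService, AccountServiceImpl and ProxyDemo are hypothetical names. Calling outer through the proxy is intercepted, but the call from outer to inner goes through ‘this’, so inner is never seen by the proxy on that path.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface AccountService {
    void outer();
    void inner();
}

class AccountServiceImpl implements AccountService {
    public void outer() {
        inner(); // self-call: goes straight to 'this', bypassing any proxy around this object
    }
    public void inner() { }
}

final class ProxyDemo {
    // Returns the names of the methods that the proxy (our stand-in for transaction advice) intercepted.
    static List<String> interceptedCalls() {
        List<String> intercepted = new ArrayList<>();
        AccountService target = new AccountServiceImpl();
        InvocationHandler advice = (proxy, method, args) -> {
            intercepted.add(method.getName()); // a real advice would begin/commit a transaction here
            return method.invoke(target, args);
        };
        AccountService proxied = (AccountService) Proxy.newProxyInstance(
                AccountService.class.getClassLoader(),
                new Class<?>[] {AccountService.class},
                advice);
        proxied.outer();
        return intercepted;
    }
}
```

interceptedCalls() only contains "outer": if inner needed its own transaction settings, they would silently not be applied, which is why such designs end up splitting classes or looking up their own proxy.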

I wish Multiverse had been available all the times I saw these issues at customers, because it would have made programming more fun and saved lots of money. Just imagine what it costs when a team of developers struggles with these issues for days or weeks, or systematically wastes time over even longer periods because the road they have taken is just too complex. I’m not claiming that it will be a fit for every project, but if you just need an easy mechanism to persist Java state and deal with concurrency, Multiverse could be a good solution.


Am I too stupid for @Autowired?

December 2, 2009

When I started with Spring, I finally had the feeling that I could write enterprise applications in the same powerful way as I could write normal systems. And I have written quite complex Java stuff like expert systems, Prolog compilers and various other compilers and interpreters, and I am currently working on Multiverse, a software transactional memory implementation. I have also written quite a lot of traditional enterprisy Spring applications (frontend, backend/batch processing). So I consider myself at least as smart as most developers.

The first application context files felt a little bit strange: finding the balance between the objects that need to be created inside the application context and the objects created by the Java objects themselves. But it didn’t take me long to find this balance and realise how powerful the application context is:

  1. it acts like an executable piece of software documentation where I can see how a system works just by looking at a few XML files. I don’t need to look in the source to see how stuff is wired.
    I compare it with looking at Lego instructions: by looking at the instructions I can see what is going to be built (not that I ever followed the design). To understand what is built, I don’t need to look at the individual parts.
  2. separation of interfaces and implementation, so testing and using a lot of cool OO design stuff is a dream
  3. having the freedom to inject different instances of the same interface in different locations. This makes it possible to do cool stuff like adding proxies, logging etc. I wasn’t forced to jump through hoops any more because of all kinds of container-specific constraints

With the old versions of Spring (1.2 series) I had the feeling of total control and complete understanding.

The application context made me aware that I needed to work with 2 hats:

  1. ‘Object’ designer: actually writing the Java code and documentation
  2. ‘Object’ integrator: assembling complete systems from the objects

And I loved it. I have more than 1,000 posts on the Spring forum just because I believe in this approach.

But if I look at a lot of modern Spring applications, filled with @Component and @Autowired annotations, it feels like I have lost it all. It takes me a lot longer to understand how something works, even though I have a great IDE (the newest beta of IntelliJ) with perfect Spring integration that makes finding dependencies a lot easier. I keep jumping from file to file to understand the big picture. A lot of developers I meet or work with think that the auto-wiring functionality is great because it saves them a lot of time and spares them from programming in XML (XML was a bad choice, but that is another discussion), but somehow my brain just doesn’t get it.

So my big questions are:

  1. am I too stupid?
  2. if I try it longer, am I going to see the light?
  3. do I expect too much? Because fewer characters need to be typed, is it inevitable that some redundant/explicit information is lost?
  4. is this some hype, and will people eventually realise that it was a nice experiment that doesn’t provide the value it promises (just like checked exceptions or EJB 2)? And in 10 years we can all laugh about it.

2 Types of DAOs.

November 2, 2009

From time to time I hear the discussion about how easy it is to replace DAO implementations. But this isn’t as easy as it sounds, because there are 2 types of DAOs:

  1. session-less DAOs, e.g. a JDBC, iBATIS or GigaSpaces implementation
  2. session-based DAOs, e.g. a Hibernate/JPA implementation

One big difference between these 2 is that with a session-based DAO, it doesn’t matter if an object is changed after save/update has been called, because the object is attached to the session. When the transaction commits, it will commit the last state of the object because it can access these objects from the session. With a session-less DAO, only the state the object had when save/update was called will be persisted. Personally I prefer to name the save/update methods of a session-based DAO ‘attach’, because that makes clear what is really going on. And loaded objects are attached by default, so there is no need to call save/update on them.
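The save-then-modify difference can be modelled with two in-memory toy DAOs. Person, Store, SessionLessDao, SessionBasedDao and the commit method are hypothetical and only mimic the behaviour described above: the session-less DAO writes the state at save time, the session-based DAO merely attaches the object and writes its state when the transaction commits.

```java
import java.util.HashMap;
import java.util.Map;

class Person {
    final long id;
    String name;
    Person(long id, String name) { this.id = id; this.name = name; }
}

// Toy "database": one stored row (here just the name) per id.
class Store {
    final Map<Long, String> rows = new HashMap<>();
}

// Session-less: the current state is written immediately when save is called.
class SessionLessDao {
    private final Store store;
    SessionLessDao(Store store) { this.store = store; }
    void save(Person p) { store.rows.put(p.id, p.name); }
}

// Session-based: save only attaches; the last state is written at commit.
class SessionBasedDao {
    private final Store store;
    private final Map<Long, Person> session = new HashMap<>();
    SessionBasedDao(Store store) { this.store = store; }
    void save(Person p) { session.put(p.id, p); }   // really an 'attach'
    void commit() {
        for (Person p : session.values()) {
            store.rows.put(p.id, p.name);           // state at commit time, not at save time
        }
        session.clear();
    }
}
```

Changing the name after save therefore ends up in the store only with the session-based DAO; with the session-less DAO the change is lost unless save is called again.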

Another big difference between these 2 is that with a session-based DAO, you will get the same object instance for a given entity within a transaction, so you don’t need to worry about accessing it more than once within a transaction. With a session-less DAO, you can get a new instance every time you do a load, even within a single transaction, so you need to take care to update the correct instance.
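The identity difference can be sketched the same way, again as a toy model with hypothetical names (Entity, StatelessLoader, SessionLoader): the session-based loader keeps an identity map for the duration of its session, while the session-less loader maps the stored row to a fresh object on every load.

```java
import java.util.HashMap;
import java.util.Map;

class Entity {
    final long id;
    String value;
    Entity(long id, String value) { this.id = id; this.value = value; }
}

// Session-less load: every call creates a new instance from the stored row.
class StatelessLoader {
    private final Map<Long, String> rows;
    StatelessLoader(Map<Long, String> rows) { this.rows = rows; }
    Entity load(long id) { return new Entity(id, rows.get(id)); }
}

// Session-based load: the identity map returns the same instance within one session.
class SessionLoader {
    private final Map<Long, String> rows;
    private final Map<Long, Entity> identityMap = new HashMap<>();
    SessionLoader(Map<Long, String> rows) { this.rows = rows; }
    Entity load(long id) {
        return identityMap.computeIfAbsent(id, key -> new Entity(key, rows.get(key)));
    }
}
```

With StatelessLoader, loading id 1 twice gives two different objects, so an update made to the first one is invisible through the second; with SessionLoader both loads return the same object.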

So even though DAOs look easy to replace because they have an interface, there is a lot of ‘undocumented’ behaviour that makes it a lot trickier than expected.


Cutting costs: Layering good, tiers bad

March 17, 2009

From time to time I see old-school software architectures at customers where all the layers of the application (the most common ones are UI, business logic and persistence) are placed in separate tiers (different machines). There are a lot of problems with this approach:

  1. increased latency because of all the marshalling, unmarshalling and network communication.
  2. increased development time because of all the marshalling/unmarshalling code (DTOs, remote and local interfaces etc).
  3. increased hardware costs: often you see as many front-end machines as backend machines (machines that don’t do much essential work, but spend a lot of time waiting and marshalling/unmarshalling). So instead of, for example, 2×2 machines (two front-end and two backend machines), you get more uptime and lower hardware costs with 3×1 machines (three machines that each run all layers).
  4. increased license costs since you need to run extra servers
  5. increased maintenance costs since you need to maintain more machines
  6. increased infrastructure costs since there are more machines to connect
  7. increased troubleshooting costs: in these environments it is much harder to figure out what is going wrong because there is so much communication going on.
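The uptime claim in point 3 can be made concrete with a small calculation. It assumes, as a simplification, that machines fail independently and that each machine is up with the same probability p; the value 0.99 in the example is only illustrative. With 2×2, at least one front-end AND at least one backend machine must be up; with 3×1, any single surviving machine can serve the whole application.

```java
final class Availability {
    // Probability that at least one of n independent machines is up.
    static double atLeastOneUp(double p, int n) {
        return 1.0 - Math.pow(1.0 - p, n);
    }

    // 2x2: two front-end and two backend machines; both tiers must be available.
    static double twoTiersOfTwo(double p) {
        return atLeastOneUp(p, 2) * atLeastOneUp(p, 2);
    }

    // 3x1: three machines that each host all layers.
    static double threeCombined(double p) {
        return atLeastOneUp(p, 3);
    }
}
```

For p = 0.99 this gives about 0.9998 for 2×2 and 0.999999 for 3×1, with one machine less. Under these assumptions 3×1 wins for every p between 0 and 1: writing q = 1 - p, the difference (1 - q³) - (1 - q²)² equals q²(1 - q)(2 + q), which is positive.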

So instead of hosting the layers on different machines, in my opinion it is often better to host those layers on the same hardware, or even in the same JVM. It isn’t the only approach, but I have seen too many applications where this useless separation into tiers has led to a lot of extra cost. So it certainly is something worth considering if you need to cut costs in these financially difficult times.


HashMap is not a thread-safe structure.

May 26, 2008

Over the last few months I have seen too much code where a HashMap (without any extra synchronization) is used instead of a thread-safe alternative like ConcurrentHashMap or the less concurrent, but still thread-safe, Hashtable. This is an example of a HashMap used in a home-grown cache (used in a multi-threaded environment):

import java.util.HashMap;
import java.util.Map;

interface ValueProvider<K,V>{
	V retrieve(K key);
}

// NOT thread-safe: this is the problem case discussed below.
public class SomeCache<K,V>{

	private Map<K,V> map = new HashMap<K,V>();
	private ValueProvider<K,V> valueProvider;

	public SomeCache(ValueProvider<K,V> valueProvider){
		this.valueProvider = valueProvider;
	}

	public V getValue(K key){
		V value = map.get(key);
		if(value == null){
			value = valueProvider.retrieve(key);
			if(value!=null)
				map.put(key,value);
		}
		return value;
	}
}

There is a lot wrong with this innocent-looking piece of code. There is no happens-before relation between the put of a value in the map and the get of that value. This means that a thread that receives a value from the cache isn’t guaranteed to see all of its fields if the value has publication problems (most non-thread-safe structures have publication problems). The same goes for the internals of the HashMap (the buckets, for example): updates made to them while putting don’t need to be visible to a thread that does a get.
So the state of the cache as seen by another thread may not be an allowed state (some of the changes may still be stuck in a CPU cache), and the cache could start behaving erroneously; if you are lucky, it starts throwing exceptions. And last, but certainly not least, there also is a classic race problem: if 2 threads do an interleaved map.put, the internals of the HashMap can get into an inconsistent state. In most cases an application reboot/redeploy would be the only way to fix this problem.
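A minimal thread-safe variant of the cache, assuming Java 8 or later for computeIfAbsent (which did not exist when this was written; on older versions putIfAbsent plays a similar role). SafeCache is a hypothetical name, and java.util.function.Function stands in for ValueProvider. ConcurrentHashMap provides the happens-before relation between a put and a later get of the same key, and concurrent puts cannot corrupt its internals. Like the original, a null from the provider is not cached. Note that it fixes only the thread-safety problems, not the missing timeout.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class SafeCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> valueProvider;   // plays the role of ValueProvider.retrieve

    SafeCache(Function<K, V> valueProvider) {
        this.valueProvider = valueProvider;
    }

    V getValue(K key) {
        // Atomically looks up the key and, if it is absent, stores the provider's value.
        // If the provider returns null, no mapping is recorded.
        return map.computeIfAbsent(key, valueProvider);
    }
}
```

One design consequence: the provider runs while ConcurrentHashMap holds an internal lock for that key's bin, so a slow provider can delay other updates to the map; for very expensive lookups a dedicated caching library handles this better.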

There are other problems with the cache behavior of this code as well. The items don’t have a timeout, so once a value gets into the cache, it stays in the cache. In practice this could lead to a web page that keeps displaying some value, even though the value has been updated in the main repository. An application reboot/redeploy is also the only way to solve this problem. Using a Commercial Off-The-Shelf (COTS) cache would be a much safer solution, even though a new library needs to be added.
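The missing timeout can be addressed by storing an expiry time with every value. This is a minimal sketch, assuming a fixed time-to-live and an injectable clock (a LongSupplier returning milliseconds) so that it can be tested; ExpiringCache is a hypothetical name. It does not evict unused entries or limit the size, and two threads may occasionally both reload an expired key, which is part of why a ready-made cache library is the safer choice.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class ExpiringCache<K, V> {
    // Immutable holder: its final fields are safely visible to every thread that reads the entry.
    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> valueProvider;
    private final long ttlMillis;
    private final LongSupplier clock;

    ExpiringCache(Function<K, V> valueProvider, long ttlMillis, LongSupplier clock) {
        this.valueProvider = valueProvider;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V getValue(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = map.get(key);
        if (entry != null && now < entry.expiresAt) {
            return entry.value;                      // still fresh
        }
        V value = valueProvider.apply(key);          // missing or expired: reload
        if (value != null) {
            map.put(key, new Entry<>(value, now + ttlMillis));
        } else {
            map.remove(key);                         // drop a stale entry
        }
        return value;
    }
}
```

In production the clock would simply be System::currentTimeMillis.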

It is important to realize that a HashMap can be used perfectly in a multi-threaded environment if extra synchronization is added. But without extra synchronization, it is a time-bomb waiting to go off.
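For comparison, a sketch of the same cache that keeps a plain HashMap but adds the extra synchronization: every access to the map happens while holding the same lock (the cache instance), which gives both mutual exclusion and the happens-before relation between a put and a later get. LockedCache is a hypothetical name and Function stands in for ValueProvider. Holding the lock while calling the provider keeps the code simple, but it serializes all cache misses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class LockedCache<K, V> {
    private final Map<K, V> map = new HashMap<>();   // only accessed while holding 'this'
    private final Function<K, V> valueProvider;

    LockedCache(Function<K, V> valueProvider) {
        this.valueProvider = valueProvider;
    }

    synchronized V getValue(K key) {
        V value = map.get(key);
        if (value == null) {
            value = valueProvider.apply(key);
            if (value != null) {
                map.put(key, value);
            }
        }
        return value;
    }
}
```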


Executing a long-running task from the UI

December 15, 2007

A colleague asked me how one could prevent the execution of a long-running task on some UI thread (e.g. the thread of a Servlet container or the Swing event dispatching thread), and also how one could prevent the concurrent execution of that same task. So I sent him an email containing a small example and decided to place it on my blog to help others struggling with the same issue.

The long running task

The FooService is the service with the long running method ‘foo’.

interface FooService{
        void foo();
}

There is no need to add threading logic to the FooService implementation. It can focus purely on realizing the business process by implementing the business logic. The code should not be infected with concurrency control logic, because that makes it very hard to test and also hard to understand, reuse or change. So removing such logic is one of the first potential refactorings I often see in code. I’ll post more about this in ‘Java Concurrency Top 10’.

The execution-service

The FooExecutionService is responsible for executing the FooService and preventing concurrent execution (if a correctly configured executor instance is injected). Personally I prefer to inject the executor instead of creating one inside the FooExecutionService, because creating it internally makes the class hard to test and to change/configure.

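import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// commons-logging, the logging API Spring itself uses
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class FooExecutionService{

    private static final Log log = LogFactory.getLog(FooExecutionService.class);

    private final FooService fooService;
    private final Executor executor;

    public FooExecutionService(FooService fooService, Executor executor){
        this.fooService = fooService;
        this.executor = executor;
    }

    /**
     * Starts executing the FooService. This call is asynchronous, so
     * it won't block.
     *
     * @throws RejectedExecutionException if the execution
     *         of foo is not accepted (concurrent/shutdown).
     */
    public void start(){
        executor.execute(new Task());
    }

    // Runs foo on the executor's worker thread and logs any failure,
    // because nobody on the calling (UI) thread is waiting for the result.
    class Task implements Runnable{
        public void run(){
            try{
                fooService.foo();
            }catch(Exception ex){
                log.error("failed to ...", ex);
            }
        }
    }
}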

The FooExecutionService could be improved in different ways. It could report whether a task is already executing; this could be realized by placing a dummy task in the executor and checking if it is rejected. A different solution would be to let the Task publish some information about the status of the current execution. If the task is very long running and you want to be able to stop it, you could shut down the executor by calling its shutdownNow method. This interrupts the worker threads, and if you periodically check the interrupt status of the executing thread during the long-running call, you can end the execution.
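A minimal sketch of the cancellation and status ideas, assuming the long-running work is a loop of small steps. InterruptibleFooService, its running and stepsDone fields, and CancellationDemo are hypothetical names. The service checks the interrupt flag between steps, so shutdownNow on the executor (which interrupts the worker thread) ends the execution; the volatile fields illustrate publishing status that another thread, such as a controller, can read.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class InterruptibleFooService {
    // Published status; only the worker thread writes these, other threads may read them.
    volatile boolean running;
    volatile long stepsDone;

    void foo() {
        running = true;
        try {
            while (!Thread.currentThread().isInterrupted()) {
                stepsDone++;       // one small unit of the long-running work
                Thread.yield();
            }
        } finally {
            running = false;       // also reached when the loop ends because of an interrupt
        }
    }
}

final class CancellationDemo {
    // Starts foo on a single worker thread, then stops it with shutdownNow.
    static boolean startAndStop(InterruptibleFooService service) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(service::foo);
        Thread.sleep(50);                                 // let it run for a moment
        executor.shutdownNow();                           // interrupts the worker thread
        return executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

This only works if foo cooperates: a call that never checks the interrupt flag (or swallows InterruptedException) will keep running after shutdownNow.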

Some Spring configuration

The Executor is injected from the outside by some Spring configuration, for example:

<bean id="fooService" class="FooServiceImpl"/>

<bean id="fooExecutionService" class="FooExecutionService">
    <constructor-arg index="0" ref="fooService"/>
    <constructor-arg index="1">
        <bean class="java.util.concurrent.ThreadPoolExecutor"
              destroy-method="shutdownNow">
            <!-- minimal poolsize (only 1 thread) -->
            <constructor-arg index="0" value="1"/>
            <!-- maximum poolsize (only 1 thread) -->
            <constructor-arg index="1" value="1"/>
            <!-- the keep-alive timeout (we don't need it) -->
            <constructor-arg index="2" value="0"/>
            <!-- the time unit that belongs to the timeout argument (we don't need it) -->
            <constructor-arg index="3">
                <bean id="java.util.concurrent.TimeUnit.SECONDS"
                      class="org.springframework.beans.factory.config.FieldRetrievingFactoryBean"/>
            </constructor-arg>
            <!-- the work queue where unprocessed tasks get stored -->
            <constructor-arg index="4">
                <!-- we don't want any unprocessed work: a worker needs to be available,
                     or the task gets rejected. -->
                <bean class="java.util.concurrent.SynchronousQueue"/>
            </constructor-arg>
        </bean>
    </constructor-arg>
</bean>

If there are multiple long-running methods, it would be a good idea to extract the creational logic of the executor into a factory method.
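Such a factory method could simply be the Java equivalent of the XML configuration; SingleTaskExecutors and its methods are hypothetical names. With one thread and a SynchronousQueue, a task submitted while the first one is still running finds no idle worker to hand off to and no room to start another thread, so the default abort policy throws a RejectedExecutionException, which is exactly what FooExecutionService.start() relies on.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class SingleTaskExecutors {
    // Same settings as the Spring XML: 1 core thread, 1 max thread, no keep-alive,
    // and a SynchronousQueue so no task is ever queued while the worker is busy.
    static ThreadPoolExecutor newSingleTaskExecutor() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }

    // Returns true if a second task is rejected while the first one is still running.
    static boolean secondTaskRejected() throws InterruptedException {
        ThreadPoolExecutor executor = newSingleTaskExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            executor.execute(() -> {
                started.countDown();
                try {
                    release.await();           // simulates the long-running task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();                   // the single worker is now busy
            try {
                executor.execute(() -> { });
                return false;
            } catch (RejectedExecutionException expected) {
                return true;
            }
        } finally {
            release.countDown();
            executor.shutdown();
        }
    }
}
```

Each long-running method would get its own executor from newSingleTaskExecutor(), unless the methods should also exclude each other.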

The UI-controller

And the FooExecutionService can be hooked up to some controller like this:

import java.util.concurrent.RejectedExecutionException;

class StartFooController extends SomeController{
        private final FooExecutionService fooExecutionService;

        StartFooController(FooExecutionService fooExecutionService){
                this.fooExecutionService = fooExecutionService;
        }

        String handleRequest(Request request, Response response){
                try{
                        fooExecutionService.start();
                        return "success";
                }catch(RejectedExecutionException ex){
                        // the single worker is busy, or the executor is shutting down
                        return "alreadyrunningorshuttingdownview";
                }
        }
}

Prevention of concurrent execution of different tasks

If you want to prevent concurrent execution of different long-running methods, you could create a single execution service for all methods and share the same executor between the execution of the different tasks: