Simplifying enterprise applications with durable STM

March 21, 2010

One of the things I’m currently playing with is a durable STM. The functionality is going to be integrated in the main STM implementation of the Multiverse STM. I’m doing a proof of concept to figure out what kind of interfaces and changes the Multiverse STM is going to need.

The idea comes from the fact that persistence in a lot of enterprisy applications I have seen just didn’t feel very comfortable:

  1. dealing with proxies, so problematic self calls in services and identity problems on entities
  2. dealing with OR mapping
  3. dealing with setting up the database and creating DDL
  4. not being able to do dependency injection on entities
  5. objects only getting an id once they are committed, and not before

It all takes up too much time, and since an STM already manages the state of your objects, adding a storage engine behind it is not that strange. To give you an idea of how simple it could be, let’s create a small Bank example:

@TransactionalObject
class Customer{
	private final String id;

	public Customer(String id){
		this.id = id;
	}

	public String getId(){
		return id;
	}
}

@TransactionalObject
class Account{
	private final Customer customer;
	private int balance;

	public Account(Customer customer){
		this.customer = customer;
	}

	public Customer getCustomer(){
		return customer;
	}

	public int getBalance(){
		return balance;
	}

	public void setBalance(int newBalance){
		if(newBalance<0){
			throw new NotEnoughMoneyException();
		}	

		this.balance = newBalance;
	}
}

@TransactionalObject
class Bank{
	private final TransactionalMap<String, Customer> customers =
		new TransactionalHashMap<String, Customer>();
	private final TransactionalMap<String, Account> accounts =
		new TransactionalHashMap<String, Account>();

	public Account find(String id){
		Account account = accounts.get(id);
		if(account == null){
			throw new AccountNotFoundException();
		}
		return account;
	}

	public void transfer(String fromId, String toId, int amount){
		if(amount<0){
			throw new IllegalArgumentException();
		}

		Account from = find(fromId);
		Account to = find(toId);
		to.setBalance(to.getBalance()+amount);
		from.setBalance(from.getBalance()-amount);
	}

	public Customer createCustomer(String id){
		Customer customer = new Customer(id);
		customers.put(customer.getId(), customer);
		return customer;
	}

	public Account createAccount(Customer customer){
		Account found = accounts.get(customer.getId());
		if(found!=null){
			throw new AccountAlreadyExists();
		}
		Account newAccount = new Account(customer);
		accounts.put(customer.getId(), newAccount);
		return newAccount;
	}
}

And you could wire it up in Spring like this:

<bean id="bank" class="DurableTransactionalObjectFactoryBean">
         <property name="id" value="bank">
         <property name="class" value="Bank">
</bean>

You can then use this bean in your controllers, for example. The DurableTransactionalObjectFactoryBean will create, or load from disk, a durable transactional object with the given id.
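
For example, a controller could use the bank bean like this (a sketch; the controller and view name are made up, only the Bank API comes from the example above):

// Hypothetical controller that gets the durable 'bank' bean injected by Spring.
class TransferController{

	private final Bank bank;

	public TransferController(Bank bank){
		this.bank = bank;
	}

	public String handleTransfer(String fromId, String toId, int amount){
		// the transfer runs as a single atomic (and durable) transaction
		bank.transfer(fromId, toId, amount);
		return "success";
	}
}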

As you can see there is:

  1. no OR mapping needed
  2. no DAOs needed
  3. no fuss with transaction configuration
  4. no need to roll back state on objects manually; check the transfer method, where setBalance on the ‘from’ account could throw a NotEnoughMoneyException and would normally leave the ‘to’ account in an illegal state

All operations on the bank are executed with full ACID semantics.

I hope I’m able to add an experimental storage engine to Multiverse 0.6 (planned in 3 months), so I can learn more about what I’m going to need, what the exact semantics of durable/non-durable objects need to be, and about their lifecycles.


Multiverse STM as Java Database

March 2, 2010

Last week I released Multiverse 0.4 with a lot of new goodies and a completely new website. In 0.5 a lot of new goodies will be added as well (compile-time instrumentation, more transactional datastructures, performance improvements). But I’m also thinking about longer term goals, and one of the most interesting ones (I think) is persistence (next to distribution). Since an STM already deals with concurrency control, knows about the internals of your objects and is built on transactions, persistence is a logical next step.

The goal is to make it as easy as possible for the Java developer (perhaps just setting a durability property to true on some annotation): no need to deal with a relational database, no OR mapping, no schema, no need to deal with caching (it should all be done behind the scenes). I have seen too many projects wasting lots and lots of resources because:

  1. performance completely sucks (doing a few hundred operations per second with a lot of hardware and expensive software like Oracle RAC). Adding more hardware doesn’t solve the problem, and that is where the expensive consultants come in (like I used to do).
  2. a lot of concurrency issues arise because developers don’t understand the database. So you get your traditional race problems and deadlocks, but you also get a lot of application complexity caused by additional ‘fixing’ logic. Things get even more complicated if a different database is used in production than in development, meaning that you need to deal with problems twice.
  3. needing to deal with the mismatch between the relational and the object oriented datamodel
  4. needing to deal with limitations caused by traditional databases. One of the limitations that causes a lot of complexity is the lack of blocking, so some kind of polling mechanism usually is added (Quartz for example) that triggers the logic. This makes very simple things (like a thread executing work) much more cumbersome than they need to be (see the sketch after this list).
  5. needing to deal with imperfect solutions like proxy based transactions (Spring for example). The big disadvantage of this approach is that the design of classes gets much more complicated when self calls are done. Another similar problem I see is that DDD (so objects with logic, and therefore needing dependencies) is nice on paper, but sucks in practice because dependencies can’t be injected into entities. I know that Spring has some support for it, but often it can’t be or is not used. The whole dependency issue in entities (and other objects that are not created inside a container) could have been solved very easily by Hibernate by exposing a factory. So that is something I am going to get right.
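
To contrast blocking with polling: below is a rough sketch of what blocking on state could look like with an STM. I’m assuming a retry primitive along the lines of Multiverse’s StmUtils.retry() here, so treat the exact names as an assumption rather than the final API.

@TransactionalObject
class MailBox{
	// a transactional field, managed by the STM
	private String message;

	public String take(){
		if(message == null){
			// assumption: retry() aborts and blocks the transaction until another
			// transaction changes a field that was read, so no polling is needed
			StmUtils.retry();
		}
		String result = message;
		message = null;
		return result;
	}

	public void put(String newMessage){
		message = newMessage;
	}
}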

I wish Multiverse had been available all the times I have seen these issues at customers, because it would have made programming more fun and saved lots of money; try to imagine what it costs when a team of developers struggles for days or weeks with these issues, or systematically wastes time over even longer periods because the road they have taken is just too complex. I’m not claiming that it will be a fit for every project, but if you just need an easy mechanism to persist Java state and deal with concurrency, Multiverse could be a good solution.


Am I too stupid for @Autowired?

December 2, 2009

When I started with Spring, I finally had the feeling that I could write enterprise applications in the same powerful way as I could write normal systems. And I have written quite complex Java stuff like expert systems, Prolog compilers and various other compilers and interpreters, and I’m currently working on Multiverse, a software transactional memory implementation. I have also written quite a lot of traditional enterprisy Spring applications (frontend, backend/batch processing). So I consider myself at least as smart as most developers.

The first application context files felt a little bit strange, finding the balance between the objects that need to be created inside the application context and the objects created in the Java code itself. But it didn’t take me long to find this balance and realise how powerful the application context is:

  1. it acts like an executable piece of software documentation where I can see how a system works, just by looking at a few XML files. I don’t need to look in the source to see how stuff is wired.
    I compare it with looking at Lego instructions: by looking at the instructions I can see what is going to be built (not that I ever followed the design). To understand what is being built, I don’t need to look at the individual parts.
  2. separation of interfaces and implementations, so testing and using a lot of cool OO design stuff is a dream
  3. having the freedom to inject different instances of the same interface in different locations. This makes it possible to do cool stuff like adding proxies, logging etc. I wasn’t forced to jump through hoops any more because of all kinds of container specific constraints

With the old versions of Spring (1.2 series) I had the feeling of total control and complete understanding.

The application context made me aware that I needed to wear 2 hats:

  1. ‘Object’ designer: actually writing the Java code and documentation
  2. ‘Object’ integrator: assembling complete systems based on those objects

And I loved it. I have more than 1,000 posts on the Spring forum just because I believe in this approach.

But if I look at a lot of modern Spring applications, filled with @Component and @Autowired annotations, it feels like I have lost it all. It takes me a lot longer to figure out how something works, even though I have a great IDE (the newest beta of IntelliJ) with perfect Spring integration that makes finding dependencies a lot easier. So I keep jumping from file to file to understand the big picture. A lot of developers I meet and work with think that the auto-wiring functionality is great because it saves them a lot of time and prevents them from programming in XML (XML was a bad choice, but that is another discussion), but somehow my brain just doesn’t get it.
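
To make the contrast concrete, here is a small sketch of the same wiring in both styles (the bean and class names are made up). With explicit XML, the dependencies of a service are visible in one place:

<bean id="orderService" class="com.example.OrderServiceImpl">
	<constructor-arg ref="orderRepository"/>
	<constructor-arg ref="mailSender"/>
</bean>

With auto-wiring, the same information is spread over the classes themselves:

@Component
public class OrderServiceImpl implements OrderService{

	@Autowired
	private OrderRepository orderRepository;

	@Autowired
	private MailSender mailSender;
}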

So my big questions are:

  1. am I too stupid?
  2. if I try it longer, am I going to see the light?
  3. do I expect too much? Because fewer characters need to be typed, is it inevitable that some redundant/explicit information is lost?
  4. is this some hype, and will people eventually start to realise that it was a nice experiment, but doesn’t provide the value it promises (just like checked exceptions or EJB 2)? And in 10 years we can all laugh about it.

2 Types of DAOs

November 2, 2009

From time to time I hear the discussion about how easy it is to replace DAO implementations. But this isn’t as easy as it sounds, because there are 2 types of DAOs:

  1. session-less DAOs; e.g. a JDBC, iBatis or GigaSpaces implementation
  2. session-based DAOs; e.g. a Hibernate/JPA implementation

One big difference between these 2 is that with a session-based DAO, it doesn’t matter if an object is changed after save/update has been called, because the object is attached to the session. When the transaction commits, the last state of the object is committed, because the session still has access to those objects. With a session-less DAO, only the state of the object at the moment save/update was called will be persisted. Personally I prefer to name the save/update methods of a session-based DAO ‘attach’, because that makes clear what is really going on. And loaded objects are attached by default, so there is no need to call save/update on them.
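
A small sketch to make the difference concrete (the Employee entity and EmployeeDao are made up):

// Hypothetical DAO; a session-based implementation would be backed by a
// Hibernate/JPA session, a session-less one by plain JDBC or iBatis.
interface EmployeeDao{
	Employee load(long id);
	void save(Employee employee); // 'attach' for a session-based implementation
}

@Transactional
void renameEmployee(EmployeeDao employeeDao, long id, String newName){
	Employee employee = employeeDao.load(id);
	employeeDao.save(employee);
	// a session-based DAO still persists this change on commit, because the
	// employee is attached to the session; a session-less DAO has already
	// written the old state, so this change is silently lost
	employee.setName(newName);
}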

Another big difference is that with a session-based DAO, you will get the same object instance for a given entity within a transaction. So you don’t need to worry about accessing it more than once within that transaction. With a session-less DAO, you can get a new instance every time you do a load, even within a single transaction. So you need to take care to update the correct instance.

So even though DAOs look easy to replace because they have an interface, there is a lot of ‘undocumented’ behaviour that makes it a lot trickier than expected.


Cutting costs: Layering good, tiers bad

March 17, 2009

From time to time I see old-school software architectures at customers where all the layers of the application (the most common ones are UI, business logic and persistence) are placed in separate tiers (different machines). There are a lot of problems with this approach:

  1. increased latency because of all the marshalling, unmarshalling and network communication.
  2. increased development time because of all the marshalling/unmarshalling code (so DTOs, remote and local interfaces etc).
  3. increased hardware costs: often you see as many front-end machines as back-end machines (which don’t do much essential work, but spend a lot of time waiting and marshalling/unmarshalling). So instead of 2×2 machines for example, you have more uptime with 3×1 machines and lower hardware costs.
  4. increased license costs, since you need to run extra servers
  5. increased maintenance costs, since you need to maintain more machines
  6. increased infrastructure costs, since there are more machines to connect
  7. increased troubleshooting costs: in these environments it is much harder to figure out what is going wrong, because there is so much communication going on.

So instead of hosting the layers on different machines, it often is, imho, better to host those layers on the same hardware or even in the same JVM. That doesn’t mean this is the only approach, but I have seen too many applications where this useless separation of tiers has led to a lot of extra cost. So it certainly is something worth considering if you need to cut costs in these financially difficult times.


HashMap is not a thread-safe structure.

May 26, 2008

In the last few months I have seen too much code where a HashMap (without any extra synchronization) is used instead of a thread-safe alternative like ConcurrentHashMap or the less concurrent, but still thread-safe, Hashtable. This is an example of a HashMap used in a home grown cache (used in a multi-threaded environment):

interface ValueProvider<K,V>{
	V retrieve(K key);
}

public class SomeCache<K,V>{

	private Map<K,V> map = new HashMap<K,V>();
	private ValueProvider<K,V> valueProvider;

	public SomeCache(ValueProvider<K,V> valueProvider){
		this.valueProvider = valueProvider;
	}

	public V getValue(K key){
		V value = map.get(key);
		if(value == null){
			value = valueProvider.retrieve(key);
			if(value != null)
				map.put(key, value);
		}
		return value;
	}
}

There is much wrong with this innocent looking piece of code. There is no happens-before relation between the put of the value into the map and the get of that value. This means that a thread that receives the value from the cache doesn’t need to see all its fields if the value has publication problems (most non thread-safe structures have publication problems). The same goes for the internals of the HashMap (the buckets for example): updates to the internals of the HashMap made while putting don’t need to be visible to a thread that does the get. So it could be that the state of the cache in main memory is not in an allowed state (some of the changes may be stuck in the CPU cache), and the cache could start behaving erroneously and, if you are lucky, start throwing exceptions. And last, but certainly not least, there also is a classic race problem: if 2 threads do an interleaved map.put, the internals of the HashMap can end up in an inconsistent state. In most cases an application reboot/redeploy is the only way to fix this problem.

There are other problems with the cache behaviour of this code as well. The items don’t have a timeout, so once a value gets into the cache, it stays in the cache. In practice this could lead to a webpage that keeps displaying some value, even though the value has been updated in the main repository. An application reboot/redeploy is again the only way to solve this problem. Using a Commercial Off The Shelf (COTS) cache would be a much safer solution, even though a new library needs to be added.

It is important to realize that a HashMap can be used perfectly well in a multi-threaded environment if extra synchronization is added. But without that extra synchronization, it is a time bomb waiting to go off.
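
For completeness, this is roughly what the cache could look like when built on a ConcurrentHashMap (a sketch; under contention a value may still be computed more than once, but the map stays consistent and the values are safely published):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SomeConcurrentCache<K,V>{

	private final ConcurrentMap<K,V> map = new ConcurrentHashMap<K,V>();
	private final ValueProvider<K,V> valueProvider;

	public SomeConcurrentCache(ValueProvider<K,V> valueProvider){
		this.valueProvider = valueProvider;
	}

	public V getValue(K key){
		V value = map.get(key);
		if(value == null){
			value = valueProvider.retrieve(key);
			if(value != null){
				// putIfAbsent makes sure every thread ends up with the same instance
				V previous = map.putIfAbsent(key, value);
				if(previous != null){
					value = previous;
				}
			}
		}
		return value;
	}
}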


Executing a long running task from the UI

December 15, 2007

A colleague asked me how one could prevent the execution of a long running task on some UI thread (e.g. the thread of a Servlet container or the Swing event dispatching thread), and also how one could prevent concurrent execution of that same task. So I sent him an email containing a small example, and decided to place it on my blog to help others struggling with the same issue.

The long running task

The FooService is the service with the long running method ‘foo’.

interface FooService{
        void foo();
}

There is no need to add threading logic to the FooService implementation. It can focus purely on the realization of the business process by implementing the business logic. The code should not be infected with concurrency control logic, because that makes testing very hard, and also makes the code hard to understand, reuse or change. So this is one of the first potential refactorings I often see in code. I’ll post more about this in ‘Java Concurrency Top 10’.

The execution-service

The FooExecutionService is responsible for executing the FooService and preventing concurrent execution (if a correctly configured executor instance is injected). Personally I prefer to inject the executor instead of creating one inside the FooExecutionService, because creating it inside makes it hard to test and to change/configure.

class FooExecutionService{

	// any logging library will do; commons-logging is used here as an example
	private static final Log log = LogFactory.getLog(FooExecutionService.class);

	private final FooService fooService;
	private final Executor executor;

	public FooExecutionService(FooService fooService, Executor executor){
		this.fooService = fooService;
		this.executor = executor;
	}

	/**
	 * Starts executing the FooService. This call is asynchronous, so
	 * it won't block.
	 *
	 * @throws RejectedExecutionException if the execution
	 *         of foo is not accepted (concurrent/shutdown).
	 */
	public void start(){
		executor.execute(new Task());
	}

	class Task implements Runnable{
		public void run(){
			try{
				fooService.foo();
			}catch(Exception ex){
				log.error("failed to ...", ex);
			}
		}
	}
}

The FooExecutionService could be improved in different ways. It could provide information on whether a task is already executing; this could be realized by placing a dummy task in the executor and checking if that task is rejected. A different solution would be to let the Task publish some information about the status of the current execution. If the task is very long running and you want to be able to stop it, you could shut down the executor by calling the shutdownNow method. This interrupts the worker threads, and if you periodically check the interrupt status of the executing thread while doing the long running call, you can end the execution.
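
As an illustration of that last point, a FooService implementation could cooperate with shutdownNow like this (a sketch; the unit of work is simulated):

class InterruptibleFooService implements FooService{

	public void foo(){
		for(int i = 0; i < 1000; i++){
			if(Thread.currentThread().isInterrupted()){
				// the executor was shut down, so end the execution early
				return;
			}
			doUnitOfWork(i);
		}
	}

	private void doUnitOfWork(int i){
		// placeholder for a single step of the long running task
	}
}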

Some Spring configuration

The Executor is injected from the outside by some Spring configuration, for example:

<bean id=""fooService" class="FooServiceImpl"/>

<bean id="fooExecutionService" class="FooExecutionService">
        <constructor-arg 	index="0" 
				ref="fooService"/>
        <constructor-arg 	index="1">
            	<bean 	class="java.util.concurrent.ThreadPoolExecutor"
			destroy-method="shutdownNow">
			<!-- minimal poolsize (only 1 thread) -->
                	<constructor-arg 	index="0"
                                 		value="1"/>
                	<!-- maximum poolsize (only 1 thread)-->
                	<constructor-arg 	index="1"
                                 		value="1"/>
                	<!-- the timeout (we don't need it) -->
                	<constructor-arg 	index="2"
                                 		value="0"/>
                	<!-- the timeunit that belongs to the timeout argument (we don't need it) -->
                	<constructor-arg index="3">
                    		<bean 	id="java.util.concurrent.TimeUnit.SECONDS"
                          		class="org.springframework.beans.factory.config.FieldRetrievingFactoryBean"/>
                	</constructor-arg>
                	<!-- the workqueue where unprocessed tasks get stored -->
                	<constructor-arg index="4">
                    		<!-- we don't want any unprocessed work: a worker needs to be available,
                 		or the task gets rejected. -->
                    		<bean class="java.util.concurrent.SynchronousQueue"/>
                	</constructor-arg>
        	</bean>
	</constructor-arg>    
</bean>

If there are multiple long running methods, it would be an idea to extract the creational logic of the executor to a factory method.
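
Such a factory method could look something like this (a sketch that mirrors the XML configuration above; the class and method names are made up):

import java.util.concurrent.Executor;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class ExecutorFactory{

	private ExecutorFactory(){}

	/**
	 * Creates a single threaded executor without a work queue, so a task is
	 * rejected when the worker isn't immediately available.
	 */
	public static Executor newSingleTaskRejectingExecutor(){
		return new ThreadPoolExecutor(
				1, 1,                               // minimal and maximum poolsize: one thread
				0, TimeUnit.SECONDS,                // keep-alive timeout (we don't need it)
				new SynchronousQueue<Runnable>());  // no queueing: a busy worker means rejection
	}
}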

The UI-controller

And the FooExecutionService can be hooked up to some controller like this:

class StartFooController extends SomeController{

	private final FooExecutionService fooExecutionService;

	StartFooController(FooExecutionService fooExecutionService){
		this.fooExecutionService = fooExecutionService;
	}

	String handleRequest(Request request, Response response){
		try{
			fooExecutionService.start();
			return "success";
		}catch(RejectedExecutionException ex){
			return "alreadyrunningorshuttingdownview";
		}
	}
}

Prevention of concurrent execution of different tasks

If you want to prevent concurrent execution of different long running methods, you could create a single execution-service for all methods, and share the same executor between the different tasks:
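
A minimal sketch of what such a shared execution-service could look like (the names are mine):

// One execution-service for several long running tasks. If the injected
// executor is the single-worker, rejecting executor configured above, at most
// one task (of any kind) runs at a time.
class TaskExecutionService{

	private final Executor executor;

	public TaskExecutionService(Executor executor){
		this.executor = executor;
	}

	/**
	 * @throws RejectedExecutionException if another task is already running
	 *         or the executor has been shut down.
	 */
	public void start(final Runnable task){
		executor.execute(new Runnable(){
			public void run(){
				try{
					task.run();
				}catch(Exception ex){
					// log the failure, as in the FooExecutionService
				}
			}
		});
	}
}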


Use the real database when testing

November 24, 2007

While unit/integration testing the repositories (DAOs in yesterday’s lingo), try to use the same type of database as in production, instead of some lightweight alternative like HSQLDB. Unless that lightweight database is also used in the production environment, of course.

Using a different type of database could lead to not detecting problems like:

  1. differences in SQL interpretation, types, precision and formatting.
  2. missing database constraints. Usually this is functionality I prefer to guarantee and like to have placed in handwritten DDL scripts, instead of it being out of my control because it is generated by some OR-mapping technology. This approach also invites creating update scripts from day one, instead of them being an afterthought and causing pain when upgrading to the next release.
  3. differences in concurrency control mechanisms. What leads to a pessimistic lock in one database could lead to an optimistic locking failure in another.

This means that there are differences in behaviour between databases, and these should not be ignored. Not using the real database could lead to problems that are only observed in the acceptance or production environment. And the longer it takes to encounter a bug, the more time it is going to cost to solve it.

I know that in continuous build environments using the same type of database can be difficult; not only does the environment need to support different databases, it also needs to support multiple versions. One of the ways to solve this problem is to use a virtualized operating system (using e.g. VMware) per database version. This should prevent the build environment from turning into a maintenance nightmare.


Lightweight Batch Processing I: Intro

November 12, 2007

If you are lucky, your application is a lot more complex than just the standard request/response webapplication. The complexity in these applications can typically be found in the business domain or in the presentation logic. Batch processing systems process large volumes of data, and that always makes me happy to be a software developer, because so much interesting stuff is going on; especially concurrency control and transaction management.

This is the first blog about lightweight batch processing, and the goal is to share my knowledge, and hopefully gain new insights from your comments. There are batch frameworks (like the newest Spring module: Spring Batch), but frameworks often introduce a lot more functionality (and complexity) than required, and they can’t always be used for a wide range of reasons (sometimes technical, sometimes political). This set of blogs is aimed at those scenarios. The approach I use is to start from a basic example, point out the problems that can occur (and the conditions under which they occur), and eventually refactor the example.

Let’s get started: underneath you can see a standard approach to processing a batch of employees.

EmployeeDao employeeDao;

@Transactional
void processAll(){
    List<Employee> batch = employeeDao.findItemsToProcess();
    for(Employee employee: batch)
        process(employee);
}

void process(Employee employee){
    ...logic
}

As you can see, the code is quite simple. There is no need to integrate the scheduling logic into the processing logic; it is much better to hook up a scheduler (like Quartz for example) from the outside (that makes the code much easier to test, maintain and extend). This example works fine for a small number of employees, and if the processing of a single employee doesn’t take too much time. But when the number of employees increases, or the time to process a single item increases, this approach won’t scale well and can lead to all kinds of problems. One of the biggest problems (for now) is that the complete batch is executed in a single transaction. Although this transaction provides the ‘all or nothing’ (atomicity) behaviour that normally is desired, the length of the transaction can lead to all kinds of problems:

  1. lock contention (and even lock escalation, depending on the database), leading to decreased performance and eventually to completely serialized access to the database. This can be problematic if the batch process is not the only user of the database.
  2. failing transactions caused by running out of undo space, or by the database aborting the transaction because it runs too long.
  3. when the transaction fails, all items need to be reprocessed, even the ones that didn’t give a problem. If the batch takes a long time to run, this behaviour can be highly undesirable.

In the following example the long running transaction has been replaced by multiple smaller transactions: one transaction to retrieve the batch and one transaction for each employee that needs to be processed:

EmployeeDao employeeDao;

void processAll(){
    List<Employee> batch = getBatch();
    for(Employee employee: batch)
        process(employee);
}

@Transactional
List<Employee> getBatch(){
    return employeeDao.findItemsToProcess();
}

@Transactional
void process(Employee employee){
    ...logic
}

As you may have noticed, this example is not without problems either. One of the biggest problems is that the complete list of employees needs to be retrieved first. If the number of employees is very large, or when a single employee consumes a lot of resources (memory for example), this can lead to all kinds of problems (apart from being another long running transaction!). One of the possible solutions is to retrieve only the ids:

EmployeeDao employeeDao;

void processAll(){
    List<Long> batch = getBatch();
    for(Long id: batch)
        process(id);
}

@Transactional
List<Long> getBatch(){
    return employeeDao.findIdsToProcess();
}

@Transactional
void process(long id){
    Employee employee = employeeDao.load(id);
    ...actual processing
}

A big advantage of retrieving a list of ids instead of a list of Employees is that the transactional behaviour is well defined. Detaching and reattaching objects to sessions introduces a lot more vagueness (especially if the OR-mapping tool doesn’t detach the objects entirely). Different approaches are possible: you can keep a cursor open and retrieve an employee only when it is needed, but the problem is that you still have a long running transaction. Another approach is to retrieve only the employees that can be processed in a single run, and to repeat this until no more items are found.
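
A sketch of that last approach (the DAO method and the chunk size are made up):

void processAll(){
    List<Long> chunk;
    do{
        chunk = getNextChunk();
        for(Long id: chunk)
            process(id);
    }while(!chunk.isEmpty());
}

@Transactional
List<Long> getNextChunk(){
    // hypothetical DAO method: returns at most 100 unprocessed employee ids.
    // It relies on process(id) marking an employee as processed, so the same
    // id isn't returned in a following chunk.
    return employeeDao.findIdsToProcess(100);
}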

In the next blogpost I’ll deal with multi-threading and locking.


The use of Maven cripples the mind

November 7, 2007

I have used Maven 1.1 and 2 for the last 1.5 years, but I am still boggled by the utter complexity of the tool. There are a lot of good aspects about Maven (dependency management, standard tasks etc), but these advantages don’t make up for all the problems. For example: if I want to create a war and an ear, I need to create multiple projects. How stupid can you be?

With ANT (used it for more than 5 years) I would add an extra target and I’m done. I can do whatever I want. That is why I add all functionality to my script so I get a one button release. So even though some stuff in Maven is much easier to set up, in the end I’m spending much more time configuring maven than it would take in a different tool like ANT. Other problems of maven are badly documented tasks, tasks that break for unclear reasons (ANT is as solid as a rock). My bet (and hope) is that Maven in a few years is seen as something bad like EJB 2 and replaced by modern build environments like GANT, Rake, Raven or Build’r. Build languages that are just internal DSL’s; so you can switch to the full blown language without needing to create plugins and it also prevents programming in XML (XML was a bad choice for ANT from day one).