Oracle and ORA-08177

November 28, 2007

I posted an entry on the company blog about Oracle and the ORA-08177 error ("can't serialize access for this transaction"). So if you want to know more about the subject, I suggest checking it out.

Oracle and ORA-08177


Use the real database when testing

November 24, 2007

While unit/integration testing repositories (DAOs in yesterday's lingo), try to use the same type of database as in production, instead of some lightweight alternative like HSQLDB. Unless that lightweight database is also used in the production environment, of course.

Using a different type of database can mean that problems like the following go undetected:

  1. differences in SQL interpretation, types, precision and formatting.
  2. missing database constraints. This is functionality I prefer to guarantee in handwritten DDL scripts, instead of leaving it out of my control because it is generated by some OR-mapping technology. This approach also invites writing update scripts from day one, instead of them being an afterthought that causes pain when upgrading to the next release.
  3. differences in concurrency control mechanisms. What leads to a pessimistic lock in one database could lead to an optimistic locking failure in another.

Databases simply differ in behavior, and these differences should not be ignored. Not testing against the real database can lead to problems that are only observed in the acceptance or production environment. And the longer it takes to encounter a bug, the more time it costs to solve.

I know that in continuous build environments using the same type of database can be difficult: not only does the environment need to support different databases, it also needs to support multiple versions of them. One way to solve this problem is to use a virtualized operating system (using e.g. VMware) per database version. This should prevent the build environment from turning into a maintenance nightmare.
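As a minimal sketch of what this can look like (the property names, test class and constraint are made up for illustration): an integration test that takes its connection settings from system properties, so the build environment decides which database server and version it runs against.

import java.sql.Connection;
import java.sql.DriverManager;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class EmployeeRepositoryIntegrationTest {

    private Connection connection;

    @Before
    public void setUp() throws Exception {
        // the build environment supplies the connection settings,
        // e.g. -Djdbc.url=jdbc:oracle:thin:@buildhost:1521:ORCL
        connection = DriverManager.getConnection(
                System.getProperty("jdbc.url"),
                System.getProperty("jdbc.user"),
                System.getProperty("jdbc.password"));
    }

    @Test
    public void duplicateEmployeeNumberIsRejected() throws Exception {
        // exercise a constraint from the handwritten DDL scripts here;
        // a lightweight in-memory replacement might not define it at all
    }

    @After
    public void tearDown() throws Exception {
        if (connection != null)
            connection.close();
    }
}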


Minimal Agile documentation

November 13, 2007

One of the things that surprises me when looking at the source code of Agile projects is the lack of documentation. The minimal amount of documentation on a class/interface is a description of its core responsibility: what is the purpose of this class? It doesn't need to be very long; a few sentences are enough. And if the class/interface is part of some design pattern, this should be added to the documentation as well. If the class is completely self-contained and the pattern is well known (like a Strategy or Decorator), I could live without that documentation, but since not everyone is a pattern freak, I would rather see it written down.
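As a sketch of the bare minimum I have in mind (the interface and the names are invented for illustration):

/**
 * Decides whether an {@link Employee} is eligible for a bonus.
 *
 * Implementations are Strategies (GoF): the payroll run is configured
 * with one of them, so the eligibility rules can vary without touching
 * the run itself.
 */
public interface BonusEligibilityPolicy {

    boolean isEligible(Employee employee);
}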

Another important document is some sort of architectural overview that gives a quick understanding of the system without the need to browse through all the sources. This document should at least cover the main components of the system, their responsibilities, how they interact, and why this approach was chosen. Without this information it is very hard to say anything about performance, security, scalability or reusability.

I know that not writing documentation shouldn’t be blamed on the Agile methodology, but on its misinterpretation.

ps:
If you want to remove redundant parts of the documentation, why not remove the @author tag from the Javadoc? If I want to figure out who wrote (or changed) something, I can use the blame functionality most version control systems provide.


Lightweight Batch Processing I: Intro

November 12, 2007

If you are lucky, your application is a lot more complex than the standard request/response web application, where the complexity is typically found in the business domain or in the presentation logic. Batch processing systems process large volumes of data, and they always make me happy to be a software developer, because so much interesting stuff is going on; especially concurrency control and transaction management.

This is the first blog about lightweight batch processing; the goal is to share my knowledge and hopefully gain new insights from your comments. There are batch frameworks (like the newest Spring module: Spring Batch), but frameworks often introduce a lot more functionality (and complexity) than required, and they can't always be used, for a wide range of reasons (sometimes technical, sometimes political). This series of blogs is aimed at those scenarios. The approach I use is to start from a basic example, point out the problems that can occur (and under which conditions), and refactor the example step by step.

Let's get started: below is a standard approach to processing a batch of employees.

EmployeeDao employeeDao;

// the batch retrieval and the processing of every employee all run
// in one big transaction
@Transactional
void processAll(){
    List<Employee> batch = getBatch();
    for(Employee employee: batch)
        process(employee);
}

List<Employee> getBatch(){
    return employeeDao.findItemsToProcess();
}

void process(Employee employee){
    // ... the actual processing logic
}

As you can see, the code is quite simple. There is no need to integrate scheduling logic into the processing logic; it is much better to hook up a scheduler (like Quartz, for example) from the outside, which makes the code much easier to test, maintain and extend (see the sketch after the list below). This example works fine for a small number of employees, as long as processing a single employee doesn't take too much time. But when the number of employees increases, or the time to process a single item increases, this approach won't scale well. One of the biggest problems (for now) is that the complete batch is executed in a single transaction. Although this transaction provides the 'all or nothing' (atomicity) behavior that is normally desired, its length can lead to all kinds of problems:

  1. lock contention (and even lock escalation, depending on the database), leading to decreased performance and eventually to completely serialized access to the database. This is especially problematic if the batch process is not the only user of the database.
  2. failing transactions, caused by running out of undo space or by the database aborting the transaction because it runs too long.
  3. when the transaction fails, all items need to be reprocessed, even the ones that didn't cause a problem. If the batch takes a long time to run, this behavior can be highly undesirable.
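Hooking up the scheduler from the outside could look something like this (a minimal Quartz 1.x sketch; the job class, the cron expression and the way the batch service is obtained are assumptions of mine):

import org.quartz.CronTrigger;
import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.impl.StdSchedulerFactory;

class EmployeeBatchJob implements Job {
    public void execute(JobExecutionContext context) {
        // look up the batch service (e.g. from the Spring application
        // context) and delegate to its processAll() method; the
        // processing code itself knows nothing about scheduling
    }
}

public class BatchScheduling {
    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = new JobDetail(
                "employeeBatch", Scheduler.DEFAULT_GROUP, EmployeeBatchJob.class);
        // run the batch every night at 02:00
        CronTrigger trigger = new CronTrigger(
                "employeeBatchTrigger", Scheduler.DEFAULT_GROUP, "0 0 2 * * ?");
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}

But back to the length of the transaction.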

In the following example the long-running transaction has been replaced by multiple smaller transactions: one transaction to retrieve the batch and one transaction for each employee that needs to be processed:

EmployeeDao employeeDao;

// no @Transactional here: each call below runs in its own transaction
void processAll(){
    List<Employee> batch = getBatch();
    for(Employee employee: batch)
        process(employee);
}

@Transactional
List<Employee> getBatch(){
    return employeeDao.findItemsToProcess();
}

@Transactional
void process(Employee employee){
    // ... the actual processing logic
}
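One caveat here: with Spring's default proxy-based AOP, a @Transactional method that is called from within the same object does not pass through the transactional proxy, so the self-invocations of getBatch() and process() above would not actually start new transactions. The annotated methods have to be invoked through a Spring-managed reference (or AspectJ weaving must be used). A sketch of one way to arrange that, with made-up bean names:

public class EmployeeBatchService {

    private EmployeeDao employeeDao;      // Spring-managed, transactional DAO
    private EmployeeProcessor processor;  // Spring-managed bean, see below

    public void processAll(){
        // both calls cross a bean boundary, so the @Transactional advice
        // on the target beans is actually applied
        for(Employee employee: employeeDao.findItemsToProcess())
            processor.process(employee);
    }
}

public class EmployeeProcessor {

    @Transactional
    public void process(Employee employee){
        // ... the actual processing logic, in its own transaction
    }
}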

As you may have noticed, the example with multiple smaller transactions is not without problems either. One of the biggest problems is that the complete list of employees needs to be retrieved first. If the number of employees is very large, or if a single employee consumes a lot of resources (memory, for example), this can lead to all kinds of problems (apart from being another long-running transaction!). One possible solution is to retrieve only the IDs:

EmployeeDao employeeDao;

void processAll(){
    List<Long> batch = getBatch();
    for(Long id: batch)
        process(id);
}

@Transactional
List<Long> getBatch(){
    // only the IDs are retrieved, not the employee objects themselves
    return employeeDao.findIdsToProcess();
}

@Transactional
void process(long id){
    // the employee is loaded inside the same transaction that processes it
    Employee employee = employeeDao.load(id);
    // ... the actual processing logic
}

A big advantage of retrieving a list of IDs instead of a list of Employees is that the transactional behavior is well defined. Detaching and reattaching objects to sessions introduces a lot more vagueness (especially if the OR-mapping tool doesn't detach the objects entirely). Other approaches are possible: you can keep a cursor open and retrieve an employee only when it is needed, but then you still have a long-running transaction. Another approach is to retrieve only as many employees as can be processed in a single run, and to repeat this until no more items are found.
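That last approach could look something like this (a sketch; the chunk size and the finder that caps the result set are assumptions, not part of the examples above):

void processAll(){
    List<Long> chunk;
    // keep fetching small chunks until nothing is left to process;
    // every getBatch() and process(id) call is a short transaction
    while(!(chunk = getBatch()).isEmpty()){
        for(Long id: chunk)
            process(id);
    }
}

@Transactional
List<Long> getBatch(){
    // hypothetical finder that caps the result set, e.g. via setMaxResults
    return employeeDao.findIdsToProcess(100);
}

Note that this assumes a processed employee no longer shows up in the finder; an item that keeps failing would otherwise be retrieved over and over again.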

In the next blog post I'll deal with multi-threading and locking.


The use of Maven cripples the mind

November 7, 2007

I have used Maven 1.1 and 2 for the last 1.5 years, but I am still boggled by the utter complexity of the tool. There are a lot of good aspects to Maven (dependency management, standard tasks, etc.), but these advantages don't outweigh all the problems. For example: if I want to create a war and an ear, I need to create multiple projects. How stupid can you be?

With Ant (I have used it for more than 5 years) I would add an extra target and be done; I can do whatever I want. That is why I add all functionality to my build script, so I get a one-button release. So even though some stuff in Maven is much easier to set up, in the end I spend much more time configuring Maven than it would take in a different tool like Ant. Other problems with Maven are badly documented tasks and tasks that break for unclear reasons (Ant is solid as a rock). My bet (and hope) is that in a few years Maven will be seen as something bad, like EJB 2, and replaced by modern build environments like Gant, Rake, Raven or Buildr: build languages that are just internal DSLs, so you can drop down to the full-blown language without needing to write plugins, and they prevent programming in XML (XML was a bad choice for Ant from day one).