DSLs: a higher abstraction to the rescue

June 29, 2007

One of the things that bothers me is that it is very easy to lose the overview in large wired structures, and I think DSLs can limit the distance between specification and implementation. Here is an example of something wired up at a very low level of abstraction in Spring:

    <bean id="fileWritingProcess"
          class="com.xebia.clustering.processes.FileWritingProcess"/>

    <bean id="resequenceProcess"
          class="org.codehaus.prometheus.processors.ResequenceProcess"/>

    <bean id="fileWritingProcessor"
          class="org.codehaus.prometheus.processors.standardprocessor.StandardProcessor">
        <constructor-arg index="0"
                         type="org.codehaus.prometheus.channels.InputChannel"
                         ref="fileWritingProcessor-input"/>
        <constructor-arg index="1">
            <list>
                <ref bean="resequenceProcess"/>
                <ref bean="fileWritingProcess"/>
            </list>
        </constructor-arg>
    </bean>

    <bean id="fileWritingProcessorRepeater"
          class="org.codehaus.prometheus.repeater.ThreadPoolRepeater"
          init-method="start"
          destroy-method="shutdown">
        <constructor-arg index="0">
            <bean class="org.codehaus.prometheus.processors.ProcessorRepeatable">
                <constructor-arg ref="fileWritingProcessor"/>
            </bean>
        </constructor-arg>

        <property name="exceptionHandler">
            <bean class="org.codehaus.prometheus.exceptionhandler.Log4JExceptionHandler"/>
        </property>

        <property name="shutdownAfterFalse"
                  value="true"/>
    </bean>

And this is only 1/4 of the XML configuration.

Here is the same thing written as a DSL in Groovy (I’m still playing with the concepts and with Groovy, so the syntax may not be entirely correct).

    def piped = environment{
        queues{
            queue(name:'queue1')
            queue(name:'queue2')
            queue(name:'queue3')
        }

        pipeline{
            [parserprocess,
             queue1,
             parallel(threadcount:10){fibonacciprocess},
             queue2,
             parallel(threadcount:10){piprocess},
             queue3,
             [resequenceprocess,filewritingprocess]]
        }
    }

I think most people can imagine what this script is doing without knowing anything about the application. That is the power of a DSL.


Prometheus and message passing

June 29, 2007

I’m currently working on lightweight message-passing functionality in Prometheus, a concurrency library of mine. The goal of this functionality is to make high-volume data or batch processing easier by:

  1. using standard POJOs as processes (the functionality that acts on an incoming message by transforming it or modifying external state), so they are easy to use within IoC containers like Spring
  2. externalizing concurrency control on messages: this plumbing can completely obscure the core logic being executed, and it also limits the freedom to wire a process up in a different manner
  3. controlling the processor (the environment in which a process is executed), for example by throttling the execution speed or the number of concurrent threads
  4. pattern matching on message types
  5. handling exceptions through policies: a difficult topic in pipelined solutions (see also ‘Pipes and Filters’ in Pattern-Oriented Software Architecture). A few examples of out-of-the-box policies are ‘ignore problem’, ‘drop message’, and ‘create poison message’ (see Enterprise Integration Patterns); a sketch of what such a policy could look like follows this list.
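
To give an impression of what such a policy could look like, here is a minimal sketch in Java. The names ErrorPolicy, IgnorePolicy and DropPolicy are made up for this example; they are not the actual Prometheus API.

    // Hypothetical sketch, not the real Prometheus API: a policy decides what
    // happens to a message when the process throws an exception.
    public interface ErrorPolicy {

        // Returns the message to pass on to the next stage, or null to drop it.
        Object handle(Object message, Exception cause);
    }

    // 'ignore problem': pass the original message on unchanged.
    class IgnorePolicy implements ErrorPolicy {
        public Object handle(Object message, Exception cause) {
            return message;
        }
    }

    // 'drop message': the message is removed from the pipeline.
    class DropPolicy implements ErrorPolicy {
        public Object handle(Object message, Exception cause) {
            return null;
        }
    }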

If concurrency control is removed from the process, a different structure must take over this responsibility, and in Prometheus that is the task of the processor. So apart from its own internal synchronization, it also needs to deal with preventing isolation problems on messages.

One of the most complex aspects of concurrency control is synchronization: preventing multiple threads from interfering with each other because they are not properly isolated. Luckily there are a few shortcuts that can be used to avoid this complexity:

  1. make objects immutable. Properly constructed immutable objects are thread-safe because their state can’t change, so threads are not able to interfere with each other (they can’t communicate with each other). A small example follows this list.
  2. isolate (confine) objects. If you can guarantee that at most a single thread can access an object for the duration of some task, threads are not able to interfere with each other because each thread has exclusive access to that object while executing that task.
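
As a small illustration of the first shortcut, here is what an immutable message could look like. The PriceUpdate class is just an invented example, not part of Prometheus.

    // Illustrative immutable message: all fields are final and there are no
    // setters, so a properly published instance is thread-safe by construction.
    public final class PriceUpdate {

        private final String symbol;
        private final double price;

        public PriceUpdate(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }

        public String getSymbol() {
            return symbol;
        }

        public double getPrice() {
            return price;
        }
    }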

Immutable objects are not always the most convenient kind of message because they can’t maintain state between processing steps. Especially in a pipelined solution (a chain of message processors), you want to be able to modify the message (add calculated content, for example). That is why mutable messages can be convenient. The problem with mutable messages is that you need to deal with synchronization or isolation. Message-passing solutions (at least in Prometheus) are connected by queues, and Prometheus takes care of taking a message from a queue, passing it through the process, and putting it in the next queue. You also get the guarantee (as long as processes don’t keep references) that a message is only being processed by one process at any moment.
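
To make that handoff idea concrete, here is a rough sketch using plain java.util.concurrent rather than Prometheus itself: the producer puts a mutable message on a queue and never touches it again, so the consumer has exclusive access without any explicit locking.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Illustration only, not Prometheus: two stages connected by a queue. The
    // mutable StringBuilder message is confined to one thread at a time because
    // the producer drops its reference after the put().
    public class HandoffExample {

        public static void main(String[] args) throws InterruptedException {
            final BlockingQueue<StringBuilder> queue = new ArrayBlockingQueue<StringBuilder>(10);

            Thread consumer = new Thread(new Runnable() {
                public void run() {
                    try {
                        // exclusive access: nobody else touches this message anymore
                        StringBuilder msg = queue.take();
                        msg.append(" - enriched");
                        System.out.println(msg);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            consumer.start();

            StringBuilder msg = new StringBuilder("task-1");
            queue.put(msg);
            // from here on the producer must not use 'msg' anymore
            consumer.join();
        }
    }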

This means we don’t need to deal with the complexity of synchronization, and this is why lightweight message passing is one of the ways to make effective use of multi-cores in an imperative language like Java (purely functional languages are a lot easier to parallelize).

An example of a process in Prometheus:

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;

    // Task, StartOfStreamEvent and EndOfStreamEvent are the message types that
    // flow through the pipeline; the processor selects the matching receive
    // method based on the type of the incoming message.
    public class FileWritingProcess {

        private Writer writer;

        // called for every regular task: append it to the output file
        public void receive(Task task) throws IOException {
            writer.write(task.toString() + "\n");
            writer.flush();
        }

        // called once, before the first task: open the output file
        public void receive(StartOfStreamEvent e) throws IOException {
            writer = new FileWriter(e.getOutputFile());
        }

        // called once, after the last task: release the file handle
        public void receive(EndOfStreamEvent e) throws IOException {
            writer.close();
        }
    }
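
The overloaded receive methods hint at how the pattern matching on message types (point 4 above) could work: the processor picks the receive method whose parameter type matches the runtime type of the incoming message. Below is a simplified sketch of such a dispatcher using reflection; it is only an illustration of the idea, not the actual Prometheus implementation.

    import java.lang.reflect.Method;

    // Simplified, illustrative dispatcher: looks for a public receive(X) method
    // on the process whose parameter type matches the runtime type of the message.
    public class ReceiveDispatcher {

        public void dispatch(Object process, Object message) throws Exception {
            for (Method method : process.getClass().getMethods()) {
                if (method.getName().equals("receive")
                        && method.getParameterTypes().length == 1
                        && method.getParameterTypes()[0].isInstance(message)) {
                    // picks the first match it finds; a real implementation would
                    // prefer the most specific parameter type
                    method.invoke(process, message);
                    return;
                }
            }
            // no matching receive method: in a real processor an exception-handling
            // policy would decide what happens to the unmatched message
        }
    }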

RIA and scalability

June 21, 2007

At the moment I’m writing a blog post for the company website about the new process functionality in Prometheus, and something occurred to me.

With the introduction of multi-cores, parallelism is going to become more important. One of the easiest types of applications to scale is one with small and independent tasks, for example a standard web application. Requests can be executed concurrently, so increasing the number of cores is going to increase the throughput (not the latency). When latency also needs to be improved, things get much more complicated, because you need to introduce parallelism at a much lower level. Web pages are getting more and more complex, so building a single screen is going to take more and more time if no extra parallelism is introduced.

The cool thing is that with the introduction of Rich Internet Applications (RIA), the type of request is going to change. Instead of one big ‘build me the complete page’ request, you will have a lot of smaller and independent requests like ‘please get me the newest value for this page component’. This means that RIA applications could scale a lot more easily. I have no numbers to back this up, so don’t shoot me if I’m wrong.


Flexible quality

June 13, 2007

One of the things I have observed is that some projects get into trouble because of quality issues. The following three properties are part of every project:

  1. functionality: the stuff that needs to be implemented
  2. time: the total amount of man-hours (budget is also expressed in time)
  3. quality: structural/code quality, maintainability, number of bugs

If functionality and time are fixed, the only thing that is flexible is quality. So especially at the end of an iteration, when the pressure is felt, the quality goes down: code doesn’t get refactored and the structure degrades, tests are incomplete or missing entirely. The problem gets even worse because the next iteration is going to contain debt from the previous one. If this debt isn’t added as an issue for the next iteration to solve, you will get a lot of hidden work, and hidden work is a recipe for project failure.

The Agile methodology promotes fixed time and fixed quality, so the only thing that is flexible is functionality. I’m a very pragmatic guy, so flexible time (that is, increasing the length of an iteration) would be negotiable (although you could lose your rhythm), but when quality is negotiable I think you are on the road to project hell: unhappy customers, developers and managers.