DSLs: a higher abstraction to the rescue

June 29, 2007

One of the things that bothers me is how easy it is to lose the overview in large wired structures, and I think DSLs can reduce the distance between specification and implementation. Here is an example of something set up at a very low level of abstraction in Spring:

    <bean id="fileWritingProcess"
          class="com.xebia.clustering.processes.FileWritingProcess"/>

    <bean id="resequenceProcess"
          class="org.codehaus.prometheus.processors.ResequenceProcess"/>

    <bean id="fileWritingProcessor"
          class="org.codehaus.prometheus.processors.standardprocessor.StandardProcessor">
        <constructor-arg index="0"
                         type="org.codehaus.prometheus.channels.InputChannel"
                         ref="fileWritingProcessor-input"/>
        <constructor-arg index="1">
            <list>
                <ref bean="resequenceProcess"/>
                <ref bean="fileWritingProcess"/>
            </list>
        </constructor-arg>
    </bean>

    <bean id="fileWritingProcessorRepeater"
          class="org.codehaus.prometheus.repeater.ThreadPoolRepeater"
          init-method="start"
          destroy-method="shutdown">
        <constructor-arg index="0">
            <bean class="org.codehaus.prometheus.processors.ProcessorRepeatable">
                <constructor-arg ref="fileWritingProcessor"/>
            </bean>
        </constructor-arg>

        <property name="exceptionHandler">
            <bean class="org.codehaus.prometheus.exceptionhandler.Log4JExceptionHandler"/>
        </property>

        <property name="shutdownAfterFalse"
                  value="true"/>
    </bean>

And this is only 1/4 of the XML configuration.

And here it is written as a DSL in Groovy (I’m still playing with the concepts and with Groovy, so the syntax may not be correct):

def piped = environment{
    queues{
        queue(name:'queue1')
        queue(name:'queue2')
        queue(name:'queue3')
    }

    pipeline{
        [parserprocess,
         queue1,
         parallel(threadcount:10){fibonacciprocess},
         queue2,
         parallel(threadcount:10){piprocess},
         queue3,
         [resequenceprocess,filewritingprocess]]
    }
}

I think most readers can imagine what the last script does without knowing anything about the application. That is the power of a DSL.


Prometheus and message passing

June 29, 2007

I’m currently adding lightweight message passing functionality to Prometheus, a concurrency library I’m working on. The goal of this functionality is to make high-volume data or batch processing easier by:

  1. using standard POJOs as processes (the functionality that acts on an incoming message by transforming the message or modifying external state). This makes them easy to use within IoC containers like Spring.
  2. externalizing concurrency control on messages: this plumbing can totally obfuscate the core logic that is being executed. It also limits the freedom to wire up a process in a different manner.
  3. controlling the processor (the environment in which a process is executed), for example by throttling the execution speed or the number of concurrent threads
  4. pattern matching on message types
  5. handling exceptions through policies: a difficult topic with pipelined solutions (see also ‘Pipes and Filters’ in Pattern-Oriented Software Architecture). A few examples of out-of-the-box policies are ‘ignore problem’, ‘drop message’ and ‘create poison message’ (see Enterprise Integration Patterns).
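
The three policies named in item 5 could be sketched as a small strategy applied by the processor. Everything in this sketch is hypothetical (`ErrorPolicy`, `PoisonMessage` and `PolicyProcessor` are not Prometheus classes), and it assumes that ‘ignore problem’ means forwarding the message unchanged:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical names throughout; this is not the Prometheus API.
enum ErrorPolicy { IGNORE_PROBLEM, DROP_MESSAGE, CREATE_POISON_MESSAGE }

// Wrapper that lets downstream stages recognize a message that failed.
final class PoisonMessage {
    final Object original;
    final Exception cause;

    PoisonMessage(Object original, Exception cause) {
        this.original = original;
        this.cause = cause;
    }
}

final class PolicyProcessor {
    private final Function<Object, Object> process;
    private final ErrorPolicy policy;

    PolicyProcessor(Function<Object, Object> process, ErrorPolicy policy) {
        this.process = process;
        this.policy = policy;
    }

    // Runs the process on every message; on failure the policy decides
    // what (if anything) is passed on to the next stage.
    List<Object> runAll(List<Object> input) {
        List<Object> output = new ArrayList<>();
        for (Object message : input) {
            try {
                output.add(process.apply(message));
            } catch (RuntimeException e) {
                switch (policy) {
                    case IGNORE_PROBLEM:
                        output.add(message); // assumption: forward unchanged
                        break;
                    case DROP_MESSAGE:
                        break;               // nothing is forwarded
                    case CREATE_POISON_MESSAGE:
                        output.add(new PoisonMessage(message, e));
                        break;
                }
            }
        }
        return output;
    }
}
```

The point of the design is that the process itself contains no error-handling plumbing: the same process can be wired up with a different policy without being changed.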

If concurrency control is removed from the process, a different structure must take over this responsibility, and in Prometheus that is the task of the processor. So apart from its own internal synchronization, it also needs to prevent isolation problems on messages.

One of the most complex aspects of concurrency control is synchronization: preventing multiple threads from interfering with each other because they are not properly isolated. Luckily there are a few shortcuts that can be used to avoid synchronization complexity:

  1. make objects immutable. Properly created immutable objects are thread safe because their state can’t change. So threads are not able to interfere with each other (they can’t communicate with each other through that object)
  2. isolate (confine) objects. If you can guarantee that at most a single thread can access an object for the duration of some task, threads are not able to interfere with each other because each thread has exclusive access to that object while executing that task.
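
As a minimal sketch of the first shortcut, here is what an immutable message could look like in Java. `ImmutableTask` is a hypothetical class made up for illustration, not something from Prometheus:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical message type, not part of Prometheus.
final class ImmutableTask {
    private final long id;
    private final List<String> results;

    ImmutableTask(long id, List<String> results) {
        this.id = id;
        // Defensive copy: later changes to the caller's list cannot leak in.
        this.results = Collections.unmodifiableList(new ArrayList<>(results));
    }

    long getId() {
        return id;
    }

    List<String> getResults() {
        return results;
    }

    // "Modifying" the task yields a new instance; the original is untouched.
    ImmutableTask withResult(String result) {
        List<String> copy = new ArrayList<>(results);
        copy.add(result);
        return new ImmutableTask(id, copy);
    }
}
```

All fields are final, the class cannot be subclassed, and the list is copied and wrapped, so an instance can be shared between threads without any locking.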

Immutable objects are not always the most convenient messages because they can’t maintain state between processing steps. Especially in a pipelined solution (a chain of message processors), you want to be able to modify the message (to add calculated content, for example). That is why mutable messages can be convenient. The problem with mutable messages is that you need to deal with synchronization or isolation. Message passing solutions (at least in Prometheus) are connected by queues, and Prometheus takes care of taking a message from a queue, passing it through the process and putting it in the next queue. You also get the guarantee (as long as processes don’t maintain references) that a message can only be processed by one process at any moment.

This means we don’t need to deal with the complexity of synchronization, and this is why lightweight message passing is one of the ways to make effective use of multi-cores in an imperative language like Java (pure functional languages are a lot easier to parallelize).
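
The queue handoff described above can be sketched with plain `java.util.concurrent` classes. This is not how Prometheus is implemented; `MutableTask` and `PipelineSketch` are hypothetical, and each stage handles just one message to keep the sketch short:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical mutable message; only the thread that currently holds it touches it.
final class MutableTask {
    final int input;
    long square;     // filled in by the first stage
    String report;   // filled in by the second stage

    MutableTask(int input) {
        this.input = input;
    }
}

final class PipelineSketch {

    // Sends one task through two stages that run on their own threads.
    static MutableTask runThroughPipeline(int input) {
        BlockingQueue<MutableTask> queue1 = new LinkedBlockingQueue<>();
        BlockingQueue<MutableTask> queue2 = new LinkedBlockingQueue<>();
        BlockingQueue<MutableTask> queue3 = new LinkedBlockingQueue<>();

        Thread squareStage = new Thread(() -> {
            try {
                MutableTask task = queue1.take();
                task.square = (long) task.input * task.input;
                queue2.put(task); // hand over; this stage keeps no reference
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread reportStage = new Thread(() -> {
            try {
                MutableTask task = queue2.take();
                task.report = task.input + "^2=" + task.square;
                queue3.put(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            squareStage.start();
            reportStage.start();
            queue1.put(new MutableTask(input));
            MutableTask result = queue3.take();
            squareStage.join();
            reportStage.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

The message fields are neither `volatile` nor guarded by a lock. That is safe here because a `BlockingQueue` guarantees that whatever a thread did before putting an object in the queue is visible to the thread that takes it out, and because no stage touches a message after handing it over.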

Example of a process in Prometheus:

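import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

// Task, StartOfStreamEvent and EndOfStreamEvent are message types not shown here.
public class FileWritingProcess {

    private Writer writer;

    // Called for every task: writes it on its own line.
    public void receive(Task task) throws IOException {
        writer.write(task.toString() + "\n");
        writer.flush();
    }

    // Opens the output file when a stream starts.
    public void receive(StartOfStreamEvent e) throws IOException {
        writer = new FileWriter(e.getOutputFile());
    }

    // Closes the output file when the stream ends.
    public void receive(EndOfStreamEvent e) throws IOException {
        writer.close();
    }
}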
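
The process above has one `receive` overload per message type, so something has to pick the right overload for each message at runtime. How Prometheus does this is not shown here; the following is only one possible way to do it with reflection. `ReceiveDispatcher` and `CountingProcess` are hypothetical names, and the sketch only walks superclasses, not interfaces:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper, not the Prometheus implementation.
final class ReceiveDispatcher {

    // Calls the public receive(X) method where X is the message's class or
    // its nearest superclass. Returns false if no overload matches.
    static boolean dispatch(Object process, Object message) {
        for (Class<?> type = message.getClass(); type != null; type = type.getSuperclass()) {
            Method method;
            try {
                method = process.getClass().getMethod("receive", type);
            } catch (NoSuchMethodException e) {
                continue; // no overload for this type, try its superclass
            }
            try {
                method.setAccessible(true); // allow non-public process classes
                method.invoke(process, message);
                return true;
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("receive failed", e);
            }
        }
        return false;
    }
}

// Hypothetical demo process with two overloads.
final class CountingProcess {
    int numbers;
    int texts;

    public void receive(Number n) {
        numbers++;
    }

    public void receive(String s) {
        texts++;
    }
}
```

With this approach an `Integer` message ends up in `receive(Number)` because there is no `receive(Integer)`, and a message with no matching overload is simply reported as unhandled, which is where a policy could take over.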

The joy of open source software development

May 22, 2007

At the moment I’m almost finished with the first release of a concurrency library I’m working on: Prometheus. The library is released under the MIT license and I hope to make a release very soon. The code is almost finished, but a lot of time goes into the details (one-button release, site, documentation, setting it up with Bamboo, etc.). The cool thing is that I get to play with all kinds of very cool technologies, for which licenses were granted because Prometheus is open source.

  1. JIRA: great issue and feature tracking system. I have used others (Rational ClearQuest, Trac) but JIRA is still my preferred choice.
  2. Confluence: great wiki environment
  3. JProfiler: great profiler. I haven’t used it on the Prometheus library (yet)
  4. YourKit: also a profiler I have heard great stories about. I’m certainly going to give it a try.
  5. Clover: a test coverage tool. It really is amazing: I found some untested code and unused test code after checking the Clover report. Another cool thing is that it is safe to use in a multithreaded environment (this is quite handy for a multithreaded library like Prometheus). I also was granted a license for Clover 2 beta: it looks great and provides a lot more information. So I’m certainly going to give it a try.
  6. IntelliJ IDEA: what is there to say, it is my preferred editor. Would you spend $500 for Eclipse? But personally I don’t care much what other developers are using: if you can program as fast in vi as I can in IDEA, I have no problem with vi.
  7. Bamboo for continuous integration. I normally use Cruise Control, but it appears that Bamboo is taking it to the next level. I’m going to integrate Prometheus on Bamboo as soon as I have time. There are some issues I have to work out, for example: repeated execution of tests to increase the chance of finding concurrency problems.
  8. Structure101: a tool for structure analysis. I have asked the guys from Headway Software if they have plans for an open source license, and luckily they do. So I’m going to play with it on Prometheus. If I’m able to extract relevant information from it, I plan to use it within Xebia, the company I work for, to assist with quality assurance.
  9. Fisheye: great for extracting information from Subversion. I haven’t played with it on Prometheus much, but I can imagine it is very useful if there is a large code base.

It almost feels like Christmas 🙂

There are also a bunch of open source products I use that can’t be left unmentioned:

  1. JUnit: what is there to say?
  2. EasyMock: easy when I just need to mock something for testing purposes. For the Prometheus library I decided not to use a mocking approach in most cases because I want to know whether a component is functioning correctly, not whether I have recorded the expectations correctly. For most standard objects mocking is good enough, but for something as complex as concurrency control, I didn’t want to rely on my recording abilities.
  3. ANT: After 2 years I still don’t care much about Maven 1 and 2. I think they are straitjackets and as long as you can obey the Maven rules, you are a happy guy. But as soon as you need something special, you are on your own. So ANT still is my preferred choice even though it takes more time to set up a script.
  4. Groovy & Gant. Ant provides a great infrastructure but it was a very bad choice to use XML as syntax (it always was a bad choice, but XML isn’t as hot as it used to be and now you are allowed to say that it should not be used for anything other than data exchange). I have switched from ANT to Groovy ANT (Gant): the convenience of a DSL, but the power of a full-blown language. I also use Groovy to generate the Prometheus site. It is great to have a scripting environment that integrates perfectly with Java.
  5. FindBugs: it is great for finding bugs. I have run it on Prometheus and it immediately discovered some bugs (not calling the super in TestCase.setUp in quite a few cases, for example). Checkstyle is also something I’m familiar with, but I spend more time making it shut up than I get value from it.
  6. Subversion: next to CVS it is ‘the’ Code Version Control system.

And last but certainly not least: I have to thank Codehaus. A lot of licenses were provided by Codehaus and I’m also happy I can make use of their servers (although I have backed up my code not only on their machines 😉 )