Multiverse STM as Java Database

Last week I released Multiverse 0.4 with a lot of new goodies and a completely new website. Version 0.5 will add a lot more (compile-time instrumentation, more transactional data structures, performance improvements). But I’m also thinking about longer-term goals, and one of the most interesting ones (I think) is persistence (next to distribution). Since STM already deals with concurrency control, knows about the internals of your objects and is built on transactions, persistence is a logical next step.

The goal is to make it as easy as possible for the Java developer (perhaps just setting a durability property to true on some annotation): no need to deal with a relational database, no OR-mapping, no schema, no need to deal with caching (it should all be done behind the scenes). I have seen too many projects waste lots and lots of resources because:

  1. performance completely sucks (a few hundred operations per second despite a lot of hardware and expensive software like Oracle RAC). Adding more hardware doesn’t solve the problem, and that is where the expensive consultants come in (which is what I used to do).
  2. a lot of concurrency issues because developers don’t understand the database. So you get your traditional race problems and deadlocks, but you also get a lot of application complexity caused by additional ‘fixing’ logic. Things get even more complicated if a different database is used in production than for development, meaning that you need to deal with problems twice.
  3. needing to deal with the mismatch between the relational and object oriented datamodel
  4. needing to deal with limitations caused by traditional databases. One of the limitations that causes a lot of complexity is the lack of blocking, so some kind of polling mechanism usually is added (Quartz for example) that triggers the logic. This makes very simple things (like a thread executing work) much more cumbersome than it needs to be.
  5. needing to deal with imperfect solutions like proxy-based transactions (Spring, for example). The big disadvantage of this approach is that the design of classes gets much more complicated when self calls are made. Another similar problem I see is that DDD (objects with logic, and therefore needing dependencies) is nice on paper but sucks in practice, because dependencies can’t be injected into entities. I know that Spring has some support for it, but often it can’t be used or simply isn’t used. The whole dependency issue in entities (and other objects that are not created inside a container) could have been solved very easily by Hibernate by exposing a factory. So that is something I am going to do right.
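
The self-call problem from point 5 can be reproduced with nothing but a JDK dynamic proxy. This is a minimal sketch, not Spring's implementation: `AccountService`, `PlainAccountService` and `TxProxyDemo` are made-up names, and the "transaction" is just a list that records which calls the interceptor saw.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface; the names are illustrative only.
interface AccountService {
    void transferAll(); // entry point called by clients
    void transferOne(); // meant to run in its own transaction
}

class PlainAccountService implements AccountService {
    @Override
    public void transferAll() {
        // Self call: dispatched on 'this', not on the proxy,
        // so the interceptor below never sees it.
        transferOne();
    }

    @Override
    public void transferOne() {
        // business logic would go here
    }
}

class TxProxyDemo {
    // Records the methods the stand-in "transactional" interceptor saw.
    static final List<String> intercepted = new ArrayList<>();

    static AccountService wrap(AccountService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName()); // stand-in for begin/commit
            return method.invoke(target, args);
        };
        return (AccountService) Proxy.newProxyInstance(
                AccountService.class.getClassLoader(),
                new Class<?>[] {AccountService.class},
                handler);
    }

    public static void main(String[] args) {
        AccountService service = wrap(new PlainAccountService());
        service.transferAll();
        // Only the outer call went through the proxy.
        System.out.println(intercepted);
    }
}
```

Because `transferOne()` is reached through `this`, any transactional behaviour attached to it by the proxy is silently skipped, which is why class design gets awkward with this approach.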

I wish Multiverse had been available all the times I have seen these issues at customers, because it would have made programming more fun and saved lots of money; try to imagine what it costs when a team of developers struggles for days or weeks with these issues, or systematically wastes time for even longer periods because the road they have taken is just too complex. I’m not claiming that it will be a fit for every project, but if you just need an easy mechanism to persist Java state and deal with concurrency, Multiverse could be a good solution.

6 Responses to Multiverse STM as Java Database

  1. Dustin Whitney says:

    This sounds pretty cool. I have recently been playing a lot with the akka framework, and I’m really impressed with its STM (built on your software, of course!). I completely agree with what you’re saying here.

    How will your solution be different from Terracotta/Ehcache? I have played around with both of them. The newest release made setting up a distributed cache incredibly easy. The problem is that going down the easy route, you only get copies of the data you put in the cache, so you can’t really do a lot of the DDD things one would like (as you mentioned above).

    I’m totally interested in your thoughts on distribution too. Please elaborate! Do you have a twitter account?

    • pveentjer says:

      Hi Dustin,

      I’m already thinking about using Terracotta as middleware, although I’m also thinking about writing a storage solution myself, since that gives me more control over storage, ‘schema’ changes, versioning of data etc.

      I don’t have a twitter account, but I’m placing my brainfarts on linkedin. So perhaps you can send me a request (peter veenter).

  2. Philip Andronov says:

    I think it is better to provide the developer with a framework that allows them to create a “database”:
    1. Transactional collections with an observer interface
    2. Transactional file I/O

    If you have that, you can build any storage system you want. There are many complicated things about “databases” that are hard to solve in general, for example:
    1. You have object A which contains object B, how to store it? – schema
    2. Indexes
    3. Caching policy

    So I would prefer to have something like:
    {code}
    // Directory that holds the redo/undo logs and all other files
    TransactionalFileStore store = new TransactionalFileStore("/var/lib/sample");
    ListBackend bck = new FileListBackend(store, "objects.lst");

    // Transactional list on top of a cached, file-backed backend
    ArrayList list = new TransactionalArrayList(new CachedBackend(50, bck));

    // Index stored in its own file, also behind a cache
    TreeBackend bck1 = new FileTreeBackend(store, "objects-1.idx");
    Indexer index1 = new RedBlackTreeIndexer(new CachedBackend(50, bck1), new TIndexBuilder(){…});

    list.addObserver(index1); // for better performance, just add the task to a queue and process it concurrently
    index1.lookupExact("some value");
    index1.lookupMoreOrEquals("some other value");
    {code}

    store – directory to store redo/undo logs and all other stuff
    ListBackend – simple interface with store/get methods
    FileListBackend – implements ListBackend and stores everything in a file
    CachedBackend – takes any Backend and builds a cache around it
    Indexer – simple index interface with a number of lookup methods. It attaches to any collection and gets notified of every update…
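
To make the backend layering above concrete, here is a minimal, non-transactional sketch. It assumes a `ListBackend` with exactly the store/get methods described, uses an in-memory `MapListBackend` as a stand-in for the file-backed one, and picks LRU eviction for `CachedBackend`; none of these details come from an existing library.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface, modelled on the "store/get" description above.
interface ListBackend<T> {
    void store(int index, T value);
    T get(int index);
}

// In-memory stand-in for a file-backed backend; counts reads so that
// cache hits and misses become visible.
class MapListBackend<T> implements ListBackend<T> {
    private final Map<Integer, T> data = new HashMap<>();
    int reads = 0;

    @Override
    public void store(int index, T value) {
        data.put(index, value);
    }

    @Override
    public T get(int index) {
        reads++;
        return data.get(index);
    }
}

// Wraps any ListBackend with a bounded LRU cache (write-through).
class CachedBackend<T> implements ListBackend<T> {
    private final ListBackend<T> delegate;
    private final Map<Integer, T> cache;

    CachedBackend(final int capacity, ListBackend<T> delegate) {
        this.delegate = delegate;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<Integer, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, T> eldest) {
                return size() > capacity;
            }
        };
    }

    @Override
    public void store(int index, T value) {
        delegate.store(index, value); // write through to the real backend
        cache.put(index, value);
    }

    @Override
    public T get(int index) {
        T cached = cache.get(index);
        if (cached != null) {
            return cached; // cache hit, backend untouched
        }
        T loaded = delegate.get(index);
        if (loaded != null) {
            cache.put(index, loaded);
        }
        return loaded;
    }
}
```

A real version would also have to take part in the transaction (undo on abort, for instance), which this sketch leaves out entirely.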

    ——–
    PS: About distributed transactions: I think it would be nice to add the E3PC protocol; in some cases 2PC does not work properly, unfortunately just in my case :)

    Also, I’m really waiting for arrays and transactional collections. I have even written my own implementation for arrays, but I’m not sure whether it is suitable for the project (it creates a copy of the array on every size update and does not track changes inside array cells ;(). But if you need help, let me know about your vision on array integration; I will adjust my implementation and commit it back to your project :)

    • pveentjer says:

      Hi Philip,

      I’m still in the exploring stages myself so everything is open for change :)

      >>1. You have object A which contains object B, how to store it? – schema

      Essentially I know the structure of the object, and as long as it doesn’t contain any weird stuff, I can store it. For the weird stuff (types I don’t know about and that are not transactional) I could create some kind of mapping interface.

      >> 2. Indexes

      Very good one. One of the things I’m thinking about is that it all should be done through Java. If you need an index, you can create a TreeMap and persist that. If you want to look up an object, use the map, which is transparently rematerialized. At the moment I’m mostly focussing on low-level file issues like recovery.
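
As a rough illustration of the "index is just a TreeMap" idea, the sketch below covers both an exact lookup and a "more or equals" lookup with plain `java.util.TreeMap`. `Person` and `NameIndex` are hypothetical names, and nothing here is persisted or transactional; it only shows that the lookups need no separate index API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical entity used only for this example.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// An "index" that is nothing more than a sorted map from key to object.
class NameIndex {
    private final TreeMap<String, Person> byName = new TreeMap<>();

    void add(Person person) {
        byName.put(person.name, person);
    }

    // Exact match, or null when the key is absent.
    Person lookupExact(String name) {
        return byName.get(name);
    }

    // First entry whose key is greater than or equal to the given key,
    // or null when no such entry exists.
    Person lookupMoreOrEquals(String name) {
        Map.Entry<String, Person> entry = byName.ceilingEntry(name);
        return entry == null ? null : entry.getValue();
    }
}
```

If the map itself were a persistent transactional object, the index would be stored and recovered together with the data it points to.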

      >> 3. Caching policy

      I’m planning to provide a cache by default that should be tunable of course.

      Have you created the array support using the Multiverse project? I’m also playing with it and have the problem partially cracked. What I do is change the type of the array: for example, if I have an array of ints, I upgrade it to an array of TransactionalInteger objects (which are themselves transactional objects). Filling the array with these transactional wrappers is done lazily, to avoid creating everything up front. The problem I haven’t cracked is how the instrumentation can determine, when an array access is done, whether it was done on a normal array or on a transactional version.
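
The lazy filling described above could look roughly like this. It is only a sketch of the idea: `IntCell` is a stand-in for a transactional wrapper (it is not Multiverse's TransactionalInteger and has no transactional behaviour), and `LazyIntCellArray` is a made-up name.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Stand-in for a transactional wrapper around one array element.
class IntCell {
    private final AtomicInteger value;

    IntCell(int initial) {
        this.value = new AtomicInteger(initial);
    }

    int get() {
        return value.get();
    }

    void set(int newValue) {
        value.set(newValue);
    }
}

// Replaces an int[] by an array of wrappers that are created on first access.
class LazyIntCellArray {
    private final int[] initial;                       // original primitive values
    private final AtomicReferenceArray<IntCell> cells; // wrappers, null until touched

    LazyIntCellArray(int[] source) {
        this.initial = source.clone();
        this.cells = new AtomicReferenceArray<>(source.length);
    }

    IntCell cell(int index) {
        IntCell existing = cells.get(index);
        if (existing != null) {
            return existing;
        }
        IntCell fresh = new IntCell(initial[index]);
        // Only one thread wins the race to install the wrapper.
        return cells.compareAndSet(index, null, fresh) ? fresh : cells.get(index);
    }

    // Number of wrappers created so far.
    int materialized() {
        int count = 0;
        for (int i = 0; i < cells.length(); i++) {
            if (cells.get(i) != null) {
                count++;
            }
        }
        return count;
    }
}
```

Only the elements that are actually touched pay for a wrapper object; the open question of how instrumented code tells the two kinds of arrays apart is not addressed here.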

      About the E3PC stuff:
      One of the big problems in Multiverse at the moment is that a single shared clock is used (an AtomicLong). I need to get rid of that clock first. I see 2 solutions atm:
      – a simple solution using partitioning, so that a single shared clock is used only within a partition
      – a vector clock. For in-memory stuff this is going to be a performance problem, I think, but for a distributed environment it is less of a concern.
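
For readers who have not met vector clocks: instead of one shared counter, every node keeps its own counter, and a timestamp is the whole vector. The following is a generic textbook-style sketch with a hypothetical `VectorClock` class; it is not how Multiverse works and assumes a fixed, known number of nodes.

```java
import java.util.Arrays;

// Immutable vector clock for a fixed number of nodes.
final class VectorClock {
    private final long[] ticks;

    VectorClock(int nodes) {
        this.ticks = new long[nodes];
    }

    private VectorClock(long[] ticks) {
        this.ticks = ticks;
    }

    // Local event on the given node: only that node's counter advances.
    VectorClock tick(int node) {
        long[] copy = ticks.clone();
        copy[node]++;
        return new VectorClock(copy);
    }

    // Receiving another node's clock: take the element-wise maximum.
    VectorClock merge(VectorClock other) {
        long[] copy = ticks.clone();
        for (int i = 0; i < copy.length; i++) {
            copy[i] = Math.max(copy[i], other.ticks[i]);
        }
        return new VectorClock(copy);
    }

    // True when every counter is <= the other's and at least one is smaller.
    boolean happenedBefore(VectorClock other) {
        boolean strictlySmaller = false;
        for (int i = 0; i < ticks.length; i++) {
            if (ticks[i] > other.ticks[i]) {
                return false;
            }
            if (ticks[i] < other.ticks[i]) {
                strictlySmaller = true;
            }
        }
        return strictlySmaller;
    }

    // Neither clock precedes the other and they are not identical.
    boolean concurrentWith(VectorClock other) {
        return !Arrays.equals(ticks, other.ticks)
                && !happenedBefore(other)
                && !other.happenedBefore(this);
    }
}
```

Each comparison touches every element of the vector, which hints at why this costs more than reading a single AtomicLong for purely in-memory use.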

      But for the time being I leave the distributed stuff alone. Main focus is:
      – more transactional datastructures
      – improved instrumentation (arrays, more performance, …)
      – compiletime instrumentation
      – commuting operations.

  3. Philip Andronov says:

    First of all, thanks a lot for Multiverse, it is cool :)

    Next, let me clarify my points about persistence:
    1. Schema
    For example, I have an object A that contains an object B. The question is: do I want to store the B object together with A, or do I want to store some ID instead and get the B reference from some factory during the restore process? It is the alternative to relations in a traditional DB.

    2. Indexes
    Here we totally agree :)

    3. Caching policy
    I prefer to have different cache implementations, like timed caches, size-bounded caches, etc., and to have the ability to use any of them.

    After all, Multiverse + persistence is a kind of object database, and it is a non-trivial task to create something like that.

    From my point of view the ideal STM is:
    1. Lightweight – it is small and fast
    2. Modular – if I don’t need something, I don’t get it
    It is not a “framework” like Hibernate, but a small core and a number of blocks which I can play around with

    From that perspective, I would like to have 4 main packages:
    1. STM-core, the in-memory STM implementation
    2. STM-IO, transactional file access
    3. STM-collections, rich transactional collections with the ability to integrate with STM-IO.
    4. STM-Distributed, a number of different commit protocols and clock implementations to build distributed transactions

    And some number of additional ones:
    5. STM-…, some other stuff, like XAA integrations and so on

    On top of that I could build whatever I want with a minimum amount of work. I don’t need to hack the system to turn off default policies and so on. I think that if someone wants to get a “database”, it will appear sooner or later as an open source project built around Multiverse :)

    > About the E3PC stuff:
    The clock is a problem, totally agree :)
    But in “Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery” by Weikum and Vossen (published by Morgan Kaufmann) there are some algorithms for distributed clocks (I guess I saw them there)

    > – more transactional datastructures
    > – improved instrumentation (arrays, more performance, …)
    > – compiletime instrumentation
    > – commuting operations.
    Completely agree, these are the major things. The distributed stuff is a little bit different, but this is core functionality )

    I will send you my code; I hope I will get a chance this weekend to port it to the Multiverse codebase

  4. Philip Andronov says:

    BTW: About transactional file access, I failed to find much information about it (maybe someone has some papers on the topic?). But recently I found Commons Transaction from Apache; it has an implementation of transactional file access
