Java Extreme Performance: Part I – Baseline

I’m currently working on a new STM engine for the Multiverse project, and this time the focus is on scalability and performance. The goal is that for uncontended data with a single ref per transaction, updates should run at at least 10 million transactions per second and read-only transactions at 50 to 75 million per second (both per core, and it should scale linearly with the number of cores). So I’m going to write some posts about what I discover while improving performance, and this is the first one.

The first big discovery is that profilers become useless (even with stack sampling), because the call stack changes so frequently that no useful information can be derived from it, even at a sampling interval of 1 ms, which apparently is the lowest you can get.

What works best for me is to throw away everything you have and start from scratch with an (incomplete) baseline. This creates a kind of ‘wow… I wish I could keep this performance’ reference point, and as you add more logic you can see exactly how expensive each addition is. In a lot of cases you will be flabbergasted by how expensive certain constructs are, e.g. a volatile read or a polymorphic method call, or by the JIT failing to optimize something you would expect it to optimize.
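
To make this concrete, here is a minimal sketch of such a baseline (the class, field names, and iteration count are mine, for illustration only; this is not Multiverse code). It times a loop over a plain field read against the same loop over a volatile field read; printing the accumulated sum keeps the JIT from eliminating the loops as dead code:

    // A minimal hand-rolled baseline (illustrative names, not Multiverse code).
    public class ReadBaseline {

        long plainValue = 1;
        volatile long volatileValue = 1;

        long plainLoop(long iterations) {
            long sum = 0;
            for (long i = 0; i < iterations; i++) {
                sum += plainValue; // ordinary read: the JIT may hoist this out of the loop
            }
            return sum;
        }

        long volatileLoop(long iterations) {
            long sum = 0;
            for (long i = 0; i < iterations; i++) {
                sum += volatileValue; // volatile read: must be re-read on every iteration
            }
            return sum;
        }

        public static void main(String[] args) {
            ReadBaseline bench = new ReadBaseline();
            long iterations = 500L * 1000 * 1000;

            // Warm up so the JIT compiles both loops before we measure.
            bench.plainLoop(iterations);
            bench.volatileLoop(iterations);

            long start = System.nanoTime();
            long sum = bench.plainLoop(iterations);
            System.out.printf("plain:    %d ms (sum=%d)%n", (System.nanoTime() - start) / 1000000, sum);

            start = System.nanoTime();
            sum = bench.volatileLoop(iterations);
            System.out.printf("volatile: %d ms (sum=%d)%n", (System.nanoTime() - start) / 1000000, sum);
        }
    }

Treat the numbers as rough indications only; a hand-rolled harness like this is easy to mislead. But the gap it shows can be eye-opening: the JIT is free to hoist the ordinary read out of the loop, while the volatile read has to be performed on every iteration.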

This is a very time-consuming process (especially since the results also depend on the platform and the JDK you are using), but it will give you a deeper insight and help you write better-performing code.

2 Responses to Java Extreme Performance: Part I – Baseline

  1. Seun Osewa says:

    You should investigate the game industry. Such extreme performance-mindedness will be really useful there!

    Also, your STM might be useful for scaling MMOs. Every planet in Eve Online runs on a single CPU core because the concurrent shared state problem is hard without transactional memory.

  2. […] 2 – Object pooling This is the second in the series about extreme Java performance; see Part 1 for more […]
