March 6th, 2009


Transactional Memory: Not a silver bullet for easy parallelization.

For Azul Systems, certainly, the name of the game is throughput: we appear to be generously over-provisioned with bandwidth. We can sustain 30G/sec allocation on 600G heaps with max pause times on the order of tens of milliseconds. Each of our 864 cpus can sustain 2 cache-missing memory ops (plus a bunch of prefetches); a busy box will see 2300+ outstanding memory references at any time. We have a lite microkernel-style OS; we can easily handle 100K runnable threads (not just blocked ones). Our JVM & GC scale easily to the whole box. In short: the bottleneck is NOT the platform. We need our users to be able to write scalable concurrent code.


In short, users don't write "TM-friendly" code. Neither do library writers. Many times a small rewrite to remove the conflict makes the HTM useful. But this blows the "dusty deck" code - people just want their old code to run faster. The hard part here is getting customers to accept that a code rewrite is needed. Once they are over that mental hump, once a code rewrite is "on the table" - then the customers go whole-hog. Why make the code xTM-friendly when they can make it lock-friendly as well, and have it run fine on all gear (not just HTM-enabled gear)? Also, locks have well-understood performance characteristics, unlike TMs, which generally rely on a complex and not-well-understood runtime portion (and indeed all the STMs out there have wildly varying "sweet spots", such that code which performs well on one STM might be unusably slow on another STM).

Really what the customers want to know is: "which locks do I need to 'crack' to get performance?". Once they have that answer they are ready and willing to write fine-grained locking code. And nearly always the fine-grained locking is a very simple step up in complexity over what they had before. It's not the case that they need to write some uber-hard-to-maintain code to get performance. Instead it's the case that they have no clue which locks need to be "cracked" to get a speedup, and once that's pointed out the fixes are generally straightforward (e.g., replacing sync/HashMap with ConcurrentHashMap, striping a lock, reducing hold times (generally via caching), switching to AtomicXXX::increment, etc.).
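To make a couple of those fixes concrete, here is a minimal Java sketch of "cracking" a coarse lock: a synchronized HashMap next to its ConcurrentHashMap replacement, plus an AtomicLong counter. The class name `LockCracking`, the method names, and the demo workload are all made up for illustration; none of it comes from the quoted text, and real code would need profiling first to find out which lock is actually the hot one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; the names here are illustrative assumptions.
final class LockCracking {

    // Before: a plain HashMap guarded by one coarse lock. Every thread
    // serializes on the same monitor, even when touching different keys.
    static final Map<String, Long> coarseCounts = new HashMap<>();

    static void countCoarse(String key) {
        synchronized (coarseCounts) {
            coarseCounts.merge(key, 1L, Long::sum);
        }
    }

    // After: ConcurrentHashMap performs the per-key update atomically,
    // so no external lock is needed.
    static final ConcurrentHashMap<String, Long> fineCounts = new ConcurrentHashMap<>();

    static void countFine(String key) {
        fineCounts.merge(key, 1L, Long::sum);
    }

    // A shared counter updated with an atomic increment instead of a lock.
    static final AtomicLong totalEvents = new AtomicLong();

    // Runs `threads` workers, each doing `perThread` updates on one of two keys,
    // and returns the value of the shared counter once all workers have finished.
    static long runDemo(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final String key = "key-" + (t % 2);
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    countCoarse(key);
                    countFine(key);
                    totalEvents.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return totalEvents.get();
    }

    public static void main(String[] args) {
        long total = runDemo(4, 10_000);
        System.out.println("total=" + total + " coarse=" + coarseCounts + " fine=" + fineCounts);
    }
}
```

Both versions end up with the same counts; the difference is only in how much the threads get in each other's way. The change from `countCoarse` to `countFine` is about one line, which is roughly the "simple step up in complexity" described above.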

For those of you who don't know, Azul Systems is the company that made custom silicon to execute Java in hardware, and currently sells 300-800 core massively parallel Java machines.

  • Current Music
    Type O Negative - Wolf Moon (including zoanthropic paranoia)

Who killed the turbine car?

After months of test and development work, a CR2A gas turbine engine was installed in a modified 1962 Dodge called the Dodge Turbo Dart. Styling modifications to the car were adapted to reflect its radically different power plant. The bladed wheel motif of the grille and wheel covers reflected the appearance of the vital components of the gas turbine.

The car left New York City on December 27, 1961, to begin a coast-to-coast engineering evaluation. After traveling 3,100 miles through snowstorms, freezing rain, subzero temperatures and 25 to 40 mile-per-hour headwinds, it arrived in Los Angeles on December 31.

The turbine not only lived up to all expectations but exceeded them! An inspection showed every part of the engine in excellent condition. Fuel economy was consistently better than that of a conventional car which traveled with the turbine car and was exposed to the same conditions. The key to the excellent performance and economy of the third generation gas turbine (called the CR2A) was its new variable turbine nozzle mechanism.

The automatic second stage turbine nozzles provided optimum results throughout the entire operating range of the engine. Thus, economy, performance, or engine braking could be maximized as required by the driver. For example, one area of performance is what is termed acceleration lag - the time it takes the compressor section to reach operating speed after the accelerator pedal is depressed. The first turbine engine had an acceleration lag of seven seconds from idle to full-rate output; the second engine required three seconds to achieve maximum vehicle acceleration, while this new engine required less than one and one-half seconds to accomplish the same performance.