Adventures in Engineering — LiveJournal
The wanderings of a modern ronin.

Ben Cantrick
  Date: 2008-04-01 11:40
  Subject:   Your daily dose of programming language nerdity.
  Tags:  reddit

Last week, I used the Lwt cooperative lightweight thread library to implement a benchmark that measures context-switch performance. I determined that it was GC-bound and timed it against comparable programs (i.e., the fastest implementations in the Computer Language Benchmarks Game, which are all based on lightweight threads) and a C version that uses POSIX threads, obtaining these results:

Haskell GHC 6.8.2 2680KB 1.22s
GCC C (POSIX threads) 4520KB 28.7s

Here are the figures I get for the C version I made with Protothreads:

GCC C (Protothreads, optimum scheduling) 220KB 0.076s
GCC C (Protothreads, pessimum scheduling) 220KB 18.6s

It is nearly 400 times faster than the C version with POSIX threads, and an order of magnitude faster than the other lightweight thread implementations. It also needs less memory. The performance is almost unbelievable.
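To make concrete what a "context switch" benchmark like the one quoted above actually measures, here is a minimal sketch in Java (the post's own language for code): two threads bounce a token back and forth through rendezvous queues, so every round trip forces a pair of thread switches. The class and method names are my own illustration, not the benchmark from the quoted post.

```java
import java.util.concurrent.SynchronousQueue;

public class PingPong {
    // Hand a token back and forth between two threads. A SynchronousQueue
    // has no capacity, so every put() blocks until the other thread take()s,
    // forcing a context switch on each hand-off.
    static long runBenchmark(int rounds) throws InterruptedException {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take()); // bounce the token straight back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();

        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            ping.put(i);   // blocks until echo takes the token
            pong.take();   // blocks until echo bounces it back
        }
        long elapsed = System.nanoTime() - start;
        echo.join();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        int rounds = 100_000;
        long ns = runBenchmark(rounds);
        System.out.printf("%d round trips in %.1f ms (%.0f ns each)%n",
                rounds, ns / 1e6, (double) ns / rounds);
    }
}
```

With heavyweight OS threads each hand-off goes through the kernel scheduler, which is exactly the cost that cooperative schemes like Lwt and Protothreads avoid by switching in user space.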


Writing code more quickly and easily is one advantage of newer languages. But when it comes to sheer performance, I don't think we'll ever beat hand-optimized C. (Except, perhaps, with hand-optimized assembler. Which is really not much different from hand-optimized C.)

One of the things I learned at The Server Side Java Symposium 2008 was a command-line option that prints the assembly code the JIT is producing. Since I've always been interested in seeing the final assembly code that gets produced from Java code, I decided to give it a test drive.

First, let's try something trivial:

public class Main {
    public static void main(String[] args) {
        for(int i=0; i<100; i++){ foo(); }
    }

    private static void foo() {
        for(int i=0; i<100; i++){ bar(); }
    }

    private static void bar() { }
}
I ran this with "java -XX:+PrintOptoAssembly -server -cp . Main". The -XX:+PrintOptoAssembly flag is the magic option, and with it I get the following output, which shows the code of the foo() method:

000 B1: # N1 <- BLOCK HEAD IS JUNK Freq: 100
000 pushq rbp
subq rsp, #16 # Create frame
nop # nop for patch_verified_entry
006 addq rsp, 16 # Destroy frame
popq rbp
testl rax, [rip + #offset_to_poll_page] # Safepoint: poll for GC
011 ret

You can see that the calls to bar() and the entire loop were optimized away. The JIT must have inlined the empty bar() method and then eliminated the now-useless loop.


On the other hand, that's some nice automatic optimization there! It killed approximately 10,100 unnecessary function calls: 100 calls to foo(), each of which made 100 calls to bar(). Though I can't see why it didn't take the last step and replace main() with just "ret". Maybe it wanted to allow a breakpoint in main()?
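For contrast with the dead-code elimination above, here is a variant sketch (the names mirror the post's example; the counter is my addition) where bar() has an observable side effect. A volatile write is a program-visible action, so the JIT must keep the loops and actually perform all 10,000 calls:

```java
public class MainCounted {
    // volatile makes every write program-visible, so the JIT
    // cannot eliminate bar() or the loops that call it.
    static volatile long calls = 0;

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) { foo(); }
        System.out.println("bar() ran " + calls + " times"); // prints 10000
    }

    private static void foo() {
        for (int i = 0; i < 100; i++) { bar(); }
    }

    private static void bar() { calls++; }
}
```

Running this version under -XX:+PrintOptoAssembly should show foo() keeping its loop and the increment, since removing them would change observable behavior.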

And now for contrast, the DOING IT RONG side:

The ISO C++ committee met in Bellevue, WA, USA on February 24 to Mar 1. For me, easily the biggest news of the meeting was that we voted lambda functions and closures into C++0x.


My complaint about Lisp is that everyone seems to think it's a great idea to add all of Lisp's features (lambda, closures, etc) to every language under the sun. News flash: There's a reason that we design different languages for different purposes. Quit getting your Lisp in my portable assembler! And get off my lawn!! - http://reddit.com/r/programming/info/6dw9d/comments/c03kzmi
