Ben Cantrick (mackys) wrote,

GCC 4.2 + LLVM 2.2 = 35% faster than GCC 4.2 alone.

Where do these performance differences come from? In the Himeno benchmark, the innermost loop in jacobi() consumes 99.7% of the total computation time. I disassembled that part of the executables and found that LLVM 2.2 emits very efficient x86 assembly. It is composed of move, add/sub and mul instructions. These three kinds of instruction can be executed in parallel on a Core2 CPU. (Core2 has an independent load/store unit, an additive fp ALU and a multiply fp ALU.)
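To get a feel for what such an inner loop looks like, here is a minimal sketch of a Jacobi-style sweep in C. It is not the actual Himeno source (the real kernel uses more neighbours and per-point coefficient arrays); the grid size, the function name `jacobi_sweep`, and the parameters `coef` and `omega` are assumptions made up for illustration. The point is only that the loop body boils down to loads/stores, adds/subtracts and multiplies, which is the instruction mix described above.

```c
#include <stddef.h>

/* Hypothetical grid size, chosen only for illustration. */
enum { NX = 8, NY = 8, NZ = 8 };

/* One Jacobi-style sweep over the interior of a 3D grid.
 * Each interior point of `out` is relaxed toward `coef` times the sum of
 * its six neighbours in `in` (pass coef = 1.0f/6.0f for a plain average).
 * Boundary points of `out` are left untouched.
 * Returns the sum of squared corrections, a simple residual. */
float jacobi_sweep(float in[NX][NY][NZ], float out[NX][NY][NZ],
                   float coef, float omega)
{
    float residual = 0.0f;

    for (size_t i = 1; i < NX - 1; i++) {
        for (size_t j = 1; j < NY - 1; j++) {
            for (size_t k = 1; k < NZ - 1; k++) {
                /* Loads and adds: gather the six neighbours. */
                float sum = in[i - 1][j][k] + in[i + 1][j][k]
                          + in[i][j - 1][k] + in[i][j + 1][k]
                          + in[i][j][k - 1] + in[i][j][k + 1];

                /* One multiply and one subtract: correction at this point. */
                float delta = coef * sum - in[i][j][k];

                /* Multiply-and-add: accumulate residual, store new value. */
                residual += delta * delta;
                out[i][j][k] = in[i][j][k] + omega * delta;
            }
        }
    }
    return residual;
}
```

Since k is the innermost index, the loop walks memory contiguously, and the loads, adds and multiplies of neighbouring iterations are independent enough for a compiler to interleave them.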

Back in the '70s everyone argued that compilers would never produce faster code than hand-tweaked assembly. These days, you have to be awfully damn good at assembly to create faster code than an optimizing compiler can. So gee, I wonder what's going to happen in the future with virtual machines - which "everyone knows" will never be as fast as compiled code. (Especially when everyone has a multi-core CPU. Cuz we all know how great human beings are at thinking in parallel, don't we?)

(Another advantage of VMs - they can optimize code for cache coherency. Which can get you between one and two orders of magnitude increase in performance, if done exactly right.)
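As a rough illustration of why cache behaviour matters so much, here is a small C sketch. It assumes the point above is about memory access order (cache locality); the matrix size and both function names are made up for the example. The two functions compute the same sum, but the first walks memory contiguously while the second jumps a whole row ahead on every step, which tends to miss the cache far more often on large matrices.

```c
#include <stddef.h>

/* Hypothetical matrix size, chosen only for illustration. */
enum { ROWS = 512, COLS = 512 };

/* Row-major traversal: consecutive iterations touch adjacent addresses,
 * so each cache line that gets loaded is fully used. */
double sum_row_major(double m[ROWS][COLS])
{
    double sum = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += m[i][j];
    return sum;
}

/* Column-major traversal of the same array: each step strides by
 * COLS doubles, so successive accesses land on different cache lines. */
double sum_col_major(double m[ROWS][COLS])
{
    double sum = 0.0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += m[i][j];
    return sum;
}
```

How big the gap is depends on the CPU, the matrix size and the compiler, so time both versions on your own machine rather than trusting a rule of thumb.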

NB: Someone suggested that the tests be run again, this time passing "-march=core2" to gcc. It might create more optimized code and beat LLVM in that case.
Tags: reddit