Friday, August 27, 2010

x86 Open64 Compiler

I recently updated both my laptop and my home machine to Ubuntu 10.04. This had one significant negative side effect on my workstation: it broke the Intel compiler that the system maker had installed. A bit of searching showed that what I'm hitting is a known problem with the newer libraries, so it isn't clear that paying Intel more money would actually fix it. However, during my searching I came across the x86 Open64 compiler, an optimizing compiler from AMD. It is fairly new; I have vague memories of seeing release announcements back in 2009, but I didn't look closely at it at the time. Not having a working compiler other than gcc on my system made me take a second look.

While they say they have only really tested it under Red Hat and Fedora, the binaries work just fine for me on Ubuntu. I have now installed it on all the machines at Trinity as well. I still need to do a direct speed test against the Intel compiler on my cluster, but the set of optimizations they list is quite impressive, and from what I have seen so far it certainly isn't significantly slower than Intel on my home machine.

There is one very odd thing that I have noticed on my home machine, though. My rings code uses OpenMP for multithreading, and big simulations typically run on all eight cores of my home machine, while smaller simulations often sit at a load of 5-7. I started a large simulation built with x86 Open64 and it keeps all 8 cores busy. However, smaller simulations that run through steps really quickly are showing loads of only 3-4. The load level is also far less ragged than what I had seen with the Intel compiler. At this point I can't say whether this is a good thing or not. At first I worried it wasn't properly parallelizing. Now I'm wondering if it just does a better job of keeping threads on specific cores and keeping those cores loaded, instead of having work jump around. That would certainly be a smart thing to do, since it would improve cache performance, but only benchmarking and profiling will tell me for certain.