According to this Business Wire article, the Intel Xeon 5500 (a.k.a. Nehalem) is making a huge splash with an Oracle Database 11g TPC-C result of 631,766 TpmC. At 78,970 TpmC/core, that is an outrageous result! I remember when it was difficult to push a 64-CPU system to the level these CPUs reach with only one processor core! I had to quickly scurry over to the TPC website to dig into the disclosures but, as of 8:12 PM GMT, the result had not been posted yet.
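For anyone who wants to sanity-check the per-core figure, here is a minimal Python sketch. It uses only the two numbers quoted above; the core count is not stated in the text, so it is derived from them rather than assumed. The function names are purely illustrative.

```python
# Figures quoted in the post.
TOTAL_TPMC = 631_766
TPMC_PER_CORE = 78_970


def implied_core_count(total_tpmc: int, tpmc_per_core: int) -> int:
    """Return the whole number of cores implied by the two quoted figures."""
    return round(total_tpmc / tpmc_per_core)


def per_core_throughput(total_tpmc: int, cores: int) -> int:
    """Return throughput per core, truncated to a whole TpmC."""
    return total_tpmc // cores


if __name__ == "__main__":
    cores = implied_core_count(TOTAL_TPMC, TPMC_PER_CORE)
    print(cores, per_core_throughput(TOTAL_TPMC, cores))
```

The two quoted numbers are consistent with each other: dividing the total by the implied core count and dropping the fraction gives back the per-core figure.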
Jumping the Gun
Ah, but by the time I got around to checking again, it had indeed been posted. The first thing I did was check the full disclosure report to see what sort of Oracle NUMA-specific tweaking was done in the init.ora. None. That is very good news to me. The last thing I want to see is a bunch of confusing NUMA-specific tuning. Allow me to quote myself with a saying I’ve been rattling off for years:
The best NUMA system is the best SMP system.
By that I mean it shouldn’t take application software tuning to get your money’s worth out of the platform. Sure, we had to do it back in the mid-to-late 1990s with the pioneer NUMA systems, but that was largely due to the incredible ratio between local memory latency and highly contended remote memory latency (and due to the concept of remote I/O, which does not apply here). Of course the operating system has to be NUMA aware. Period.
I know what the ratios are on the Xeon 5500 series, but I can’t recall whether the specific number I have in mind is one I obtained under non-disclosure, so I’m not going to go blurting it out. However, it turns out that as long as memory is fairly placed (e.g., not a Cyclops) and the ratio is comfortably below 2:1 (R:L), you’re going to get a real SMP “feel” from the box. Of course, the closer the ratio leans towards 1:1 the better.
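To put some rough arithmetic behind the R:L argument, here is a small Python sketch of a deliberately simple model: average memory latency as a blend of local and remote accesses. Everything numeric in it is a placeholder. The 100 ns and 150 ns latencies are hypothetical, not Xeon 5500 measurements, and the 50% remote fraction is just my assumption of what "fairly placed" looks like on a two-node box. The function names are illustrative only.

```python
def remote_to_local_ratio(local_ns: float, remote_ns: float) -> float:
    """R:L ratio: remote memory latency divided by local memory latency."""
    return remote_ns / local_ns


def average_latency_ns(local_ns: float, remote_ns: float,
                       remote_fraction: float) -> float:
    """Average latency when remote_fraction of accesses go to remote memory."""
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


def slowdown_vs_all_local(ratio: float, remote_fraction: float) -> float:
    """Average latency relative to a box where every access is local."""
    return 1.0 + remote_fraction * (ratio - 1.0)


if __name__ == "__main__":
    # Hypothetical placeholder latencies, NOT measured Xeon 5500 values.
    local_ns, remote_ns = 100.0, 150.0
    # Assumption: memory spread evenly over two nodes, so half of all
    # accesses are remote.
    remote_fraction = 0.5

    ratio = remote_to_local_ratio(local_ns, remote_ns)
    print(ratio, average_latency_ns(local_ns, remote_ns, remote_fraction))
    print(slowdown_vs_all_local(ratio, remote_fraction))
    print(slowdown_vs_all_local(2.0, remote_fraction))
```

In this toy model the penalty grows linearly with both the ratio and the share of remote accesses, and it vanishes at 1:1. It ignores contention, caching and interconnect bandwidth, so treat it as intuition rather than a prediction.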
NUMA is a hardware architecture that breaks bottlenecks. It shouldn’t have to break SMP programming principles in the process. The Intel Xeon 5570, it turns out, is the sort of NUMA system you should all be clamoring for. What kind of NUMA system is that? The answer is a NUMA system that is indistinguishable from a flat-memory SMP.
PS. I actually already knew what level of NUMA tuning was used in this TPC-C testing. I just couldn’t blog about it. I also know the precise R:L memory latency ratio for the box. The way I look at it, though, is that since this modern NUMA system gets 78,970 TpmC/core, the R:L ratio is unnecessary minutiae, as are thoughts of NUMA software tuning. I never imagined NUMA would come far enough for me to write that.