In May 2009 I made a blog entry entitled “You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part II”. There had not yet been a Part I, but as I pointed out in that post, I would loop back and write Part I. Here it is. Better late than never.
Background
I originally planned to use Part I to stroll down memory lane (back to 1995) with a story about the initial impression the then VP of Oracle RDBMS Development had of the Sequent DYNIX/ptx NUMA API. It was during a session where we presented the API and argued that it would be beneficial to code to NUMA APIs sooner rather than later. To be honest, we were mixing vision with the specific needs of our port.
We were the first to have a production NUMA API to which Oracle could port, and we arrived at the whole NUMA trend quite a bit sooner than anyone else. Ours was the first production NUMA system.
Now, this VP is no longer at Oracle, but the (redacted) response was, “Why would we want to use any of this ^#$%?” We (the three others presenting the API and I) were caught off guard. Still, we all knew it was a really good question. There were still good companies making really tight, high-end SMPs with uniform memory. Just because we (Sequent) had to move to a NUMA architecture didn’t mean we were blind to the reality around us. However, one thing we knew for sure: all systems in the future would have NUMA attributes of varying levels. All our competition was either in varying stages of denial or doing what I like to refer to as “Poo-pooh it while you do it.” All the major players eventually came out with NUMA systems. Some sooner, some later, and the others died trying.
That takes us to Commodity NUMA and the new purpose of this “Part I” post.
Before I say a word about this Part I, I’d like to point out that the concepts in Part II are of the “must-know” variety, unless you relinquish your computing power to some sort of hosted facility where you don’t have the luxury of caring about the architecture on which you run Oracle Database.
Part II was about the different types of NUMA (historical and present), and such knowledge will help you if you find yourself in a troubling performance situation that relates to NUMA. NUMA is commodity, as I point out there, and we have to come to grips with that.
What Is He Blogging About?
The current state of commodity NUMA is very peculiar. These Commodity NUMA Implementation (CNI) systems are so tightly coupled that most folks don’t even realize they are running on a NUMA system. In fact, let me go out on a ledge. I assert that nobody is configuring Oracle Database 11g Release 2 with NUMA optimizations in spite of the fact that they are on a NUMA box (e.g., Nehalem EP, AMD Opteron). I believe this because the init.ora parameter to invoke Oracle NUMA awareness changed names from 11gR1 to 11gR2, as per My Oracle Support note 864633.1. The parameter changed from _enable_NUMA_optimization to _enable_NUMA_support. I know nobody is setting it because, if they were, I can almost guarantee they would have googled for problems. Allow me to explain.
If Nobody is Googling It, Nobody is Doing It
Anyone who tests _enable_NUMA_support as per My Oracle Support note 864633.1 will likely experience the sorts of problems I detail later in this post. But first, let’s see what they would get from Google when they search for _enable_NUMA_support:
Yes, just as I thought… Google found nothing. But what is my point? My point is two-fold. First, I happen to know that Nehalem EP with QPI and Opteron with AMD HyperTransport are such good technologies that you really don’t have to care that much about NUMA software optimizations, at least not at this point in the game. Reading M.O.S. note 1053332.1 (regarding disabling Linux NUMA support for Oracle Database Machine hosts) sort of drives that point home. Second, saying you don’t need to care about NUMA doesn’t mean you shouldn’t experiment. How can anyone say that setting _enable_NUMA_support is a total placebo in all cases? One can’t prove a negative.
If you dare, trust me when I say that an understanding of NUMA will be as essential in the next 10 years as understanding SMP (parallelism and concurrency) was in the last 20 years. OK, off my soapbox.
Some Lessons in Enabling Oracle NUMA Optimizations with Oracle Database 11g Release 2
This section of the blog aims to point out that even when you think you have tested Oracle NUMA optimizations, there is a chance you didn’t. You have to know how to make sure NUMA optimizations are actually in play. Why? Well, if the configuration is not right for enabling NUMA features, Oracle Database will simply ignore you. Consider the following session, in which I demonstrate three things:
- Evidence that I am on a NUMA system (numactl(8))
- I started up an instance with a pfile (p4.ora) that has _enable_NUMA_support set to TRUE
- The instance started, but _enable_NUMA_support was forced back to FALSE
Note that, in spite of the third point, the alert log reports nothing about what went wrong.
SQL> !numactl --hardware
available: 2 nodes (0-1)
node 0 size: 36317 MB
node 0 free: 31761 MB
node 1 size: 36360 MB
node 1 free: 35425 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

SQL> startup pfile=./p4.ora
ORACLE instance started.

Total System Global Area 5746786304 bytes
Fixed Size                  2213216 bytes
Variable Size            1207962272 bytes
Database Buffers         4294967296 bytes
Redo Buffers              241643520 bytes
Database mounted.
Database opened.
SQL> show parameter _enable_NUMA_support

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_enable_NUMA_support                 boolean     FALSE
SQL> !grep _enable_NUMA_support ./p4.ora
_enable_NUMA_support=TRUE
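The “node distances” in that output are the kernel’s relative node distance table (on x86 typically taken from the ACPI SLIT), normalized so that 10 means local. A 21 therefore says a remote access is reported as costing about 2.1 times a local one. It is a firmware-reported relative figure, not a measured latency. The Python sketch below parses captured numactl --hardware text, using the session above as sample input, and computes that worst-case remote-to-local ratio. The function names are hypothetical helpers of my own, not part of numactl or Oracle.

```python
import re

# Sample captured from the session above (numactl --hardware).
SAMPLE = """\
available: 2 nodes (0-1)
node 0 size: 36317 MB
node 0 free: 31761 MB
node 1 size: 36360 MB
node 1 free: 35425 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
"""


def parse_numactl_hardware(text):
    """Return (sizes_mb, distances) parsed from `numactl --hardware` text.

    sizes_mb maps node id -> total MB; distances maps node id -> the row of
    relative distances from that node to every node, in column order.
    """
    sizes_mb = {}
    distances = {}
    for line in text.splitlines():
        m = re.match(r"node (\d+) size: (\d+) MB", line)
        if m:
            sizes_mb[int(m.group(1))] = int(m.group(2))
            continue
        # Distance rows look like "  0:  10  21".
        m = re.match(r"\s*(\d+):\s+([\d\s]+)$", line)
        if m:
            distances[int(m.group(1))] = [int(x) for x in m.group(2).split()]
    return sizes_mb, distances


def remote_to_local_ratio(distances):
    """Worst-case remote distance divided by local distance (1.0 means flat)."""
    return max(max(row) / row[node] for node, row in distances.items())


if __name__ == "__main__":
    sizes, dist = parse_numactl_hardware(SAMPLE)
    print(len(sizes), "nodes:", sizes)
    print("remote/local distance ratio:", remote_to_local_ratio(dist))
```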
OK, so the instance is up and the parameter was reverted. What does the IPC shared memory segment look like?
SQL> !ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root       644        72         2
0x00000000 32769      root       644        16384      2
0x00000000 65538      root       644        280        2
0xed304ac0 229380     oracle     660        4096       0
0x7393f7f4 1179653    oracle     660        5773459456 35
0x00000000 393223     oracle     644        790528     5          dest
0x00000000 425992     oracle     644        790528     5          dest
0x00000000 458761     oracle     644        790528     5          dest
Right, so I have no NUMA placement of the buffer pool: the whole SGA lives in one 5773459456-byte segment (shmid 1179653). On Linux, Oracle must create multiple segments and allocate them on specific NUMA nodes (memory hierarchies). It was a little simpler for the first NUMA-aware port of Oracle (Sequent), since the APIs allowed for the creation of a single shared memory segment with regions of the segment placed onto different memories. Ho hum.
What Went Wrong
Oracle could not find the libnuma.so it tries to load with dlopen(). The strace output shows the failed opens:
$ grep libnuma /tmp/strace.out | grep ENOENT | head
14626 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)
14627 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)
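To check ahead of time whether those exact paths can be loaded, here is a minimal Python sketch using ctypes.CDLL, which calls dlopen() underneath. The two absolute paths are the ones from the strace and ls output in this post, and the helper name is hypothetical. A successful load only shows the library is present and loadable; it says nothing about whether Oracle will then honor _enable_NUMA_support.

```python
import ctypes

# Absolute paths from this post: Oracle asked for the unversioned name,
# while the host only had the versioned library until the symlink was made.
CANDIDATES = ["/usr/lib64/libnuma.so", "/usr/lib64/libnuma.so.1"]


def can_dlopen(path):
    """Return True if dlopen() (via ctypes.CDLL) can load the library at path."""
    try:
        ctypes.CDLL(path)
    except OSError:
        # dlopen() failed: missing file (ENOENT), wrong architecture, etc.
        return False
    return True


if __name__ == "__main__":
    for path in CANDIDATES:
        print(f"{path}: {'loads' if can_dlopen(path) else 'NOT loadable'}")
```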
So I created the necessary symbolic link, booted the instance again, and inspected the shared memory segments. Now there is a ~1 GB segment for the variable SGA components, and my buffer pool has been split into two roughly 2.3 GB segments.
# ls -l /usr/*64*/*numa*
lrwxrwxrwx 1 root root    23 Mar 17 09:25 /usr/lib64/libnuma.so -> /usr/lib64/libnuma.so.1
-rwxr-xr-x 1 root root 21752 Jul  7  2009 /usr/lib64/libnuma.so.1

SQL> show parameter db_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_cache_size                        big integer 4G
SQL> show parameter NUMA_support

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_enable_NUMA_support                 boolean     TRUE
SQL> !ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root       644        72         2
0x00000000 32769      root       644        16384      2
0x00000000 65538      root       644        280        2
0xed304ac0 229380     oracle     660        4096       0
0x00000000 2719749    oracle     660        1006632960 35
0x00000000 2752518    oracle     660        2483027968 35
0x00000000 393223     oracle     644        790528     6          dest
0x00000000 425992     oracle     644        790528     6          dest
0x00000000 458761     oracle     644        790528     6          dest
0x00000000 2785290    oracle     660        2281701376 35
0x7393f7f4 2818059    oracle     660        2097152    35
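A quick way to confirm the SGA was carved up rather than grown is to total the instance’s segments in both ipcs -m listings. This Python sketch parses the two listings from this post. The filter (segments owned by oracle, attached by at least one process, and not marked dest) is my own heuristic and assumes a single instance on the host; the function name is hypothetical.

```python
# `ipcs -m` listings from this post: before and after the libnuma.so symlink.
BEFORE = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root       644        72         2
0x00000000 32769      root       644        16384      2
0x00000000 65538      root       644        280        2
0xed304ac0 229380     oracle     660        4096       0
0x7393f7f4 1179653    oracle     660        5773459456 35
0x00000000 393223     oracle     644        790528     5          dest
0x00000000 425992     oracle     644        790528     5          dest
0x00000000 458761     oracle     644        790528     5          dest
"""

AFTER = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root       644        72         2
0x00000000 32769      root       644        16384      2
0x00000000 65538      root       644        280        2
0xed304ac0 229380     oracle     660        4096       0
0x00000000 2719749    oracle     660        1006632960 35
0x00000000 2752518    oracle     660        2483027968 35
0x00000000 393223     oracle     644        790528     6          dest
0x00000000 425992     oracle     644        790528     6          dest
0x00000000 458761     oracle     644        790528     6          dest
0x00000000 2785290    oracle     660        2281701376 35
0x7393f7f4 2818059    oracle     660        2097152    35
"""


def sga_segment_sizes(ipcs_text, owner="oracle"):
    """Return the byte sizes of the live, attached segments owned by `owner`."""
    sizes = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows start with a hex key and carry at least six columns.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        _key, _shmid, seg_owner, _perms, nbytes, nattch = fields[:6]
        status = fields[6] if len(fields) > 6 else ""
        # Heuristic: skip unattached segments and ones marked "dest".
        if seg_owner == owner and int(nattch) > 0 and status != "dest":
            sizes.append(int(nbytes))
    return sizes


if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        sizes = sga_segment_sizes(text)
        print(f"{label}: {len(sizes)} segment(s), {sum(sizes)} bytes total")
```

Both listings total 5773459456 bytes: one segment before, four after. ipcs cannot show which node backs each segment; on Linux, /proc/<pid>/numa_maps for an attached process reports per-node page counts for each mapping.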
So there I have an SGA successfully created with _enable_NUMA_support set to TRUE. But what strings appear in the alert log? Well, I’ll blog about that soon because it leads me to other content.
Kevin,
Could you talk a little about the types of optimizations this brings to the table, so that we can better understand which workloads would actually benefit from them?
Probably not, since anymore it seems some technology circles lean more towards FaithBasedTechnology(tm) and it’s all “just supposed to work.” Education about how to maximize your investment might be seen as “casting vendor X product in a bad light.” I’ll have to think more like a snake than a fox in order to slither my way through these sorts of twisty passages.
Nice Technocracy. I’ve got some more for you, because I have a NUMA question about Sun hardware, old and new.
First, regarding Part III: our (sigh, yes, again, “our old”) old Sun v1280 servers are NUMA. I agree with you: Sequent was ahead of its time, and NUMA is extremely common and good.
We are replacing our v1280s with M4000s.
——– You wrote:
ipcs -m
…
Right, so I have no NUMA placement of the buffer pool. On Linux, Oracle must create multiple segments and allocate them on specific NUMA nodes (memory hierarchies). It was a little simpler for the first NUMA-aware port of Oracle (Sequent) since the APIs allowed for the creation of a single shared memory segment with regions of the segment placed onto different memories. Ho Hum.
——— Now, my question:
The v1280s/Solaris 9 split the buffer cache into two, one for each quad, and then the 9.2 (10g?) instance would use only one of the two pieces. Eww. The solution: _enable_numa_optimization=false.
In 11.1.0.7, there is a note and a one-off:
Bug:8199533: NEED NEW PARAMETER TO DISABLE NUMA SUPPORT AND RELATED LOG INFO.
Note:456232.1 Expected cache memory not being used on NUMA platform
Do I install this patch on the M4000? I ask you because you understand this stuff, and because my prejudices/experience tell me that Oracle Support people won’t understand the question and most Sun people have a habit of using “shared pool” and “SGA” synonymously.
-paul
Sorry about that last comment.
I wrote:
>> Do I install this patch on the M4000?
If I were installing 11.2, I’d be getting that patch automatically.
I’ve done a lot more perusing of M.O.S., and I’ve also seen stuff like “problems with tar/dd” and “ora-600 when do online board replacements”.
I guess, the real questions have already been addressed here:
How good/bad is Sun M4000 at NUMA?
and
How do I actually measure the benefit of turning NUMA support on?
——
Also, to continue my shameless naivete, what do the Oracle developers, and the newer VPs, say about making NUMA better known to (and more used by) the mainstream Oracle DBA?
The M4000 is UMA, so you don’t need any NUMA optimizations. You can check the number of locality groups (lgroups) by doing the following:
root@ebiz1> kstat -m lgrp
module: lgrp instance: 1
name: lgrp1 class: misc
alloc fail 813
cpus 16
crtime 137.5915144
default policy 0
load average 4116
lwp migrations 0
next-seg policy 0
next-touch policy 435088862
pages avail 4078256
pages failed to mark 0
pages failed to migrate from 0
pages failed to migrate to 0
pages free 1501302
pages installed 4194304
pages marked for migration 0
pages migrated from 0
pages migrated to 0
random policy 7738755
round robin policy 0
snaptime 1643411.8335022
span process policy 0
span psrset policy 0
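To pull the lgroup count and CPUs per lgroup out of that kind of listing without eyeballing it, a small parser is enough. This is a minimal Python sketch that assumes the “module: lgrp instance: N” header and “cpus N” line layout shown above; the function name is hypothetical. Run against the excerpt above it finds a single lgroup with 16 CPUs. Treat the count as a hint about the memory topology rather than proof.

```python
import re

# Excerpt of the `kstat -m lgrp` output shown above.
SAMPLE = """\
module: lgrp instance: 1
name: lgrp1 class: misc
alloc fail 813
cpus 16
pages installed 4194304
"""

HEADER = re.compile(r"module:\s+lgrp\s+instance:\s+(\d+)")
CPUS = re.compile(r"cpus\s+(\d+)$")


def lgroup_cpus(kstat_text):
    """Map each lgroup instance number to its CPU count (None if not listed)."""
    groups = {}
    current = None
    for raw in kstat_text.splitlines():
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            # A new "module: lgrp instance: N" block starts here.
            current = int(m.group(1))
            groups[current] = None
            continue
        m = CPUS.match(line)
        if m and current is not None:
            groups[current] = int(m.group(1))
    return groups


if __name__ == "__main__":
    groups = lgroup_cpus(SAMPLE)
    print(f"{len(groups)} lgroup(s):", groups)
```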
Hi Kevin,
I found Part II earlier and just found this one. So, if I’m running 11gR2, NUMA is off by default. So, should I disable it in the O/S? I’m running a pair of X5570 CPUs in an HP DL-360 (actually a 4-node RAC of 360s), and NUMA is enabled at the O/S level, as can be observed w/ ‘numactl --hardware’.
So, I guess my question is, is there any issue w/ having NUMA enabled in the O/S, but disabled by Oracle? Is there anything to be gained by either disabling NUMA in the O/S or enabling it in Oracle?
Thanks!
-Mark
Hi Mark,
With a 2s (two-socket) Nehalem EP server you will be extremely hard pressed to find benefit from software NUMA awareness. Full stop. It is not sufficiently lumpy. I recommend disabling software NUMA in the OS (via grub), just as I did here in development for Exadata Database Machine Version 2 (and that recommendation stuck for many, many reasons).
Now, 4s and 8s servers are a totally different story, so don’t burn this particular recommendation in as a generic NUMA mentality. It only applies to the puny, atypical case of 2s Nehalem EP and most likely Westmere EP (Xeon 5600).
Thanks Kevin.
So, I guess my last question is this: NUMA is disabled in Oracle, in 11.2. It’s enabled at the O/S. Is there sufficient cause to *not* run this way? Should I bother with turning it off in the O/S? I ask cause it will be a hassle to do so, schedule downtime, etc. So, does it really matter at all if I leave it enabled in the O/S if Oracle is not taking advantage of it?
Seems like there is some effect to having it enabled in the O/S, as I’m using HugePages, and by looking at /sys/devices/system/node/*/meminfo I can see that (roughly) half of my HugePages allocation is managed by each of two nodes (0 and 1).
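To see that per-node split without reading each file by hand, here is a minimal Python sketch that collects HugePages_Total and HugePages_Free from every /sys/devices/system/node/node*/meminfo file. It assumes the “Node N HugePages_Total: value” line format those files use; the helper names are hypothetical.

```python
import glob
import re

# Matches e.g. "Node 0 HugePages_Total:  1024" in a per-node meminfo file.
PATTERN = re.compile(r"Node\s+(\d+)\s+HugePages_(Total|Free):\s+(\d+)")


def parse_node_meminfo(text):
    """Return {node: {"Total": n, "Free": n}} from per-node meminfo text."""
    result = {}
    for node, field, value in PATTERN.findall(text):
        result.setdefault(int(node), {})[field] = int(value)
    return result


def read_all_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Merge parse_node_meminfo() over every node's meminfo file on this host."""
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            merged.update(parse_node_meminfo(f.read()))
    return merged


if __name__ == "__main__":
    for node, counts in sorted(read_all_nodes().items()):
        print(f"node {node}: HugePages_Total={counts.get('Total')} "
              f"HugePages_Free={counts.get('Free')}")
```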
Thanks again,
-Mark
I recommend you set numa=off in grub for the same reasons I insisted on it for the database hosts of the Exadata Database Machine. If it doesn’t help, the only thing it can do is hurt.
Everyone in the world essentially ignored NUMA after Sequent Computer Systems died. Somebody, someday, will have to pay for that dereliction. However, with 2s Xeon 5500 there is no need to bear that cross just yet.
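After adding numa=off to the kernel line in grub and rebooting, it is worth confirming the running kernel actually picked it up. Here is a minimal Python sketch (hypothetical helper names) that checks /proc/cmdline for the exact numa=off token. With it in effect, numactl --hardware should report a single node.

```python
from pathlib import Path


def numa_off_requested(cmdline):
    """True if the kernel command line contains the exact token numa=off."""
    return "numa=off" in cmdline.split()


def check_running_kernel(path="/proc/cmdline"):
    """Read the running kernel's command line and test it for numa=off."""
    return numa_off_requested(Path(path).read_text())


if __name__ == "__main__":
    print("numa=off in effect:", check_running_kernel())
```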
Ok, I’ll broach the subject w/ my sysadmins on Monday.
Thanks!
-Mark