In May 2009 I made a blog entry entitled “You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part II”. There had not yet been a Part I, but, as I pointed out in that post, I would loop back and write one. Here it is. Better late than never.
I originally planned to use Part I to stroll down memory lane (back to 1995) with a story about the initial impression of the then VP of Oracle RDBMS Development during a session in which we presented the Sequent DYNIX/ptx NUMA API and argued that coding to NUMA APIs sooner rather than later would be beneficial. To be honest, we were mixing vision with the specific needs of our port.
We had the first production NUMA API to which Oracle could port, and we were quite a bit earlier to the whole NUMA trend than anyone else. Ours was the first production NUMA system.
This VP is no longer at Oracle, but the (redacted) response was, “Why would we want to use any of this ^#$%?” We (the three others presenting the API and I) were caught off guard. However, we all knew it was a really good question. There were still good companies making really tight, high-end SMPs with uniform memory. Just because we (Sequent) had to move to a NUMA architecture didn’t mean we were blind to the reality around us. But one thing we knew for sure: all systems in the future would have NUMA attributes to varying degrees. All our competitors were in varying stages of either denial or what I like to call “Poo-pooh it while you do it.” All the major players came out with NUMA systems: some sooner, some later, and the others died trying.
That takes us to Commodity NUMA and the new purpose of this “Part I” post.
Before I say a word about this Part I, I’d like to point out that the concepts in Part II are of the “must-know” variety, unless you relinquish your computing to some sort of hosted facility where you don’t have the luxury of caring about the architecture upon which you run Oracle Database.
Part II was about the different types of NUMA (historical and present), and such knowledge will help you if you find yourself in a troubling performance situation related to NUMA. NUMA is now a commodity, as I pointed out there, and we have to come to grips with that.
What Is He Blogging About?
The current state of commodity NUMA is very peculiar. These Commodity NUMA Implementation (CNI) systems are so tightly coupled that most folks don’t even realize they are running on a NUMA system. In fact, let me go out on a limb. I assert that nobody is configuring Oracle Database 11g Release 2 with NUMA optimizations, in spite of the fact that they are on a NUMA box (e.g., Nehalem EP, AMD Opteron). I believe this because the init.ora parameter that invokes Oracle NUMA awareness changed names from 11gR1 to 11gR2, as per My Oracle Support note 864633.1: it changed from _enable_NUMA_optimization to _enable_NUMA_support. I know nobody is setting it, because if they were, I can almost guarantee they would have hit problems and googled for them. Allow me to explain.
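Because the rename is easy to miss, it may be worth scanning existing pfiles for the old name. The sketch below is hypothetical (check_numa_params is my own name, not an Oracle utility) and assumes simple name=value lines, optional SID prefixes such as `*.`, `#` comments, and case-insensitive parameter names; it does not read binary spfiles.

```python
import re

# Names as described above: 11gR1 used _enable_NUMA_optimization, while 11gR2
# uses _enable_NUMA_support (My Oracle Support note 864633.1).
OLD_NAME = "_enable_NUMA_optimization"
NEW_NAME = "_enable_NUMA_support"

# "name = value", where name may carry a SID prefix such as "*." (assumption).
_PARAM = re.compile(r"^\s*([\w.*]+)\s*=\s*(\S+)")


def check_numa_params(pfile_text):
    """Hypothetical helper: list NUMA-related settings found in pfile text."""
    findings = []
    for lineno, raw in enumerate(pfile_text.splitlines(), start=1):
        m = _PARAM.match(raw.split("#", 1)[0])  # ignore anything after '#'
        if not m:
            continue
        name = m.group(1).rsplit(".", 1)[-1]  # strip any SID prefix
        value = m.group(2)
        # Names are compared case-insensitively (an assumption).
        if name.lower() == OLD_NAME.lower():
            findings.append(
                f"line {lineno}: {name} is the 11gR1 name; use {NEW_NAME} on 11gR2"
            )
        elif name.lower() == NEW_NAME.lower():
            findings.append(f"line {lineno}: {NEW_NAME}={value}")
    return findings
```

An 11gR1-era pfile that still carries `*._enable_NUMA_optimization=TRUE` would be flagged, while a commented-out setting is ignored.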
If Nobody is Googling It, Nobody is Doing It
Anyone who tests _enable_NUMA_support as per My Oracle Support note 864633.1 will likely experience the sorts of problems that I detail later in this post. But first, let’s see what they would get from Google when they search for _enable_NUMA_support.
Yes, just as I thought… Google found nothing. But what is my point? My point is two-fold. First, I happen to know that Nehalem EP with QPI and Opteron with AMD HyperTransport are such good technologies that you really don’t have to care much about NUMA software optimizations, at least not at this point in the game. My Oracle Support note 1053332.1 (regarding disabling Linux NUMA support for Oracle Database Machine hosts) sort of drives that point home. Second, saying you don’t need to care about NUMA doesn’t mean you shouldn’t experiment. How can anyone say that setting _enable_NUMA_support is a total placebo in all cases? One can’t prove a negative.
If you dare, trust me when I say that an understanding of NUMA will be as essential in the next 10 years as understanding SMP (parallelism and concurrency) was in the last 20 years. OK, off my soapbox.
Some Lessons in Enabling Oracle NUMA Optimizations with Oracle Database 11g Release 2
This section aims to show that even when you think you have tested Oracle NUMA optimizations, there is a chance you didn’t. You have to know how to verify that NUMA optimizations are actually in play. Why? Well, if the configuration is not right for enabling NUMA features, Oracle Database will simply ignore you. Consider the following session, in which I show:
- Evidence that I am on a NUMA system (numactl(8))
- I started up an instance with a pfile (p4.ora) that has _enable_NUMA_support set to TRUE
- The instance started but _enable_NUMA_support was forced back to FALSE
Note that, in spite of the third item above, the alert log reports nothing about what went wrong.
    SQL>
    SQL> !numactl --hardware
    available: 2 nodes (0-1)
    node 0 size: 36317 MB
    node 0 free: 31761 MB
    node 1 size: 36360 MB
    node 1 free: 35425 MB
    node distances:
    node   0   1
      0:  10  21
      1:  21  10

    SQL> startup pfile=./p4.ora
    ORACLE instance started.

    Total System Global Area 5746786304 bytes
    Fixed Size                  2213216 bytes
    Variable Size            1207962272 bytes
    Database Buffers         4294967296 bytes
    Redo Buffers              241643520 bytes
    Database mounted.
    Database opened.
    SQL> show parameter _enable_NUMA_support

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    _enable_NUMA_support                 boolean     FALSE
    SQL>
    SQL> !grep _enable_NUMA_support ./p4.ora
    _enable_NUMA_support=TRUE
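The two facts this session establishes (more than one NUMA node, and a requested value that silently did not take effect) can be checked mechanically. A minimal sketch, assuming output shaped like the session above; numa_check and the other function names are hypothetical, and the VALUE parser assumes a single-token value such as TRUE, FALSE, or 4G.

```python
import re


def parse_numactl_hardware(text):
    """Parse `numactl --hardware` output into node sizes (MB) and the distance matrix."""
    sizes, distances = {}, []
    in_distances = False
    for line in text.splitlines():
        line = line.strip()
        size = re.match(r"node (\d+) size: (\d+) MB", line)
        if size:
            sizes[int(size.group(1))] = int(size.group(2))
        elif line.startswith("node distances:"):
            in_distances = True
        elif in_distances:
            # Matrix rows look like "0:  10  21"; the "node 0 1" header does not match.
            row = re.match(r"(\d+):\s+([\d\s]+)$", line)
            if row:
                distances.append([int(x) for x in row.group(2).split()])
    return sizes, distances


def show_parameter_value(output, name):
    """Return the VALUE column for `name` in SQL*Plus `show parameter` output, or None."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].lower() == name.lower():
            return fields[-1]  # assumes VALUE is a single token
    return None


def numa_check(numactl_out, show_param_out, requested="TRUE",
               name="_enable_NUMA_support"):
    """Hypothetical helper: report node count and whether the requested value took effect."""
    sizes, distances = parse_numactl_hardware(numactl_out)
    actual = show_parameter_value(show_param_out, name)
    return {
        "nodes": len(sizes),
        "distances": distances,
        "actual": actual,
        "reverted": actual is not None and actual.upper() != requested.upper(),
    }
```

Fed the session above, this reports two nodes, distances of 10 (local) and 21 (remote), and reverted=True: exactly the silent fallback the alert log does not mention. numactl prints firmware-supplied relative distances where 10 denotes local access, so 21 suggests remote memory costs roughly 2.1 times as much, as the firmware sees it.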
OK, so the instance is up and the parameter was reverted. What do the IPC shared memory segments look like?
    SQL> !ipcs -m

    ------ Shared Memory Segments --------
    key        shmid      owner      perms      bytes      nattch     status
    0x00000000 0          root       644        72         2
    0x00000000 32769      root       644        16384      2
    0x00000000 65538      root       644        280        2
    0xed304ac0 229380     oracle     660        4096       0
    0x7393f7f4 1179653    oracle     660        5773459456 35
    0x00000000 393223     oracle     644        790528     5          dest
    0x00000000 425992     oracle     644        790528     5          dest
    0x00000000 458761     oracle     644        790528     5          dest
Right, so there is no NUMA placement of the buffer pool: the entire SGA sits in the single 5773459456-byte segment (shmid 1179653). On Linux, Oracle must create multiple segments and allocate them on specific NUMA nodes (memory hierarchies). It was a little simpler for the first NUMA-aware port of Oracle (Sequent), since the APIs allowed for the creation of a single shared memory segment with regions of the segment placed onto different memories. Ho hum.
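Counting the SGA segments is a quick way to tell whether NUMA placement happened. The sketch below parses `ipcs -m` output; the rule it uses to pick SGA candidates (owned by oracle, attached by at least one process, not marked dest) is my own heuristic, not something Oracle documents, and sga_candidate_segments is a hypothetical name.

```python
def sga_candidate_segments(ipcs_text, owner="oracle"):
    """Return (shmid, bytes) for attached, live segments owned by `owner` in `ipcs -m` output.

    Heuristic (an assumption): SGA segments belong to the oracle user, have
    nattch > 0, and are not marked "dest" (pending removal).
    """
    segments = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows begin with a hex key such as 0x7393f7f4; banner and header rows do not.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        _key, shmid, seg_owner, _perms, size, nattch = fields[:6]
        status = fields[6:]  # e.g. ["dest"] when the segment is marked for removal
        if seg_owner == owner and int(nattch) > 0 and "dest" not in status:
            segments.append((int(shmid), int(size)))
    return segments
```

Against the listing above, only shmid 1179653 qualifies: one segment holding the whole SGA. Against the later listing, taken once NUMA support is active, the same filter reports four segments.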
What Went Wrong
Oracle could not find the libnuma.so it wanted to load with dlopen(), as the strace output shows:
    $ grep libnuma /tmp/strace.out | grep ENOENT | head
    14626 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)
    14627 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)
So I create the necessary symbolic link, boot the instance, and inspect the shared memory segments. This time there is a ~1 GB segment for the variable SGA components, and my buffer pool has been split into two segments of roughly 2.3 GB each.
    # ls -l /usr/*64*/*numa*
    lrwxrwxrwx 1 root root    23 Mar 17 09:25 /usr/lib64/libnuma.so -> /usr/lib64/libnuma.so.1
    -rwxr-xr-x 1 root root 21752 Jul  7  2009 /usr/lib64/libnuma.so.1

    SQL> show parameter db_cache_size

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    db_cache_size                        big integer 4G
    SQL> show parameter NUMA_support

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    _enable_NUMA_support                 boolean     TRUE
    SQL> !ipcs -m

    ------ Shared Memory Segments --------
    key        shmid      owner      perms      bytes      nattch     status
    0x00000000 0          root       644        72         2
    0x00000000 32769      root       644        16384      2
    0x00000000 65538      root       644        280        2
    0xed304ac0 229380     oracle     660        4096       0
    0x00000000 2719749    oracle     660        1006632960 35
    0x00000000 2752518    oracle     660        2483027968 35
    0x00000000 393223     oracle     644        790528     6          dest
    0x00000000 425992     oracle     644        790528     6          dest
    0x00000000 458761     oracle     644        790528     6          dest
    0x00000000 2785290    oracle     660        2281701376 35
    0x7393f7f4 2818059    oracle     660        2097152    35
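The fix itself is a single symbolic link from the unversioned name Oracle asked for to the versioned library that was already installed, equivalent to running `ln -s /usr/lib64/libnuma.so.1 /usr/lib64/libnuma.so` as root. A minimal Python sketch, assuming the paths in the listing above; ensure_unversioned_link is a hypothetical helper, and its libdir parameter exists only so it can be tried in a scratch directory first.

```python
import os
import tempfile


def ensure_unversioned_link(libdir="/usr/lib64", versioned="libnuma.so.1"):
    """Create <libdir>/libnuma.so -> <libdir>/libnuma.so.1 if the link is missing.

    A sketch of the manual fix above; writing to /usr/lib64 requires root.
    """
    target = os.path.join(libdir, versioned)
    link = os.path.join(libdir, versioned.split(".so", 1)[0] + ".so")
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if not os.path.lexists(link):  # leave any existing file or link untouched
        os.symlink(target, link)  # absolute target, as in the ls -l listing above
    return link


if __name__ == "__main__":
    # Try it in a scratch directory rather than the real /usr/lib64.
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, "libnuma.so.1"), "w").close()
        link = ensure_unversioned_link(scratch)
        print(link, "->", os.readlink(link))
```

The function is idempotent: calling it again once the link exists changes nothing.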
So there I have an SGA successfully created with _enable_NUMA_support set to TRUE. But what strings appear in the alert log? I’ll blog about that soon, because it leads me to other content.