You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part I.

In May 2009 I published a blog entry entitled You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part II. There was not yet a Part I, but as I pointed out in that post, I would loop back and write it. Here it is. Better late than never.

Background
I originally planned to use Part I to stroll down memory lane (back to 1995) with a story about the first impression the then VP of Oracle RDBMS Development had of the Sequent DYNIX/ptx NUMA API. We presented the API in a session and argued that it would be beneficial to code to NUMA APIs sooner rather than later. To be honest, we were mixing vision with the specific needs of our port.

We were the first to have a production NUMA API to which Oracle could port, and we were quite a bit sooner to the whole NUMA trend than anyone else. Ours was the first production NUMA system.

Now, this VP is no longer at Oracle, but the (redacted) response was, “Why would we want to use any of this ^#$%.” We (me and the three others presenting the API) were caught off guard. However, we all knew that the question was a really good one. There were still good companies making really tight, high-end SMPs with uniform memory. Just because we (Sequent) had to move into NUMA architecture didn’t mean we were blind to the reality around us. However, one thing we knew for sure: all systems in the future would have NUMA attributes of varying levels. All our competition was either in varying stages of denial or doing what I like to refer to as “Pooh-pooh it while you do it.” All the major players came out with NUMA systems. Some sooner, some later, and the others died trying.

That takes us to Commodity NUMA and the new purpose of this “Part I” post.

Before I say a word about this Part I, I’d like to point out that the concepts in Part II are of a “must-know” variety unless you relinquish your computing power to some sort of hosted facility where you don’t have the luxury of caring about the architecture upon which you run Oracle Database.

Part II was about the different types of NUMA (historical and present) and such knowledge will help you if you find yourself in a troubling performance situation that relates to NUMA. NUMA is commodity, as I point out, and we have to come to grips with that.

What Is He Blogging About?
The current state of commodity NUMA is very peculiar. These Commodity NUMA Implementation (CNI) systems are so tightly coupled that most folks don’t even realize they are running on a NUMA system. In fact, let me go out on a ledge. I assert that nobody is configuring Oracle Database 11g Release 2 with NUMA optimizations in spite of the fact that they are on a NUMA box (e.g., Nehalem EP, AMD Opteron). I believe this because the init.ora parameter to invoke Oracle NUMA awareness changed names from 11gR1 to 11gR2 as per My Oracle Support note 864633.1. The parameter changed from _enable_NUMA_optimization to _enable_NUMA_support. I know nobody is setting this because if they had, I can almost guarantee they would have googled for problems. Allow me to explain.

If Nobody is Googling It, Nobody is Doing It
Anyone who tests _enable_NUMA_support as per My Oracle Support note 864633.1 will likely experience the sorts of problems that I detail later in this post. But first, let’s see what they would get from Google when they search for _enable_NUMA_support:

Yes, just as I thought…Google found nothing. But what is my point? My point is two-fold. First, I happen to know that Nehalem EP with QPI and Opteron with AMD HyperTransport are such good technologies that you really don’t have to care that much about NUMA software optimizations, at least at this point in the game. Reading M.O.S. note 1053332.1 (regarding disabling Linux NUMA support for Oracle Database Machine hosts) sort of drives that point home. However, saying you don’t need to care about NUMA doesn’t mean you shouldn’t experiment. How can anyone say that setting _enable_NUMA_support is a total placebo in all cases? One can’t prove a negative.

If you dare, trust me when I say that an understanding of NUMA will be as essential in the next 10 years as understanding SMP (parallelism and concurrency) was in the last 20 years. OK, off my soapbox.

Some Lessons in Enabling Oracle NUMA Optimizations with Oracle Database 11g Release 2
This section of the blog aims to point out that even when you think you have tested Oracle NUMA optimizations, there is a chance you didn’t. You have to know how to verify that NUMA optimizations are actually in play. Why? Well, if the configuration is not right for enabling NUMA features, Oracle Database will simply ignore you. Consider the following session, which shows:

  1. Evidence that I am on a NUMA system (numactl(8))
  2. I started up an instance with a pfile (p4.ora) that has _enable_NUMA_support set to TRUE
  3. The instance started but _enable_NUMA_support was forced back to FALSE

Note that in spite of item 3, the alert log reports nothing about what went wrong.

SQL>
SQL> !numactl --hardware
available: 2 nodes (0-1)
node 0 size: 36317 MB
node 0 free: 31761 MB
node 1 size: 36360 MB
node 1 free: 35425 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

SQL> startup pfile=./p4.ora
ORACLE instance started.

Total System Global Area 5746786304 bytes
Fixed Size                  2213216 bytes
Variable Size            1207962272 bytes
Database Buffers         4294967296 bytes
Redo Buffers              241643520 bytes
Database mounted.
Database opened.
SQL> show parameter _enable_NUMA_support

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_enable_NUMA_support                 boolean     FALSE

SQL>
SQL> !grep _enable_NUMA_support ./p4.ora
_enable_NUMA_support=TRUE
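
Since the instance gives no warning, one way to catch this kind of silent revert is to compare what the pfile asked for against what the instance reports. The following is a minimal Python sketch, assuming the pfile text and the SQL*Plus `show parameter` output have been captured as strings. The function names (`pfile_value`, `shown_value`, `silently_reverted`) are hypothetical helpers for illustration, not part of any Oracle tool.

```python
def pfile_value(pfile_text, name):
    """Return the value assigned to `name` in init.ora-style text, or None."""
    for line in pfile_text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().split(".")[-1]          # tolerate a "*." or "sid." prefix
        if key.lower() == name.lower():
            return value.strip()
    return None


def shown_value(show_output, name):
    """Return the VALUE column for `name` from `show parameter` output, or None."""
    for line in show_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].lower() == name.lower():
            return fields[-1]
    return None


def silently_reverted(pfile_text, show_output, name):
    """True when the pfile sets `name` but the instance reports another value."""
    asked = pfile_value(pfile_text, name)
    got = shown_value(show_output, name)
    return asked is not None and got is not None and asked.upper() != got.upper()


# Strings copied from the session above.
PFILE = "_enable_NUMA_support=TRUE\n"
SHOWN = """NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_enable_NUMA_support                 boolean     FALSE
"""

print(silently_reverted(PFILE, SHOWN, "_enable_NUMA_support"))  # prints True
```

A check like this only tells you that the setting did not stick; it says nothing about why.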

OK, so the instance is up and the parameter was reverted. What does the IPC shared memory segment look like?

SQL> !ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root      644        72         2
0x00000000 32769      root      644        16384      2
0x00000000 65538      root      644        280        2
0xed304ac0 229380     oracle    660        4096       0
0x7393f7f4 1179653    oracle    660        5773459456 35
0x00000000 393223     oracle    644        790528     5          dest
0x00000000 425992     oracle    644        790528     5          dest
0x00000000 458761     oracle    644        790528     5          dest

Right, so I have no NUMA placement of the buffer pool. On Linux, Oracle must create multiple segments and allocate them on specific NUMA nodes (memory hierarchies). It was a little simpler for the first NUMA-aware port of Oracle (Sequent) since the APIs allowed for the creation of a single shared memory segment with regions of the segment placed onto different memories. Ho Hum.
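
To make that inspection repeatable, the `ipcs -m` listing can be parsed and filtered for the large segments owned by `oracle`. This is a minimal sketch, assuming the column layout shown above (key, shmid, owner, perms, bytes, nattch, optional status). The helper name `oracle_segments` and the 1 MiB threshold are illustrative choices, not anything Oracle prescribes.

```python
def oracle_segments(ipcs_text, owner="oracle", min_bytes=1 << 20):
    """Return sizes (in bytes) of live shared memory segments owned by `owner`."""
    sizes = []
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows start with a hex key and have at least six columns.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        if fields[2] != owner:
            continue
        # Skip segments already marked for destruction.
        if len(fields) > 6 and fields[6] == "dest":
            continue
        size = int(fields[4])
        if size >= min_bytes:
            sizes.append(size)
    return sizes


# Listing copied from the session above (NUMA support reverted to FALSE).
IPCS_OUTPUT = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root      644        72         2
0x00000000 32769      root      644        16384      2
0x00000000 65538      root      644        280        2
0xed304ac0 229380     oracle    660        4096       0
0x7393f7f4 1179653    oracle    660        5773459456 35
0x00000000 393223     oracle    644        790528     5          dest
0x00000000 425992     oracle    644        790528     5          dest
0x00000000 458761     oracle    644        790528     5          dest
"""

segments = oracle_segments(IPCS_OUTPUT)
print(len(segments), segments)  # prints: 1 [5773459456]
```

A single large segment is what the listing shows when no per-node placement took place.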

What Went Wrong
Oracle could not find the libnuma.so it wanted to load with dlopen():

$ grep libnuma /tmp/strace.out | grep ENOENT | head
14626 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)
14627 open("/usr/lib64/libnuma.so", O_RDONLY) = -1 ENOENT (No such file or directory)
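
The same lookup can be tried outside the database. The following Python sketch uses `ctypes.CDLL`, which goes through dlopen() on Linux, to test whether a given library path loads. `can_dlopen` is a hypothetical helper name, and the path is the one from the strace output above. Treat it as an approximation only, since the flags Oracle passes to dlopen() are not shown here.

```python
import ctypes


def can_dlopen(path):
    """Return True if the shared library at `path` can be loaded, else False."""
    try:
        ctypes.CDLL(path)
    except OSError:
        # Raised when the file is missing or is not a loadable shared object.
        return False
    return True


if __name__ == "__main__":
    # Path taken from the strace output; the result depends on the host.
    path = "/usr/lib64/libnuma.so"
    status = "loadable" if can_dlopen(path) else "NOT loadable"
    print(f"{path}: {status}")
```

On a host missing the unversioned libnuma.so name, this should report the library as not loadable even when libnuma.so.1 is installed.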

So I created the necessary symbolic link, restarted the instance and inspected the shared memory segments. Now I have a ~1 GB segment for the variable SGA components, and my buffer pool has been split into two segments of roughly 2.3 GB each.

# ls -l /usr/*64*/*numa*
lrwxrwxrwx 1 root root    23 Mar 17 09:25 /usr/lib64/libnuma.so -> /usr/lib64/libnuma.so.1
-rwxr-xr-x 1 root root 21752 Jul  7  2009 /usr/lib64/libnuma.so.1

SQL> show parameter db_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_cache_size                        big integer 4G
SQL> show parameter NUMA_support

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_enable_NUMA_support                 boolean     TRUE
SQL> !ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root      644        72         2
0x00000000 32769      root      644        16384      2
0x00000000 65538      root      644        280        2
0xed304ac0 229380     oracle    660        4096       0
0x00000000 2719749    oracle    660        1006632960 35
0x00000000 2752518    oracle    660        2483027968 35
0x00000000 393223     oracle    644        790528     6          dest
0x00000000 425992     oracle    644        790528     6          dest
0x00000000 458761     oracle    644        790528     6          dest
0x00000000 2785290    oracle    660        2281701376 35
0x7393f7f4 2818059    oracle    660        2097152    35
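
A quick sanity check on the two `ipcs -m` listings: the four new oracle-owned segments add up to exactly the size of the single segment from the first startup, so the same amount of shared memory has been carved into pieces rather than enlarged. The sketch below hard-codes the byte counts from the listings above; the variable names are illustrative.

```python
# Single oracle-owned segment from the first startup (NUMA support reverted).
single_segment = 5773459456

# Oracle-owned segments from the second startup (NUMA support TRUE),
# excluding the 4096-byte segment and those marked "dest".
numa_segments = [1006632960, 2483027968, 2281701376, 2097152]

total = sum(numa_segments)
print(total == single_segment)  # prints True

# Show each segment in MiB for easier reading.
for size in numa_segments:
    print(f"{size:>12} bytes = {size / 2**20:8.1f} MiB")
```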

So there I have an SGA successfully created with _enable_NUMA_support set to TRUE. But, what strings appear in the alert log? Well, I’ll blog that soon because it leads me to other content.

12 Responses to “You Buy a NUMA System, Oracle Says Disable NUMA! What Gives? Part I.”


  1. 1 Christo Kutrovsky March 19, 2010 at 5:08 pm

    Kevin,

    Could you talk a little bit on the types of optimizations this brings to the table?

    That we could better understand in what workloads such optimizations will actually provide benefit.

    • 2 kevinclosson March 19, 2010 at 9:30 pm

      Probably not, since nowadays it seems some technology circles lean more towards FaithBasedTechnology(tm) and it’s all “just supposed to work.” Education about how to maximize your investment might be seen as “casting vendor X product in a bad light.” I’ll have to think more like a snake than a fox in order to slither my way through these sorts of twisty passages.

  2. 3 Paul Janda April 27, 2010 at 9:07 pm

    Nice Technocracy. I’ve got some more for you, because I have a NUMA question about Sun hardware, old and new.
    First, regarding Part III: Our (sigh, yes, again, “our old”) old Sun v1280 servers are NUMA. I agree with you: Sequent was ahead of its time and NUMA is extremely common and good.
    We are replacing our v1280’s with M4000’s.
    ——– You wrote:
    ipcs -m

    Right, so I have no NUMA placement of the buffer pool. On Linux, Oracle must create multiple segments and allocate them on specific NUMA nodes (memory hierarchies). It was a little simpler for the first NUMA-aware port of Oracle (Sequent) since the APIs allowed for the creation of a single shared memory segment with regions of the segment placed onto different memories. Ho Hum.
    ——— Now, my question:
    The v1280’s/Solaris 9 split the buffer cache into two, one for each quad, and then the 9.2 (10g?) instance would use only one of the two pieces. Eww. The solution: _enable_numa_optimization=false.
    In 11.1.0.7, there is a note and a one-off:
    Bug:8199533: NEED NEW PARAMETER TO DISABLE NUMA SUPPORT AND RELATED LOG INFO.
    Note:456232.1 Expected cache memory not being used on NUMA platform
    Do I install this patch on the M4000? I ask you because you understand this stuff and because of my prejudices/experience that Oracle Support people won’t understand the question and most Sun people have a habit of using “shared pool” and “SGA” synonymously.
    -paul

  3. 4 Paul Janda April 28, 2010 at 12:13 pm

    sorry for that last comment.
    I wrote:
    >> Do I install this patch on the M4000?
    If I were installing 11.2, I’d be getting that patch automatically.
    I’ve done a lot more perusing of M.O.S. and I’ve also seen stuff like “problems with tar/dd” and “ora-600 when do online board replacements”.
    I guess, the real questions have already been addressed here:
    How good/bad is Sun M4000 at NUMA?
    and
    How do I actually measure the benefit of turning NUMA support on?
    ——
    Also, to continue my shameless naïveté, what do the Oracle developers, and the newer VPs, say about making NUMA more known (and used) to the mainstream Oracle DBA?

  4. 5 glennfawcett April 28, 2010 at 9:31 pm

    The M4000 is UMA. So, you don’t need any NUMA optimizations. You can check the number of latency groups “lgroups” by doing the following:

    root@ebiz1> kstat -m lgrp
    module: lgrp                            instance: 1
    name:   lgrp1                           class:    misc
            alloc fail                      813
            cpus                            16
            crtime                          137.5915144
            default policy                  0
            load average                    4116
            lwp migrations                  0
            next-seg policy                 0
            next-touch policy               435088862
            pages avail                     4078256
            pages failed to mark            0
            pages failed to migrate from    0
            pages failed to migrate to      0
            pages free                      1501302
            pages installed                 4194304
            pages marked for migration      0
            pages migrated from             0
            pages migrated to               0
            random policy                   7738755
            round robin policy              0
            snaptime                        1643411.8335022
            span process policy             0
            span psrset policy              0

  5. 6 mbobak July 9, 2010 at 2:25 pm

    Hi Kevin,

    I found Part II earlier and just found this. So, if I’m running 11gR2, NUMA is off by default. So, should I disable it in the O/S? I’m running a pair of X5570 CPUs in an HP DL-360 (actually a 4 node RAC of 360s), and NUMA is enabled at the O/S level, as can be observed w/ ‘numactl --hardware’.

    So, I guess my question is, is there any issue w/ having NUMA enabled in the O/S, but disabled by Oracle? Is there anything to be gained by either disabling NUMA in the O/S or enabling it in Oracle?

    Thanks!

    -Mark

    • 7 kevinclosson July 9, 2010 at 5:37 pm

      Hi Mark,

      With a 2s Nehalem EP server you will be extremely hard pressed to find benefit from software NUMA awareness. Full stop. It is not sufficiently lumpy. I recommend disabling software NUMA in the OS (à la grub), just as I did here in development for Exadata Database Machine Version 2 (and that recommendation stuck for many, many reasons).

      Now, 4s or 8s servers are a totally different story, so don’t have this particular recommendation burned in as a generic NUMA mentality. It only applies to the puny atypical case of 2s Nehalem EP and most likely Westmere EP (Xeon 5600).

      • 8 mbobak July 9, 2010 at 11:54 pm

        Thanks Kevin.

        So, I guess my last question is this: NUMA is disabled in Oracle, in 11.2. It’s enabled at the O/S. Is there sufficient cause to *not* run this way? Should I bother with turning it off in the O/S? I ask cause it will be a hassle to do so, schedule downtime, etc. So, does it really matter at all if I leave it enabled in the O/S if Oracle is not taking advantage of it?

        Seems like there is some effect to having it enabled in the O/S, as I’m using HugePages, and by looking at /sys/devices/system/node/*/meminfo I can see that (roughly) half of my HugePages allocation is managed by each of two nodes (0 and 1).

        Thanks again,

        -Mark

        • 9 kevinclosson July 10, 2010 at 6:20 am

          I recommend you set numa=off in grub for the same reasons I insisted on it in the database hosts of the Exadata Database Machine. If it doesn’t help, the only thing it can do is hurt.

          Everyone in the world essentially ignored NUMA after Sequent Computer Systems died. Somebody, someday will have to pay for that dereliction. However, with 2s Xeon 5500 there is no need to bear that cross as of yet.

  6. 10 mbobak July 10, 2010 at 8:11 am

    Ok, I’ll broach the subject w/ my sysadmins on Monday.

    Thanks!

    -Mark


  1. 1 Blogroll Report 12/03/2010 – 19/03/2010 « Coskan’s Approach to Oracle Trackback on April 29, 2010 at 6:15 pm
  2. 2 Non-Uniform Memory Access (NUMA) architecture with Oracle database by examples | IT World Trackback on October 9, 2012 at 7:47 am




Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2013. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
