Oracle11g Automatic Memory Management – Part I. Linux Hugepages Support. | Kevin Closson's Blog: Platforms, Databases and Storage

Oracle11g Automatic Memory Management – Part I. Linux Hugepages Support.

Published August 23, 2007. Tags: oracle, Oracle 11g, oracle hugepages, Oracle performance.

I spent the majority of my time in the Oracle Database 11g Beta program testing storage-related aspects of the new release. To be honest, I didn’t even take a short peek at the new Automatic Memory Management feature. As I pointed out the other day, Tanel Poder has started blogging about the feature.

If you read Tanel’s post you’ll see that he points out AMM-style shared memory does not use hugepages. This is because AMM memory segments are memory-mapped files in /dev/shm. At this time, the major Linux distributions do not back memory-mapped files with hugepages as they do System V-style IPC shared memory. The latter supports the SHM_HUGETLB flag passed to the shmget(2) call. It appears there was an effort to get hugepages support for memory-mapped pages by adding MAP_HUGETLB flag support to the mmap(2) call, as suggested in this kernel developer email thread from 2004. I haven’t been able to find out how far that proposed patch went, however. Nonetheless, I’m sure Wim’s group is more than aware of that proposed mmap(2) support, and if it is really important for Oracle Database 11g Automatic Memory Management, it seems likely there would be a 2.6 kernel patch for it someday. But that raises the question: just how important are hugepages? Is it blasphemy to even ask?
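
To make the mechanism concrete, here is a minimal Python sketch of a file-backed shared mapping of the kind described above. It is only an illustration, not Oracle's code: the file prefix `amm_sketch_` is hypothetical, and the fallback to the system temp directory is an assumption added so the sketch also runs where /dev/shm is missing or not writable. Two MAP_SHARED mappings of one file see each other's stores, which is what lets separate processes share a segment this way.

```python
import mmap
import os
import tempfile

# Prefer /dev/shm (tmpfs on most Linux systems); fall back to the temp dir.
shm_dir = "/dev/shm"
if not (os.path.isdir(shm_dir) and os.access(shm_dir, os.W_OK)):
    shm_dir = tempfile.gettempdir()

# "amm_sketch_" is a made-up prefix for this illustration only.
fd, path = tempfile.mkstemp(prefix="amm_sketch_", dir=shm_dir)
try:
    os.ftruncate(fd, 4 * mmap.PAGESIZE)               # size the backing file
    writer = mmap.mmap(fd, 0, flags=mmap.MAP_SHARED)  # first mapping
    reader = mmap.mmap(fd, 0, flags=mmap.MAP_SHARED)  # second mapping, same file
    writer[:5] = b"hello"                             # store through one mapping
    seen = bytes(reader[:5])                          # read it through the other
    writer.close()
    reader.close()
finally:
    os.close(fd)
    os.unlink(path)

print(seen)
```

The mapping here uses ordinary base pages; nothing in this call asks for hugepages, which is the limitation the paragraph above describes.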

Memory Mapped Files and Oracle Ports
The concept of large page tables is a bit of a porting nightmare. It will be interesting to see how the other ports deal with OS-level support for the dynamic nature of Automatic Memory Management. Will the other ports also use memory-mapped files instead of IPC Shared Memory? If so, they too will have spotty large page table support for memory-mapped files. For instance, Solaris 9 supported large page tables for mmap(2) pages, but only if it was an anonymous mmap (e.g., a map without a file) or a map of /dev/zero, neither of which would work for AMM. I understand that Solaris 10 supports large page tables for mmap(2) regions that are MAP_SHARED mmap(2)s of files, which is most likely how AMM will look on Solaris, but I’m only guessing. Other OSes, like Tru64 (and I’m quite sure most others), don’t support large page tables for mmap(2)ed files. This will be interesting to watch.

Performance, Large Page Table, Etc
I remember back in the mid-90s when Sequent implemented shared large page tables for IPC Shared Memory on our Unix variant, DYNIX/ptx. It was a very significant performance enhancement. For instance, without them, 1024 shadow processes attached to a 1GB SGA required 1GB of physical memory for the page tables alone! That was significant on systems that had very small L2 caches and only supported 4GB of physical memory. Fast forward to today: I know people with Oracle 10g workloads that absolutely seize up their Linux (2.6 kernel) systems unless they use hugepages. Now I should point out that these sites I know of have a significant mix of structured and unstructured data. That is, they call out to LOBs in the filesystem (give me SecureFiles please). So the pathology they generally suffered without hugepages was memory thrashing between Oracle and the OS page cache (filesystem buffer cache). The salve for those wounds was hugepages, since that essentially carves out and locks down the memory at boot time. Hugepages memory can never be nibbled up for page cache. To that end, benefiting from hugepages in this way is actually a by-product. The true point behind hugepages is not the fact that the memory is reserved at boot time, but the fact that CPUs don’t have to thrash to maintain the virtual-to-physical translations (TLB). In general, hugepages are a lot more polite on processor caches and they reduce RAM overhead for page tables. Compared to the mid-1990s, however, RAM is about the least of our worries these days. Manageability is the most important, and AMM aims to help on that front.
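
The arithmetic behind that 1GB figure can be sketched as follows. The 4KB base page, the 4-byte page table entry and the 4MB large page are assumptions typical of 32-bit x86 hardware of that era, not numbers stated for DYNIX/ptx, and the function name is hypothetical. Only leaf entries are counted.

```python
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3

def page_table_bytes(sga_bytes, page_bytes, pte_bytes, processes):
    """Leaf page-table memory needed when each process maps the SGA privately."""
    entries_per_process = sga_bytes // page_bytes
    return entries_per_process * pte_bytes * processes

# Assumed: 4KB pages, 4-byte entries, 1024 processes each with its own tables.
private_small = page_table_bytes(1 * GB, 4 * KB, 4, 1024)

# Same SGA if the page tables are shared (counted once instead of 1024 times).
shared_small = page_table_bytes(1 * GB, 4 * KB, 4, 1)

# Same SGA mapped with an assumed 4MB large page, tables still private.
private_large = page_table_bytes(1 * GB, 4 * MB, 4, 1024)

print(private_small // MB, "MB")  # 1024 MB
print(shared_small // MB, "MB")   # 1 MB
print(private_large // MB, "MB")  # 1 MB
```

Under these assumptions, either sharing the tables or using larger pages cuts the overhead by roughly three orders of magnitude, which is why the gain scales with the number of attached processes.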

Confusion
Of all things Oracle and Linux, I think one of the topics that gets mangled the most is hugepages. The terms and knobs to twist run the gamut. There’s hugepages, hugetlb, hugetlbfs, hugetlbpool and so on. Then there are the differences from one Linux distribution and Linux kernel to the other. For instance, you can’t use hugepages on SuSE unless you turn off vm.disable_cap_mlock (need a few double negatives?). Then there is the question of boot-time versus /proc or sysctl(8) to reserve the pages. Finally, there is the fact that if you don’t have enough hugepages when you boot Oracle, Oracle will not complain; you just don’t get hugepages. I think Metalink 361323.1 does a decent job explaining hugepages with old and recent Linux in mind, but I never see it explained as succinctly as follows:

Use OEL 4 or RHEL 4 with Oracle Database 10g or 11g
Set oracle hard memlock N in /etc/security/limits.conf where N is a value large enough to cover your SGA needs
Set vm.nr_hugepages in /etc/sysctl.conf to a value large enough to cover your SGA.
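
As a rough sizing aid for the two settings above, the sketch below derives both values from an SGA size. The 1GB SGA and the 2048 kB hugepage size are example assumptions, and the function name is hypothetical. The memlock value in limits.conf is expressed in KB. In practice some headroom is advisable, since the SGA may be allocated in more than one segment and each is rounded up to whole pages.

```python
def hugepage_settings(sga_bytes, hugepage_kb=2048):
    """Return (nr_hugepages, memlock_kb) just large enough to cover the SGA."""
    hugepage_bytes = hugepage_kb * 1024
    pages = -(-sga_bytes // hugepage_bytes)   # integer ceiling division
    memlock_kb = pages * hugepage_kb          # limits.conf memlock is in KB
    return pages, memlock_kb

# Example assumption: a 1GB SGA on a system with 2048 kB hugepages.
pages, memlock_kb = hugepage_settings(1024 ** 3)

print(f"vm.nr_hugepages = {pages}")          # line for /etc/sysctl.conf
print(f"oracle hard memlock {memlock_kb}")   # line for /etc/security/limits.conf
```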

Further Confusion
Audited TPC results don’t help. For instance, on page 125 of this Full disclosure report from a recent Oracle10g TPC-C, there are listings of sysctl.conf and lilo showing the setting of the hugetlbpool parameter. That would be just fine if this was a RHEL3 benchmark since vm.hugetlbpool doesn’t exist in RHEL4.

Performance
I admit I haven’t done a great deal of testing with AMM, but generally a quick I/O-intensive OLTP test on a system with 4 processor cores utilized at 100% speaks volumes to me. So I did just such a test.

Using an order-entry workload accessing the schema detailed in this Oracle Whitepaper about Direct NFS, I tested two configurations:

Automatic Memory Management (AMM). Just like it says, I configured the simplest set of initialization parameters I could:

UNDO_TABLESPACE=rb1
UNDO_MANAGEMENT = AUTO
compatible = 10.1.0.0
control_files                  = ( /u01/app/oracle/product/11/db_1/rw/DATA/cntlbench_1 )
db_block_size                   = 4096
MEMORY_TARGET=1500M
db_files                        = 100
db_writer_processes = 1
db_name                         = bench
processes                       = 200
sessions                        = 400
cursor_space_for_time           = TRUE  # pin the sql in cache
filesystemio_options=setall

Manual Memory Management (MMM). I did my best to tailor the important SGA regions to match what AMM produced. In my mind, for an OLTP workload the most important SGA regions are the block buffers and the shared pool.

UNDO_TABLESPACE=rb1
UNDO_MANAGEMENT = AUTO
compatible = 10.1.0.0
control_files                  = ( /u01/app/oracle/product/11/db_1/rw/DATA/cntlbench_1 )
db_block_size                   = 4096
#MEMORY_TARGET=1500M
db_cache_size = 624M
shared_pool_size=224M
db_files                        = 100
db_writer_processes = 1
db_name                         = bench
processes                       = 200
sessions                        = 400
cursor_space_for_time           = TRUE  # pin the sql in cache
filesystemio_options=setall

The following v$sgainfo output shows just how closely configured the AMM and MMM cases were.

AMM:

SQL> select * from v$sgainfo ;

NAME                                  BYTES RES
-------------------------------- ---------- ---
Fixed SGA Size                      1298916 No
Redo Buffers                       11943936 No
Buffer Cache Size                 654311424 Yes
Shared Pool Size                  234881024 Yes
Large Pool Size                    16777216 Yes
Java Pool Size                     16777216 Yes
Streams Pool Size                         0 Yes
Shared IO Pool Size                33554432 Yes
Granule Size                       16777216 No
Maximum SGA Size                 1573527552 No
Startup overhead in Shared Pool    83886080 No

NAME                                  BYTES RES
-------------------------------- ---------- ---
Free SGA Memory Available                 0

MMM:

SQL> select * from v$sgainfo ;

NAME                                  BYTES RES
-------------------------------- ---------- ---
Fixed SGA Size                      1302592 No
Redo Buffers                        4964352 No
Buffer Cache Size                 654311424 Yes
Shared Pool Size                  234881024 Yes
Large Pool Size                           0 Yes
Java Pool Size                     25165824 Yes
Streams Pool Size                         0 Yes
Shared IO Pool Size                29360128 Yes
Granule Size                        4194304 No
Maximum SGA Size                  949989376 No
Startup overhead in Shared Pool    75497472 No

NAME                                  BYTES RES
-------------------------------- ---------- ---
Free SGA Memory Available                 0
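
The two listings can also be compared mechanically. The sketch below copies six of the component sizes from the output above (the dictionary names are hypothetical) and reports which ones are byte-for-byte identical, along with their sizes in MB.

```python
MB = 1024 * 1024

# Byte counts copied from the two v$sgainfo listings above.
amm = {
    "Buffer Cache Size": 654311424,
    "Shared Pool Size": 234881024,
    "Large Pool Size": 16777216,
    "Java Pool Size": 16777216,
    "Shared IO Pool Size": 33554432,
    "Redo Buffers": 11943936,
}
mmm = {
    "Buffer Cache Size": 654311424,
    "Shared Pool Size": 234881024,
    "Large Pool Size": 0,
    "Java Pool Size": 25165824,
    "Shared IO Pool Size": 29360128,
    "Redo Buffers": 4964352,
}

# Components whose sizes match exactly in both configurations.
identical = [name for name in amm if amm[name] == mmm[name]]

for name in amm:
    print(f"{name:20s} AMM {amm[name] / MB:7.1f} MB   MMM {mmm[name] / MB:7.1f} MB")
print("identical:", identical)
```

The buffer cache and shared pool come out at 624 MB and 224 MB in both cases, matching the db_cache_size and shared_pool_size settings in the MMM parameter file.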

The server was an HP DL380 with 4 processor cores and the storage was an HP EFS Clustered Gateway NAS. Before each test I did the following:

Restore Database
Reboot Server
Mount NFS filesystems
Boot Oracle

Before the MMM case I set vm.nr_hugepages=600 and after the database was booted, hugepages utilization looked like this:

$ grep Huge /proc/meminfo
HugePages_Total:   600
HugePages_Free:    145
Hugepagesize:     2048 kB
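
For reference, the amount of hugepages memory actually in use can be derived from that output. This is a small sketch with a hypothetical helper name; it parses the three lines shown above rather than reading /proc/meminfo directly.

```python
def hugepage_usage(meminfo_text):
    """Return (pages_in_use, megabytes_in_use) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.startswith("Huge"):
            fields[name] = int(rest.split()[0])   # drop the trailing "kB" unit
    used = fields["HugePages_Total"] - fields["HugePages_Free"]
    return used, used * fields["Hugepagesize"] // 1024

# The output shown above, pasted in as sample input.
sample = """HugePages_Total:   600
HugePages_Free:    145
Hugepagesize:     2048 kB"""

used_pages, used_mb = hugepage_usage(sample)
print(used_pages, "pages in use,", used_mb, "MB")  # 455 pages in use, 910 MB
```

That 910 MB is in line with the Maximum SGA Size of 949989376 bytes (about 906 MB) reported for the MMM case, allowing for rounding up to whole 2 MB pages.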

So, given all these conditions, I believe I am making an apples-to-apples comparison of AMM to MMM where AMM does not get hugepages support but MMM does. I think this is a pretty stressful workload since I am maxing out the processors and performing a significant amount of I/O, given the size of the server.

Test Results
OK, so this is a very contained case and Oracle Database 11g is still only available on x86 Linux. I hope I can have the time to do a similar test with more substantial gear. For the time being, what I know is that losing hugepages support for the sake of gaining AMM should not make you lose sleep. The results measured in throughput (transactions per second) and server statistics are in:

Configuration	OLTP Transactions/sec	Logical IO/sec	Block Changes/sec	Physical Read/sec	Physical Write/sec
AMM	905	36,742	10,195	4,287	2,817
MMM	872	36,411	10,101	4,864	2,928

Looks like 4% in favor of AMM to me, and that is likely attributable to the 13% more physical I/O per transaction the MMM case had to perform. That part of the results has me baffled for the moment since they both have the same buffering, as the v$sgainfo output above shows. Well, yes, there is a significant difference in the amount of Large Pool in the MMM case, but this workload really shouldn’t have any demand on Large Pool. I’m going to investigate that further. Perhaps an interesting test would be to reduce the amount of buffering the AMM case gets to force more physical I/O. That could bring it more in line. We’ll see.
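
The two percentages can be reproduced from the results table. The sketch below assumes that "physical I/O per transaction" means physical reads plus physical writes divided by transactions per second; that reading is an assumption, not something the table states, and the names are hypothetical.

```python
# Figures copied from the results table above.
results = {
    "AMM": {"tps": 905, "phys_reads": 4287, "phys_writes": 2817},
    "MMM": {"tps": 872, "phys_reads": 4864, "phys_writes": 2928},
}

def io_per_txn(row):
    """Physical reads plus writes per transaction (assumed definition)."""
    return (row["phys_reads"] + row["phys_writes"]) / row["tps"]

# Throughput advantage of AMM over MMM.
tps_gain = results["AMM"]["tps"] / results["MMM"]["tps"] - 1

# Extra physical I/O per transaction in the MMM case.
extra_io = io_per_txn(results["MMM"]) / io_per_txn(results["AMM"]) - 1

print(f"AMM throughput advantage: {tps_gain:.1%}")   # 3.8%
print(f"MMM extra physical I/O:   {extra_io:.1%}")   # 13.8%
```

Under that assumption the numbers come out at about 3.8% and 13.8%, consistent with the rounded figures quoted above.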

Summary
I’m not saying hugepages is no help across the board. What I am saying is that I would weigh heavily the benefits AMM offers because losing hugepages might not make any difference for you at all. If it is, in fact, a huge problem across the board then it looks like there has been work done in this area for the 2.6 kernel and it seems reasonable that such a feature (hugepages support for mmap(2)) could be implemented. We’ll see.

19 Responses to “Oracle11g Automatic Memory Management – Part I. Linux Hugepages Support.”


1 Christian Antognini August 23, 2007 at 10:30 pm

Hi Kevin

Out of curiosity… Why did you set compatible to 10.1.0.0?

Take care,
Chris

2 kevinclosson August 23, 2007 at 10:48 pm

Christian,

Good question…too broad a stroke with the cut and paste I guess…would you wager a guess that it would change OLTP performance?

3 Howard Rogers August 24, 2007 at 12:36 am

Right. Now fire off a workload which doesn’t know what a bind variable is but which needs to do a lot of large repeated queries differing only by the value of one or two variables.

In 10g, at any rate, AMM will start shovelling memory by the bucketload to the Shared Pool to deal with a near 100% miss ratio on the library cache, even though manually you know giving more memory to the buffer cache is the ‘right’ thing to do because the queries all hit one large table which could be cached quite effectively…

Yup, I know I should re-write the SQL so it uses bind variables. Still, given that I can’t, AMM is seriously bad news.

None of which refutes (or is intended to refute) one jot of what you wrote. Just that, AMM is seriously iffy unless you are right under the hump of the normal bell curve. Get even slightly weird and AMM is almost certainly the last thing you want working “for” you.

4 kevinclosson August 24, 2007 at 12:46 am

Howard,

Good feedback. I intend to torture it a bit more. I’m looking for good things beyond what 10g managed to offer… fingers crossed as they say.

5 jason arneil August 24, 2007 at 7:54 am

Interesting as always, but do you have any feeling for how those results would vary with increasing SGA size? 624M seems a bit tiddly; if you are throwing 16 or 32 GB at your SGA, things may swing back in favour of using hugepages, no?

jason.

6 Tim Hall August 24, 2007 at 8:24 am

Howard:

You could still use AMM if you use “CURSOR_SHARING=[similar|force]” to get round your bind variable problem. I know there are issues associated with CURSOR_SHARING, but I’ve been forced to use it for a couple of 3rd party apps and it’s worked a treat.

Cheers

Tim…

7 Christian Antognini August 24, 2007 at 10:31 am

Kevin

> would you wager a guess that it would change OLTP performance?

Honestly I don’t know. From one side I never really played with it. From the other side Oracle provides almost no information about the effects of COMPATIBLE. Some features are activated, some others not. For example you used MEMORY_TARGET… But, what about other performance features (like mutexes) introduced in later versions?

Therefore to do a performance comparison I would not specify it because the test may be biased. And since few people run databases with COMPATIBLE set… One may argue that is due to that.

Best,
Chris

8 kevinclosson August 24, 2007 at 4:22 pm

Jason: This is 32bit Oracle. I can go larger but constrained by address space.

Tim: Thanks for the info

Chris: I’ll give it a purified whirl. No problem.

9 Herbert May 30, 2008 at 5:43 am

As you mentioned when you talked about the Sequent system with 1024 shadow processes connecting to a 1GB SGA using another 1GB of pagetable memory, hugepages have huge advantages with huge numbers of users. Think huge! I suspect that you didn’t max out the processes=200 parameter during your test, so you may not have noticed the advantage. Despite wasting 300MB of memory on unused hugepages.

10 Gabe December 2, 2010 at 8:05 pm

I know this is an old post but …

“For instance, you can’t use hugepages on SuSE unless you turn off vm.disable_cap_mlock”

Do you mean vm.disable_cap_mlock=0? Does it depend on oracle version? SuSE version?

With SuSE 9.3 and 9.2.0.8 and 10.2.0.4 we are running vm.disable_cap_mlock=1 and huge pages. I am wondering if your comment applies to a specific version or am I missing something else? Thanks.

- 11 kevinclosson December 2, 2010 at 8:12 pm
  
  Hi Gabe,
  
  Are you sure you’re getting hugepages in use? If so, then the information I blogged would be relevant to versions older than SuSE 9. I haven’t touched a SuSE box in nearly 4 years. Oracle MOS notes are pretty clear on disable_cap_mlock so I’d recommend trolling through MOS on the matter for up-to-date stuff.
  
  - 12 Gabe December 2, 2010 at 10:24 pm
    
    cat /etc/SuSE-release
    SUSE LINUX Enterprise Server 9 (x86_64)
    VERSION = 9
    PATCHLEVEL = 3
    
    >cat /proc/sys/vm/nr_hugepages
    2700
    
    ~>cat /proc/sys/vm/disable_cap_mlock
    1
    
    >grep Huge /proc/meminfo
    HugePages_Total: 2700
    HugePages_Free: 829
    Hugepagesize: 2048 kB
    
    The used huge pages match SGA. This is from a server with one DB. (I know it was really an overkill for this DB to setup huge pages.)
    
    365607.1 lists disable_cap_mlock=1 as a requirement for 10g & SLES9 (not in relation to hugepages, just a general requirement). And there is a similar note for 9i and SLES9.
    
    So, I am guessing you referred to an older version of SuSE (?)
    
    - 13 kevinclosson December 3, 2010 at 12:57 am
      
      Gabe,
      
      To be honest, I cannot remember. That was quite some time ago and relevant at the time. Sorry if I’ve caused confusion.
      
14 shovon June 1, 2012 at 3:28 am

it’s so helpful..In my case i have 64 GB ram. So i wanna go for MMM. So how should I configure my kernel parameters and everything?

