Expert Oracle Exadata (Apress) Translated Into Chinese

 

Kerry Osborne has a short post to point out that the last book I worked on has been translated into Chinese.  The photo of the book cover looks pretty cool…if only I could read anything except for our names and a few stray English words :-)

This book is a must-read for anyone who wants to cut through the hype and actually learn Exadata.

 

Fault Injection Testing. Spurious Space Depletion? Sure, Why Not?

When file systems run out of space, bad things happen. We like to investigate what those “bad things” are, but to do so we have to create artificially small installation directories and run CPU-intensive programs to deplete the remaining space. There is a better way on modern Linux systems.

If you should find yourself performing Linux platform fault-injection testing you might care to add spurious free-space failures. The fallocate() routine immediately allocates the specified amount of file system space to an open file. It might be interesting to inject random space depletion in such areas as Oracle Clusterware (Grid Infrastructure) installation directories or application logging directories. Could a node ejection occur if all file system space immediately disappeared? What would that look like on the survivors? What happens if large swaths of space disappear and reappear? Be creative with your destructive tendencies and find out!

 


#define _GNU_SOURCE             /* needed for fallocate() and loff_t with glibc */
#include <asm/unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>


int main(int argc, char *argv[])
{
    long int sz;
    char *fname;
    int ret, fd;

    if (argc != 3)
    {
        fprintf(stderr, "usage: %s file new-size-in-gigabytes\n", argv[0]);
        return(-1);
    }

    fname = argv[1];
    sz    = atol(argv[2]);      /* size in gigabytes */

    /* O_EXCL: refuse to reuse a file that already exists */
    if ((ret = (fd = open(fname, O_RDWR | O_CREAT | O_EXCL, 0666))) == -1) {
        perror("open");
        return(ret);
    }

    /* Allocate sz GB in a single call; on failure remove the empty file */
    if ((ret = fallocate(fd, 0, (loff_t)0, (loff_t)sz * 1024 * 1024 * 1024)) != 0) {
        perror("fallocate");
        unlink(fname);
    }

    close(fd);
    return ret;
}

 

 

 

#
# cc fast_alloc.c
#
# ./a.out
usage: ./a.out file new-size-in-gigabytes
#
# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/sdc 2.7T 1.6T 1.2T 57% /data1
#
# time ./a.out bigfile 512
real 0m1.875s
user 0m0.000s
sys 0m0.730s
# du -h bigfile
513G bigfile
# rm -f bigfile
#
# ./a.out bigfile 512
# ls -l bigfile
-rw-r--r-- 1 root root 549755813888 Jul 1 09:48 bigfile
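
The "disappear and reappear" idea mentioned earlier can be sketched with the same fallocate() call. This is only a minimal sketch under stated assumptions: the function names (deplete_space, release_space, flap_space), the hold and gap times, and the choice to free the space by unlinking the file are all hypothetical illustrations, not part of the program above.

```c
#define _GNU_SOURCE             /* fallocate() is a GNU extension in glibc */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Grab `bytes` of file system space under `path`. Returns an open fd, or -1. */
int deplete_space(const char *path, off_t bytes)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    if (fallocate(fd, 0, 0, bytes) != 0) {
        perror("fallocate");
        close(fd);
        unlink(path);
        return -1;
    }
    return fd;
}

/* Give the space back: blocks are freed once the name is gone and the fd is closed. */
int release_space(const char *path, int fd)
{
    int rc = unlink(path);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical injection loop: hold the space for a while, release it, repeat. */
int flap_space(const char *path, off_t bytes, int cycles,
               unsigned hold_secs, unsigned gap_secs)
{
    for (int i = 0; i < cycles; i++) {
        int fd = deplete_space(path, bytes);
        if (fd == -1)
            return -1;
        sleep(hold_secs);       /* victims see a (nearly) full file system */
        if (release_space(path, fd) != 0)
            return -1;
        sleep(gap_secs);        /* space is available again */
    }
    return 0;
}
```

A call such as flap_space("/data1/hog", (off_t)512 << 30, 10, 30, 30) would take and return 512 GB ten times; the path and numbers are placeholders to adapt to the file system under test.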

Putting SLOB (The Silly Little Oracle Benchmark) To Use For Knowledge Sake!

This is just a short blog entry to refer folks interested in SLOB to the following links:

About SLOB:  Introducing SLOB – The Silly Little Oracle Benchmark

Introducing A LinkedIn Group For SLOB Users

This is just a very short blog entry to inform folks that there is an open discussion group over at LinkedIn for SLOB topics of interest.

The group can be accessed through the following link:  SLOB LinkedIn Group.

Simple SLOB Init.ora Parameter File For Read IOPS Testing

This is just a quick blog entry to show the very simple init.ora parameter file I use for simple read IOPS testing with SLOB. On 2s16c32t E5-2600 servers attached to very fast storage this init.ora parameter file delivers on the order of 275,000 physical IOPS with 64 SLOB sessions.

I’ll post an init.ora that I use for the REDO model and DBWR testing as soon as possible.

Thanks to Yury for the recommended hidden init.ora parameters to boost the ratio of db file sequential reads.

Additional information can be found here: README file.

Here is the init.ora:

db_create_file_dest = '/mnt/dsk/slob'
control_files=('/mnt/dsk/slob/cntlSLOB.dbf')
db_name = SLOB
compatible = 11.2.0.2
UNDO_MANAGEMENT=AUTO
db_block_size = 8192
db_files = 20000
processes = 500
shared_pool_size = 5000M
db_cache_size=10M
filesystemio_options=setall
parallel_max_servers=0
_db_block_prefetch_limit=0
_db_block_prefetch_quota=0
_db_file_noncontig_mblock_read_count=0
log_buffer=134217728
cpu_count=1
pga_aggregate_target=8G

SLOB Is Not An Unrealistic Platform Performance Measurement Tool – Part II. If It’s Different It’s Not The Same.

In Part I of this series I discussed the concept of “Mindless I/O” and contrasted Orion I/O testing to SLOB.

Some time has passed since that post and a lot of folks have been studying their systems using SLOB. Others have been motivated to prove that the differences between SLOB and Orion are moot. I’m OK with that because I use both of them and a large array of other I/O tools (fio, bonnie, etc.).

Test Single Block Random Reads With SLOB And Call It A Day
There is something I’d like to quickly point out about SLOB.

SLOB has 4 models for testing. Since it is a real SGA it can be used for testing the platform’s ability to handle SGA-cached workloads. Think about it. If your platform can’t scale up cached reads you’ve got a problem. This is what a lot of us SLOB users refer to as Logical I/O testing where the metric is LIOPS (logical I/O per second). Another model is, of course, the simple random read model. There are also two models that stress genuine DBWR and LGWR functionality and those models cannot be mimicked accurately, or otherwise, with Orion. Orion is mindless I/O. That fact matters to me and perhaps will matter to some of you as well.

Click here for more on the different SLOB models.

Cool Idea: Let’s use Asynchronous I/O Libraries To Determine Platform Fitness For High-Rate Synchronous I/O
Another thing to be aware of with Orion (as I pointed out 6 years ago) is the fact that Orion actually uses the wrong library for performing random single block reads (db file sequential read).  Yes, it uses VOS routines (skgf*), but the wrong ones–at least for the random single block read tests. The following screen shot shows Orion attempting to measure platform suitability for Oracle single block random reads (the db file sequential read model). Oracle instances do not use io_submit()/io_getevents() for db file sequential read:

A real Oracle process (such as a SLOB session) uses synchronous I/O (specifically libC pread()) to service db file sequential read. Yes, I know this varies by file storage type, dNFS, Exadata and other variations but I’m on blisteringly fast storage accessed via direct I/O through an XFS file system–a perfectly supported storage provisioning approach. Nonetheless, in all cases Oracle instances perform single-block I/O when performing single-block blocking I/O–not some FrankenSynchronousIO(tm) such as a single-block request through the asynchronous API followed by a group-reaping of outstanding requests as Orion is doing in the above screenshot. The screen shot shows a single Linux process (Orion) reaping 128 previously-submitted asynchronous I/Os whilst aiming to simulate db file sequential read.
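
To make the synchronous pattern concrete, here is a minimal sketch of a process doing one blocking single-block pread() at a time against random block offsets. It is not Oracle or SLOB code. The function name sync_random_reads, its parameters, and the 4096-byte buffer alignment are assumptions for illustration; the use_direct flag adds O_DIRECT, which not every file system accepts.

```c
#define _GNU_SOURCE             /* O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Issue `nreads` blocking single-block reads at random block-aligned offsets
 * within the first `nblocks` blocks of `path`.
 * Returns the number of full blocks read, or -1 on setup failure.
 */
long sync_random_reads(const char *path, size_t block_size, long nblocks,
                       long nreads, int use_direct)
{
    if (nblocks <= 0 || block_size == 0)
        return -1;

    int fd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));
    if (fd == -1)
        return -1;

    void *buf = NULL;           /* aligned so the buffer also works with O_DIRECT */
    if (posix_memalign(&buf, 4096, block_size) != 0) {
        close(fd);
        return -1;
    }

    long good = 0;
    for (long i = 0; i < nreads; i++) {
        off_t off = (off_t)(random() % nblocks) * (off_t)block_size;
        /* The caller sleeps here until this one block arrives. */
        if (pread(fd, buf, block_size, off) == (ssize_t)block_size)
            good++;
    }

    free(buf);
    close(fd);
    return good;
}
```

Each pread() here is one request and one wait, issued from the process that needs the block. That differs from submitting a batch through io_submit() and reaping completions in groups with io_getevents().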

So, if you happen to be testing modern, non-mechanical storage at a rate of, say, 20,000 IOPS/core you are missing out on the joy of analyzing the associated cost of 20,000 context switches/sec. Most people think a system call is a context switch–it isn’t. A context switch is the stopping of a process, saving its state, executing the scheduler code and switching to the context of another process. When a scheduler gets to make 20,000 affinity (NUMA, L2 cache, etc) decisions per second you get the opportunity of learning whether you like the choices it is making. I prefer knowing to having blind faith in the OS. A platform is more than the sum of all its components. It needs to all come together.
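
The system call versus context switch distinction can be checked with getrusage(), which reports voluntary context switches in ru_nvcsw. The following is a rough sketch, assuming Linux; the helper names are made up for illustration. It compares a loop of system calls that never block with a loop of calls that do block.

```c
#define _GNU_SOURCE             /* syscall() */
#include <stddef.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Voluntary context switches charged to this process so far, or -1 on error. */
long voluntary_switches(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_nvcsw;
}

/* n system calls that enter the kernel but never block. */
long switches_during_syscalls(int n)
{
    long before = voluntary_switches();
    for (int i = 0; i < n; i++)
        syscall(SYS_getpid);
    return voluntary_switches() - before;
}

/* n system calls that block for 1 ms each, so the process gives up the CPU. */
long switches_during_sleeps(int n)
{
    struct timespec ts = { 0, 1000000 };
    long before = voluntary_switches();
    for (int i = 0; i < n; i++)
        nanosleep(&ts, NULL);
    return voluntary_switches() - before;
}
```

On a typical Linux system the first loop should add few or no voluntary switches however many calls it makes, while the second should add roughly one per call. A blocking pread() against real storage behaves like the second loop.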

Another problem with using libaio in place of synchronous pread() is the fact that I/Os submitted in the asynchronous path are flat-out handled differently–and different is not the same as same. This is the case on all operating systems I know of. When requests are submitted through the asynchronous interfaces there are opportunities for the OS to potentially optimize the strategy for servicing the I/O. After all, the I/Os have been submitted from a single process context whereas the same number of I/O requests funneling through the synchronous interface means associating the requests with a potentially vast number of discrete processes. It’s just different, not “slightly less-same”, but different. And different matters–at least to me.

It’s Just Different–And That’s All That Matters
There are some folks in the blogosphere putting in good SLOB testing. Some good folks at Pythian are doing the heavy lifting of proving to you that SLOB and Orion are on par or even, perhaps, Orion is somehow better for testing random reads. I should point out that random reads testing is just a minuscule portion of platform performance analysis. The fact that SLOB and Orion are different is all that matters to me. I need more tools than just a hammer. And when I need a hammer I don’t want a screwdriver that feels serendipitously hammer-like.

Once again I want to thank Yury for testing SLOB. Even if I don’t agree with his conclusions I very much appreciate his work with the kit. Healthy disagreements are healthy. We cultivate these threads and we all learn (I presume).

Different In So Many Ways
So if you are reading this, and are about to begin SLOB testing please be mindful of the following “feature overview” vis a vis SLOB:

  • SLOB is an Oracle instance. It doesn’t toil along trying to behave like an instance nor challenge your faith in accepting the differences as unimportant.
  • SLOB is not mindless I/O. You get the chance to learn more about your platform when the CPUs are actually doing something. I/O is no longer a problem with modern platforms. It’s all about CPU–always has been but obscured by artificial storage plumbing problems.
  • SLOB uses the right routines. If you want to study the fitness for a given platform to handle high-rate db file sequential read you might as well do db file sequential read–or at least use a tool that calls the proper system calls. Libaio and LibC share letters of the alphabet but they are really different–and different in this specific regard matters. Actually, different always matters to me.
  • SLOB will allow you to test / study DBWR and LGWR. If you can get Orion to aid you in the study of log file sync or DBWR flushing capacity please drop me a note because I missed the memo :-) .

About Part III
I discussed “Mindless I/O” in Part I. I needed to inject this installment to point out the little bits about Orion using libaio when you want to test synchronous reads. When I get to Part III I’ll get back to the Mindless I/O topic.

SLOB Is Not An Unrealistic Platform Performance Measurement Tool – Part I. Let’s See If That Matters…To Anyone.

I just checked to find out that there have been 3,000 downloads of SLOB – The Silly Little Oracle Benchmark. People seem to be putting it to good use. That’s good.

Before I get very far in this post I’d like to take us back in time–back before the smashing popularity of the Orion I/O testing tool.

When Orion first appeared on the scene there was a general reluctance to adopt it. I suspect some of the reluctance stemmed from the fact that folks had built up their reliance on other tools like bonnie, LMbench, vxbench and other such generic I/O generators. Back in the 2006 (or so) time frame I routinely pointed out that no tool other than Orion used the VOS layer Oracle I/O routines and libraries. It’s important to test as much of the real thing as possible.

Who wants to rely on an unrealistic platform performance measurement tool after all?

My “List”
Over time I built a list of reasons I could no longer accept Orion as sufficient for platform I/O testing. Please note, I just wrote “platform I/O testing” not “I/O subsystem testing.”  I think the rest of this post will make the distinction between these two quoted phrases quite clear. The following is a short version of the list:

  • Orion does not simulate Oracle processing in any way, shape or form. More on that as this blog series matures.
  • Orion is what I refer to as mindless I/O. More on that as this blog series matures.
  • Orion is useless in assessing a platform’s capability to handle modify-intensive DML (thus REDO processing, LGWR and DBWR, etc). More on that as this blog series matures.

My present-tense views on Orion sometimes surface on twitter where I am occasionally met with vigorous disagreement–most notably from my friend Alex Gorbachev. Alex is a friend, co-member of the Oaktable Network, CTO of Pythian (I love those Pythian folks), and someone who generally disagrees with most everything I say.

I respect Alex, because he has vast knowledge and valuable skills. His arguments make me think. That’s a good thing. I’m not sure, however, our respective spheres of expertise overlap.

So how do these disagreements regarding SLOB get started? Recently I tweeted:

The difference between SLOB and Orion is akin to Elliptical trainer versus skiing on the side of a mountain.

Alex replied with:

I could just as well argue that SLOB is useless because that’s not real workload anyway and you should test with your app

This quick exchange of ideas set into motion some Pythian testing by Yury. As it turns out I think the goal of that test was to prove parity between SLOB and Orion for random reads–and perhaps not much more. If only I had published “My List” above before then.

Yury’s tests were good, albeit exceedingly small in scope. His blog post suggests more testing on the way. That is good. If you read the comment thread on his blog entry you’ll see where I thank Yury for a good tweak to the SLOB kit that eliminates the db file parallel reads associated with the index range scans incurred by SLOB reader processes. Come to think of it though, Matt from Violin Memory pointed that one out to me some time back. Hmm, oh well, I digress. The modifications Yury detailed (init.ora parameters) will be included in the next drop of the SLOB kit. Again, thanks Yury for the testing and the init.ora parameter change recommendations!

Feel free to see Yury’s findings. They are simple: SLOB and Orion do the same thing. Really, SLOB and Orion do the same thing? Well, that may be the case so long as a) you compare SLOB to Orion only for simple random read testing and/or b) your testing is limited to a little, itsy-bitsy, teeny, tiny, teensy, minute, minuscule, meager, puny, Lilliputian-grade undersized I/O subsystem incapable of producing reasonable, modern-scale IOPS. Yury’s experiment topped out at roughly 4,500 random read IOPS. I’ll try to convince you that there is more to it than that (hint, modern servers are fit for IOPS in the 20,000/core range). But first, I have two quotable quotes to offer at this point:

When assessing the validity of an I/O testing tool, do so on a system that isn’t badly bottlenecked on storage.

If your application (e.g., Oracle Database)  is “mindless” use a “mindless” I/O generator–if not, don’t.

Mindless I/O
So what do I mean when I say “mindless I/O?” The answer to that is simple. If the code performs an I/O into a memory buffer, without any application concurrency overhead, and no process even so much as peeks at a single byte of that buffer populated through DMA from the I/O adapter device driver–it’s mindless. That is exactly how Orion does what it does. That’s what every other synthetic I/O generator I know of does as well.

So what does mindless I/O look like and why does it show up on my personal radar as a problem? Let’s take a look–but first let me just say one thing–I analyze I/O characteristics on extremely I/O capable platforms. Extremely capable.

The following screen shot shows a dd(1) command performing mindless I/O by copying an Oracle OMF datafile from an XFS file system to /dev/null using direct I/O. After that another dd(1) was used to show the difference between “mindless” and meaningful I/O. The second dd(1) was meaningful because after each 1 MB read the buffer is scanned looking for lower case ASCII chars to convert to their upper-case counterpart. That is, the second dd(1) did data processing–not just a mindless tickling of the I/O subsystem.

The mindless I/O was 2.5 GB/s but the meaningful case fell to about 1/6th that at 399 MB/s. See, CPU matters. It matters in I/O testing. CPU throttles I/O–unless you are interested in mindless I/O. What does this have to do with Orion and SLOB? A moment ago I mentioned that I test very formidable I/O subsystems commensurate with modern platforms–so hold on to your hat while I tie these trains of thought together.
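
The same contrast can be sketched in a few lines of C. This is a minimal illustration, not the dd(1) commands from the test. The function name scan_file is hypothetical, and it uses ordinary buffered read() rather than the direct I/O used above, so its absolute numbers would not be comparable. With touch_data set to zero the bytes are read and never looked at; with it set, every byte is scanned and lower-case ASCII is converted to upper case.

```c
#include <ctype.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)     /* 1 MB reads */

/*
 * Read `path` to the end in 1 MB chunks.
 * touch_data == 0: mindless, the buffer is never examined.
 * touch_data != 0: meaningful, every byte is scanned and lower-case ASCII
 *                  is converted to upper case in the buffer.
 * Returns bytes read (or -1 on error); *converted gets the bytes changed.
 */
long long scan_file(const char *path, int touch_data, long long *converted)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    unsigned char *buf = malloc(CHUNK);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    *converted = 0;
    while ((n = read(fd, buf, CHUNK)) > 0) {
        total += n;
        if (!touch_data)
            continue;           /* I/O only, no CPU spent on the data */
        for (ssize_t i = 0; i < n; i++) {
            if (islower(buf[i])) {
                buf[i] = (unsigned char)toupper(buf[i]);
                (*converted)++;
            }
        }
    }

    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```

Timing the two modes against the same large file gives a rough feel for how much of the throughput gap comes from CPU work on the data rather than from the storage.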

Building on my dd(1) example of mindless I/O, I’ll offer the following screen shot which shows Orion accessing the same OMF SLOB datafile (also via direct I/O validated with strace). Notice how I force all the threads of Orion (it’s threaded with libpthreads) to OS CPU 0 using numactl(8) on this 2s12c24t Xeon 5600 server?  What you are about to see is the single-core capacity of Orion to perform “mindless I/O”:
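
On the pinning step itself: numactl(8) does it from the command line, and a test program can get a similar effect with sched_setaffinity(). The sketch below assumes Linux with glibc, and the function name is hypothetical. It pins the calling thread to the lowest-numbered CPU it is allowed to use, which is CPU 0 unless a cpuset or container excludes it.

```c
#define _GNU_SOURCE             /* sched_setaffinity() and the CPU_* macros */
#include <sched.h>

/*
 * Pin the calling thread to the lowest-numbered CPU in its current
 * affinity mask. Threads and child processes created afterwards inherit
 * the mask. Returns the CPU number chosen, or -1 on failure.
 */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Calling this before starting any worker threads confines the whole test to one core, which is the condition the single-core measurement relies on.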

Unrealistic Platform Performance Measurement Tools
This is only Part I in this series.  I’ll be going through a lot of proof points to solidify backing for my Orion-related assertions in the list above, but please humor me for a moment. I’d like to know just how realistic are platform performance measurements from an I/O tool that demonstrates capacity for 144,339 physical 8K random IOPS while pinned to a single core of a Xeon 5600 processor?

We are interested in database platform IOPS capacity, right?

Through this blog series I aim to help you conclude that any tool demonstrating such an unrealistic platform performance measurement is, well, an unrealistic platform performance measurement tool.

Do you feel comfortable relying on an unrealistic platform performance measurement tool? Before I crafted SLOB I too accepted test results from unrealistic platform performance measurement tools but I learned that I needed to include the rest of the platform (e.g., CPU, bus, etc) when I’m studying platform performance so I left behind unrealistic platform performance measurement tools.

Until recently I didn’t spend any time discussing measurements taken from unrealistic platform performance measurement tools. However, since friends and others in social media are pitting unrealistic platform performance measurement tools against SLOB (not an unrealistic platform performance measurement tool) such comparisons are blog-worthy. Hence, I’ll trudge forward blogging about how unrealistic certain unrealistic platform performance measurement tools are. And, if you stay with me on the series, you might discover some things you don’t know because, perhaps, you’ve been relying on unrealistic platform performance measurement tools.

As this series evolves, I’ll be sharing several similar unrealistic platform performance measurement tool results as I go through the list above. That is, of course, what motivated me to leave behind unrealistic platform performance measurement tools.

Final Words For This Installment
In Yury’s post he quoted me as having said:

It’s VERY easy to get huge Orion nums

His assessment of that quote was, “kind of FALSE on this occasion.” Having now shown what I mean by “VERY easy” (e.g., even a single core can drive massive Orion IOPS) and “huge Orion” numbers (e.g., 144K IOPS), I wonder whether Yury will be convinced about my assertions regarding unrealistic platform performance measurement tools? If not yet, perhaps he, and other readers, will eventually. After all, this is only Part I. If not, Yury, I still want to say, “thanks for testing with SLOB and please keep the feedback coming.”

Alex and I may always disagree :-)

Oh, by the way folks, if all you have is Orion, use it. It is better than wild guesses–at least a little better.

Link to Part II of this series.


EMC Employee Disclaimer

The opinions and interests expressed on EMC employee blogs are the employees' own and do not necessarily represent EMC's positions, strategies or views. EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the EMC logo and content regarding EMC products and services, employee blogs are independent of EMC and EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.

This disclaimer was put into place on March 23, 2011.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2013. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
