SLOB Is Not An Unrealistic Platform Performance Measurement Tool – Part I. Let’s See If That Matters…To Anyone.

I just checked and found that there have been 3,000 downloads of SLOB – The Silly Little Benchmark. People seem to be putting it to good use. That’s good.

Before I get very far in this post I’d like to take us back in time–back before the smashing popularity of the Orion I/O testing tool.

When Orion first appeared on the scene there was a general reluctance to adopt it. I suspect some of the reluctance stemmed from the fact that folks had built up their reliance on other tools like bonnie, LMbench, vxbench and other such generic I/O generators. Back in the 2006 (or so) time frame I routinely pointed out that no tool other than Orion used the VOS layer Oracle I/O routines and libraries. It’s important to test as much of the real thing as possible.

Who wants to rely on an unrealistic platform performance measurement tool after all?

My “List”
Over time I built a list of reasons I could no longer accept Orion as sufficient for platform I/O testing. Please note, I just wrote “platform I/O testing” not “I/O subsystem testing.”  I think the rest of this post will make the distinction between these two quoted phrases quite clear. The following is a short version of the list:

  • Orion does not simulate Oracle processing in any way, shape or form. More on that as this blog series matures.
  • Orion is what I refer to as mindless I/O. More on that as this blog series matures.
  • Orion is useless in assessing a platform’s capability to handle modify-intensive DML (thus REDO processing, LGWR and DBWR, etc). More on that as this blog series matures.

My present-tense views on Orion sometimes surface on twitter where I am occasionally met with vigorous disagreement–most notably from my friend Alex Gorbachev. Alex is a friend, co-member of the Oaktable Network, CTO of Pythian (I love those Pythian folks), and someone who generally disagrees with most everything I say.

I respect Alex, because he has vast knowledge and valuable skills. His arguments make me think. That’s a good thing. I’m not sure, however, our respective spheres of expertise overlap.

So how do these disagreements regarding SLOB get started? Recently I tweeted:

The difference between SLOB and Orion is akin to Elliptical trainer versus skiing on the side of a mountain.

Alex replied with:

I could just as well argue that SLOB is useless because that’s not real workload anyway and you should test with your app

This quick exchange of ideas set into motion some Pythian testing by Yury. As it turns out I think the goal of that test was to prove parity between SLOB and Orion for random reads–and perhaps not much more. If only I had published “My List” above before then.

Yury’s tests were good, albeit, exceedingly small in scope. His blog post suggests more testing on the way. That is good. If you read the comment thread on his blog entry you’ll see where I thank Yury for a good tweak to the SLOB kit that eliminates the db file parallel reads associated with the index range scans incurred by SLOB reader processes. Come to think of it though, Matt from Violin Memory pointed that one out to me some time back. Hmm, oh well, I digress. The modifications Yury detailed (init.ora parameters) will be included in the next drop of the SLOB kit. Again, thanks Yury for the testing and the init.ora parameter change recommendations!

Feel free to see Yury’s findings. They are simple: SLOB and Orion do the same thing. Really, SLOB and Orion do the same thing? Well, that may be the case so long as a) you compare SLOB to Orion only for simple random read testing and/or b) your testing is limited to a little, itsy-bitsy, teeny, tiny, teensy, minute, minuscule, meager, puny, Lilliputian-grade undersized I/O subsystem incapable of producing reasonable, modern-scale IOPS. Yury’s experiment topped out at roughly 4,500 random read IOPS. I’ll try to convince you that there is more to it than that (hint, modern servers are fit for IOPS in the 20,000/core range). But first, I have two quotable quotes to offer at this point:

When assessing the validity of an I/O testing tool, do so on a system that isn’t badly bottlenecked on storage.

If your application (e.g., Oracle Database)  is “mindless” use a “mindless” I/O generator–if not, don’t.

Mindless I/O
So what do I mean when I say “mindless I/O?” The answer to that is simple. If the code performs an I/O into a memory buffer, without any application concurrency overhead, and no process so much as peeks at a single byte of that buffer populated through DMA from the I/O adapter device driver–it’s mindless. That is exactly how Orion does what it does. That’s what every other synthetic I/O generator I know of does as well.

So what does mindless I/O look like and why does it show up on my personal radar as a problem? Let’s take a look–but first let me just say one thing–I analyze I/O characteristics on extremely I/O capable platforms. Extremely capable.

The following screen shot shows a dd(1) command performing mindless I/O by copying an Oracle OMF datafile from an XFS file system to /dev/null using direct I/O. After that another dd(1) was used to show the difference between “mindless” and meaningful I/O. The second dd(1) was meaningful because after each 1 MB read the buffer is scanned looking for lower case ASCII chars to convert to their upper-case counterpart. That is, the second dd(1) did data processing–not just a mindless tickling of the I/O subsystem.

The mindless I/O was 2.5 GB/s but the meaningful case fell to about 1/6th that at 399 MB/s. See, CPU matters. It matters in I/O testing. CPU throttles I/O–unless you are interested in mindless I/O. What does this have to do with Orion and SLOB? A moment ago I mentioned that I test very formidable I/O subsystems commensurate with modern platforms–so hold on to your hat while I tie these trains of thought together.
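For readers who want to try the comparison themselves, here is a minimal sketch of the two kinds of dd(1) runs described above. It assumes GNU dd (for iflag=direct and conv=ucase). The function names are made up for illustration, and this is not necessarily the exact command line behind the numbers quoted here.

```shell
#!/bin/sh
# Sketch only: assumes GNU dd. Function names are illustrative.

# "Mindless" read: data lands in the buffer and is discarded untouched.
# $1 = file to read; $2 = optional extra dd operand such as iflag=direct.
mindless_read() {
    dd if="$1" of=/dev/null bs=1M ${2:-}
}

# "Meaningful" read: conv=ucase makes dd examine every byte of each
# 1 MB buffer, converting lower-case characters to upper case.
meaningful_read() {
    dd if="$1" of=/dev/null bs=1M conv=ucase ${2:-}
}
```

A run against a datafile on a file system that supports direct I/O would look like: mindless_read /path/to/datafile iflag=direct, followed by meaningful_read with the same arguments. dd reports throughput on stderr, so the two rates can be compared directly.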

Building on my dd(1) example of mindless I/O, I’ll offer the following screen shot which shows Orion accessing the same OMF SLOB datafile (also via direct I/O validated with strace). Notice how I force all the threads of Orion (it’s threaded with libpthreads) to OS CPU 0 using numactl(8) on this 2s12c24t Xeon 5600 server?  What you are about to see is the single-core capacity of Orion to perform “mindless I/O”:
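A pinned run of this sort could be launched along the following lines. This is a hedged sketch, not the exact command behind the numbers in this post: the test name slob, the count of outstanding small I/Os and the helper function itself are assumptions for illustration. Orion reads the files or devices to test from a file named after the test (here slob.lun). The helper only prints the command line so it can be inspected before anything is run.

```shell
#!/bin/sh
# Print (rather than execute) an Orion command line pinned to one OS CPU.
# $1 = OS CPU number for numactl --physcpubind
# $2 = Orion test name (Orion reads target file names from $2.lun)
# $3 = number of outstanding small I/Os (an arbitrary example value)
pinned_orion_cmd() {
    echo "numactl --physcpubind=$1 orion -run advanced -testname $2" \
         "-num_disks 1 -size_small 8 -type rand -write 0" \
         "-matrix point -num_small $3 -num_large 0"
}
```

For example, pinned_orion_cmd 0 slob 32 prints a command that confines every Orion thread to OS CPU 0 while issuing 8 KB random reads.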

Unrealistic Platform Performance Measurement Tools
This is only Part I in this series.  I’ll be going through a lot of proof points to solidify backing for my Orion-related assertions in the list above, but please humor me for a moment. I’d like to know just how realistic are platform performance measurements from an I/O tool that demonstrates capacity for 144,339 physical 8K random IOPS while pinned to a single core of a Xeon 5600 processor?
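To put that number in perspective, some quick arithmetic helps. The sketch below assumes 8K means 8,192 bytes. The function names are invented for illustration.

```shell
#!/bin/sh
# Bytes per second moved at a given IOPS rate.
# $1 = I/Os per second, $2 = I/O size in bytes.
iops_to_bytes_per_sec() {
    echo $(( $1 * $2 ))
}

# Average nanoseconds of one core's time available per I/O
# when that single core sustains $1 I/Os per second (rounded down).
ns_per_io() {
    echo $(( 1000000000 / $1 ))
}

# iops_to_bytes_per_sec 144339 8192   prints 1182425088
# ns_per_io 144339                    prints 6928
```

In other words, 144,339 IOPS at 8 KB is roughly 1.18 GB/s, and a single core sustaining that rate has, on average, fewer than 7 microseconds per I/O for everything it does, which leaves no room for looking at the data.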

We are interested in database platform IOPS capacity, right?

Through this blog series I aim to help you conclude that any tool demonstrating such an unrealistic platform performance measurement is, well, an unrealistic platform performance measurement tool.

Do you feel comfortable relying on an unrealistic platform performance measurement tool? Before I crafted SLOB I too accepted test results from unrealistic platform performance measurement tools but I learned that I needed to include the rest of the platform (e.g., CPU, bus, etc) when I’m studying platform performance so I left behind unrealistic platform performance measurement tools.

Until recently I didn’t spend any time discussing measurements taken from unrealistic platform performance measurement tools. However, since friends and others in social media are pitting unrealistic platform performance measurement tools against SLOB (not an unrealistic platform performance measurement tool) such comparisons are blog-worthy. Hence, I’ll trudge forward blogging about how unrealistic certain unrealistic platform performance measurement tools are. And, if you stay with me on the series, you might discover some things you don’t know because, perhaps, you’ve been relying on unrealistic platform performance measurement tools.

As this series evolves, I’ll be sharing several similar unrealistic platform performance measurement tool results as I go through the list above. That is, of course, what motivated me to leave behind unrealistic platform performance measurement tools.

Final Words For This Installment
In Yury’s post he quoted me as having said:

It’s VERY easy to get huge Orion nums

His assessment of that quote was, “kind of FALSE on this occasion.”  Having now shown what I mean by “VERY easy” (e.g., even a single core can drive massive Orion IOPS) and “huge Orion” numbers  (e.g., 144K IOPS), I wonder whether Yury will be convinced about my assertions regarding unrealistic platform performance measurement tools? If not yet, perhaps he, and other readers will eventually. After all, this is only Part I. If not, Yury, I still want to say, “thanks for testing with SLOB and please keep the feedback coming.”

Alex and I may always disagree :-)

Oh, by the way folks, if all you have is Orion, use it. It is better than wild guesses–at least a little better.

Quick Reference README File For SLOB – The Silly Little Oracle Benchmark

This is just a quick blog entry with the main README file from SLOB – The Silly Little Oracle Benchmark. I frequently find myself referring folks to the README so I thought I’d make it convenient. I’ve also uploaded this in PDF form here.

            SLOB - Silly Little Oracle Benchmark

INDEX
    INTRO
    NOTE ABOUT SMALL SGA
    SETUP STEPS
    RELOADING THE TABLES
    RESULTS
    TERMINOLOGY
    HOW MANY PROCESSES DO I RUN
    NON-LINUX PLATFORMS    

INTRO
-----
This kit does physical I/O. Lots of it. 

The general idea is that schema users connect to the instance and 
execute SQL on their own tables and indexes so as to eliminate 
as much SGA *application* sharing as possible. SLOB aims to stress Oracle 
internal concurrency as opposed to application-contention. It's all about 
database I/O (both physical and logical), not application scaling.

The default kit presumes the existence of a tablespace called IOPS. If 
you wish to supply a differently named tablespace, give it as an 
argument to the setup.sh script. More on this later in this README.

To create the schemas and load data simply execute setup.sh as the Oracle 
sysdba user. The setup.sh script takes two arguments the first being the 
name of the tablespace and the second being how many schema users to load. 
A high-end test setup will generally load 128 users. To that end, 128 is 
the default.

To run the test workload use the runit.sh script. It takes two arguments 
the first being the number of sessions that will attach and perform modify 
DML (UPDATE) on their data (writer.sql) and the second directs how many sessions 
will connect and SELECT against their data (reader.sql). 

NOTE ABOUT SMALL SGA
--------------------
The key to this kit is to run with a small SGA buffer pool to force physical 
I/O. For instance, a 40MB SGA will be certain to result in significant physical 
IOPS when running with about 4 or more reader sessions. Monitor free buffer waits 
and increase db_cache_size to ensure the run proceeds without free buffer wait 
events.

Oracle SGA sizing heuristics may prevent you from creating a very small SGA
if your system has a lot of processor cores. There are remedies for this. 
You can set cpu_count in the parameter file to a small number (e.g., 2) and this
generally allows one to minimize db_block_buffers. Another approach is
to create a recycle buffer pool. The setup.sh script uses the storage 
clause of the CREATE TABLE command to associate all SLOB users' tables
with a recycle pool. If there happens to be a recycle pool when the 
instance is started then all table traffic will flow through that
pool. 
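
As an illustration of the remedies above, here is a minimal sketch that prints the parameter-file lines involved. The values are examples only and the function name is hypothetical. The sketch assumes manual SGA sizing; under automatic memory management db_cache_size acts only as a minimum.

```shell
#!/bin/sh
# Print init.ora lines for a deliberately small buffer cache.
# $1 = db_cache_size, $2 = cpu_count, $3 = optional db_recycle_cache_size.
# All values are examples chosen by the tester, not kit defaults.
small_sga_params() {
    printf 'db_cache_size = %s\n' "$1"
    printf 'cpu_count = %s\n' "$2"
    if [ -n "${3:-}" ]; then
        printf 'db_recycle_cache_size = %s\n' "$3"
    fi
}
```

For example, small_sga_params 40M 2 prints the two lines for the cpu_count approach, and adding a third argument such as 40M also prints a recycle pool size.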

SETUP STEPS
-----------
1. First, create the trigger tools. Change directory to  ./wait_kit 
   and execute "make all"
2. Next, execute the setup.sh script, e.g., sh ./setup.sh IOPS 128
3. Next, run the kit such as sh ./runit.sh 0 8

RELOADING THE TABLES
--------------------
When setup.sh executes it produces a drop_users.sql file. If you need to 
re-run setup.sh it is optimal to execute drop_users.sql first and then 
proceed to re-execute setup.sh.

RESULTS
-------
The kit will produce a text awr report named awr.txt. The "awr" directory 
scripts can be modified to produce a HTML awr report if so desired. 

TERMINOLOGY
-----------
SLOB is useful for the following I/O and system bandwidth testing:

1. Physical I/O (PIO) - Datafile focus
    1.1 This style of SLOB testing requires a small db_cache_size
    setting. Small means very small such as 40MB. Some
    users find that it is necessary to over-ride Oracle's built
    in self-tuning even when supplying a specific value to 
    db_cache_size. If you set db_cache_size small (e.g., 40M)
    but SHOW SGA reveals an over-ride situation, consider 
    setting cpu_count to a very low value such as 2. This will
    not spoil SLOB's ability to stress I/O.
    1.2 Some examples of PIO include the following:
        $ sh ./runit.sh 0 32   # zero writers 32 readers
        $ sh ./runit.sh 32 0   # 32 writers zero readers
        $ sh ./runit.sh 16 16  # 16 of each reader/writer
2. Logical I/O (LIO)
    2.1 LIO is a system bandwidth and memory latency test. This 
    requires a larger db_cache_size setting. The idea is to 
    eliminate Physical I/O. The measurement in this testing mode 
    is Logical I/O as reported in AWR as Logical reads.
3. Redo Focused (REDO)
    3.1 REDO mode also requires a large SGA. The idea is to 
    have enough buffers so that Oracle does not need to
    activate DBWR to flush. Instead, LGWR will be the
    only process on the system issuing physical I/O. This 
    manner of SLOB testing will prove out the maximum theoretical
    redo subsystem bandwidth on the system. In this mode
    it is best to run with zero readers and all writers.

HOW MANY PROCESSES DO I RUN?
----------------------------
I recommend starting out small and scaling up. So, for instance,
a loop of PIO such as the following:
    $ for cnt in 1 2 4 8
    do
        sh ./runit.sh 0 $cnt
    done

Take care to preserve the AWR report in each iteration of the loop.
The best recipe for the number of SLOB sessions is system specific. 
If your system renders, say, 50,000 PIOPS with 24 readers but starts
to tail off beyond 24, then stay with 24.
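
One way to preserve the AWR report from each iteration is sketched below. It assumes the loop is run from the SLOB directory and that runit.sh leaves its report in awr.txt, as described under RESULTS. The function name and the awr.readers_N.txt naming are hypothetical.

```shell
#!/bin/sh
# Run a reader-only (PIO) sweep and keep each run's AWR report.
# Arguments: the reader counts to test, e.g. 1 2 4 8.
pio_sweep() {
    for cnt in "$@"
    do
        sh ./runit.sh 0 "$cnt" || return 1
        # Save a per-run copy so the next iteration cannot clobber it.
        cp awr.txt "awr.readers_$cnt.txt" || return 1
    done
}
```

Calling pio_sweep 1 2 4 8 reproduces the loop above and leaves one saved report per reader count.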

In general I recommend thinking in terms of SLOB sessions per core.

In the LIO case it is quite rare to run with more reader.sql sessions than the
number of cores (or threads in the case of threaded cores). On the other 
hand, in the case of REDO it might take more than the number of cores 
to find the maximum redo subsystem throughput--remember, Oracle does 
piggy-back commits so over-subscribing sessions to cores might be 
beneficial during REDO testing.

NON-LINUX PLATFORMS
-------------------
The SLOB install directory has README.{PLATFORM} files and 
user-contributed, tested scripts under the ./misc/user-contrib directory.

Oracle’s Timeline, Copious Benchmarks And Internal Deployments Prove Exadata Is The Worlds First (Best?) OLTP Machine – Part II

There Is No Such Thing As “Pure OLTP”
There is no such thing as “pure OLTP.” How true! And that’s why you are supposed to buy Exadata for your Oracle OLTP/ERP deployment—at least that’s what I’ve heard.

Part I of this series on the topic of Oracle OLTP/ERP on Exadata Database Machine has brought quite a bit of feedback my way.  Most of the feedback came from independent consultants who have built a practice around Exadata. I did, however, hear from an Oracle customer that has chosen to migrate their Oracle Database 10g ERP system from a cluster of old AMD 2300 “Barcelona” Opteron-based servers (attached to a circa 2007 technology SAN) to Exadata. This customer also cited the fact that there is no such thing as “pure OLTP” and since it is a fact I don’t refute it.

No Such Thing As “Pure OLTP” – What Does That Mean
Oracle-based OLTP/ERP systems generally have an amount of batch processing and reporting that takes place in support of the application. That’s true. Batch processing and reporting must surely require massive I/O bandwidth and, indeed, massive I/O would naturally benefit from Exadata offload processing. That is how the sales pitch goes.

I won’t argue for a moment that Exadata offers significant I/O bandwidth. There is 3.2 GB/s of realizable storage bandwidth (Infiniband) for data flow to/from each server in an Exadata configuration. That’s roughly equivalent to 2 active 16GFC HBA ports. It’s a lot. However, since I’ve just spelled out the conventional storage connectivity required to match the 3.2 GB/s, the question boils down to whether or not the Exadata storage offload processing feature (Smart Scan) adds value to the type of reporting and batch activity common in Oracle OLTP/ERP environments.  I can’t prove a negative in this regard but I can say this:

Batch / reporting queries are not candidates for improvement by Smart Scan technology unless the plan’s access method is a full scan (table or index fast full scan)
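
The 16GFC comparison above can be sanity-checked with a little arithmetic. The sketch assumes the nominal 16GFC rate of 1,600 MB/s per port per direction, which is supplied here as an assumption rather than taken from the post. The function name is hypothetical.

```shell
#!/bin/sh
# Ports needed to carry a target bandwidth, rounding up.
# $1 = target bandwidth in MB/s, $2 = bandwidth per port in MB/s.
ports_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# ports_needed 3200 1600   prints 2
```

Two active ports at that nominal rate cover the 3.2 GB/s per server quoted above.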

Most batch processing I’ve seen is quite compute-intensive as well as index-based (and not full scan). But, as I said, I cannot prove a negative. So, (pun intended) I’ll stop being negative. I’d like to defer to the positive proof Oracle offers on this topic: Oracle’s own benchmarks, designed to prove platform suitability for Oracle E-Business Suite. The benchmark I have in mind is the Oracle E-Business Suite 12.1.3 Standard Extra-Large Payroll (Batch) Benchmark.

Oracle’s own description of the workload speaks volumes:

“Extra-Large”, “Batch”, Extremely Useful In Sizing And Capacity Planning
On April 10 2012, Oracle put out a press release highlighting the Sun Fire X4270 M3 payroll batch benchmark result. The press release made the following point about the result and the workload (emphasis added by me):

The Oracle E-Business Suite R12.1.3: Oracle’s Sun Fire X4270 M3 server posted the fastest results on the Payroll batch component of the Oracle E-Business Suite R12 X-large benchmark, completing the workload in less than 20 minutes. This result demonstrates that Oracle’s x86-based servers, running Oracle Linux, can deliver excellent throughput and are well suited for customers running batch applications in conjunction with Oracle Database 11g R2 (5).

I need to quickly point out two things. First, Oracle has an entire suite of their own benchmarks yet never has there been a published result for Exadata. I know, that is old news and seemingly uninteresting to Oracle customers considering Exadata. Second, I highlighted “fastest results” because it just turns out that this result is in fact the first posted result with this version of the benchmark:

I know, I’m being petty, right? Why would anyone insist on OLTP/ERP benchmarks when considering a platform (Exadata) optimized for DW/BI workloads? I know, I’m sorry, petty again.

So What Is The Point?
If a single Sun Fire X4270 M3 can achieve excellent results with an Oracle-defined E-Business Suite batch benchmark attached to non-Exadata storage, don’t we have proof that Exadata isn’t required to allay the fears of batch processing on x86 without Exadata?  If Exadata added significant value (the sort that helps one absorb the sticker shock) to batch processing, wouldn’t there be published results?

Has Oracle simply not had enough time to publish Exadata benchmark results? Not even enough time given the fact that the benchmarks I speak of are their own benchmark specifications? I can answer those questions—but I won’t. Instead, I’ll focus on some of the particulars of the 4270 M3 result that would actually make it a very difficult workload for even a half-rack Exadata Database Machine X2-2!

The following table comes from Oracle’s full report on the batch benchmark result:

This table shows us a difficult profile–a batch processing profile. A batch profile that warrants the term “Extra-Large” in the name of the benchmark. Please notice that the peak write IOPS is 14,772. A half-rack Exadata Database Machine (X2) has the (datasheet) capacity for 12,500 random mirrored writes per second (WIOPS) thus 14,772 is more WIOPS than a half-rack Exadata can sustain. But what about all those extra processors an Exadata half-rack would offer over this server? Indeed, this was a 2s16c32t Xeon E5-2600 (Sandy Bridge) single server result. A half-rack Exadata (X2-2) has 48 Xeon 5600 cores. Surely the Sandy Bridge 2S server was totally out of gas, right?  No. The full report for the benchmark includes processor utilization:
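
The write IOPS comparison in the paragraph above is simple to check. The sketch below uses only the two figures quoted there (14,772 peak WIOPS and the 12,500 WIOPS datasheet number). The function names are made up for illustration.

```shell
#!/bin/sh
# Demand as a whole-number percentage of capacity (rounded down).
# $1 = demanded I/Os per second, $2 = rated I/Os per second.
pct_of_capacity() {
    echo $(( $1 * 100 / $2 ))
}

# I/Os per second by which demand exceeds capacity.
shortfall() {
    echo $(( $1 - $2 ))
}

# pct_of_capacity 14772 12500   prints 118
# shortfall 14772 12500         prints 2272
```

That is, the benchmark's peak write rate is about 118% of the half-rack datasheet figure, a gap of 2,272 writes per second.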

Summary
There is no such thing as “pure OLTP.” Oracle has proven that fact with the Payroll Batch benchmark. Oracle has further proven that a single 2s16c32t E5-2600 server is capable of achieving a world record result on their own benchmark (“Extra Large”) and that particular achievement was possible without Exadata. In fact, it was possible without even saturating the single server E5-2600 CPUs–but, hey, at least the WIOPS demand was higher than a half-rack Exadata Database Machine X2 can sustain!

You need Exadata to handle the batch requirements for modern E-Business Suite?

You spend a lot on Oracle Database, Applications and support. Spend wisely on the platform.

EMC Employee Disclaimer

The opinions and interests expressed on EMC employee blogs are the employees' own and do not necessarily represent EMC's positions, strategies or views. EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the EMC logo and content regarding EMC products and services, employee blogs are independent of EMC and EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.

This disclaimer was put into place on March 23, 2011.
