Oracle Database 11g Versus Orion. Orion Gets More Throughput! Death To Oracle Database 11g!

Several readers sent in email questions after reading the Winter Corporation Paper about Exadata I announced in a recent blog entry. I thought I’d answer one in this quick blog entry.

The reader’s email read as follows (I made no edits other than to remove the SAN vendor’s name):

I have read through the Winter Corporation paper regarding Exadata, but is it ok for me to ask a questio? We have an existing data warehouse on a 4 node 10g RAC cluster attached to a [ brand name removed ]   SAN array by 2 active 4GB ports on each RAC node. When we test with Orion we see nearly 2.9 gigabytes per second throughput but with RDBMS queires we never see more than about 2 gigabytes per sec throughput except in select count(*) situation. With select count(*) we see about 2.5GB/s. Why is this?

Think Plumbing First

It is always best to focus first on the plumbing, and then on the array controller itself. After all, if the supposed maximum theoretical throughput of an array is on the order of 3GB/s, but servers are connected to the array with limited connectivity, the bandwidth is unrealizable. In this case, 2 active 4GFC HBAs per RAC node demonstrate sufficient throughput for this particular SAN array. Remember, I deleted the SAN brand. The particular brand cited by the reader is most certainly limited to 3GB/s (I know the brand and model well)  but no matter because the 2 active 4GFC paths to each RAC node limit I/O to an aggregate of 3.2GB/s sustained read throughput no matter what kind of SAN it is. This is actually a case of a well-balanced server-to-storage configuration and I pointed out so in a private email to the blog reader who sent me this question. But, what about the reader’s question?

Orion is a Very Light Eater

The reason the reader is able to drive storage at approximately 2.9GB/s with Orion is because Orion does nothing with the data being read from disk. As I/Os are completed it simply issues more. We sometimes call this lightweight I/O testing because the code doesn’t touch the data being read from disk. Indeed, even the dd(1) command can drive storage at maximum theoretical I/O rates with a command like dd if=/dev/sdN of=/dev/null bs=1024k. A dd(1) command like this does not touch the data being read from disk.

SELECT COUNT(*)

The reason the reader sees Oracle driving storage at 2.5GB/s with a SELECT COUNT(*) is because when a query such as this reads blocks from disk only a few bytes of the disk blocks being read are loaded into the processor caches. Indeed, Oracle doesn’t have to touch every row piece in a block to know how many rows the block contains. There is summary information in the header of the block that speeds up row counting. When code references just one byte of data in an Oracle block, after it is read from disk, the processor causes the memory controller to load 64 bytes (on x86_64 cpus) into the processor cache. Anything in that 64-byte “line” can be accessed for “free” (meaning additional loads from memory are not needed). Accesses to any other 64-byte lines in the Oracle block causes subsequent memory lines to be installed into the processor cache. While the CPU is waiting for a line to be loaded it is in a stalled state, which is accounted for as user-mode cycles charged to the Oracle process referencing the memory. The more processes do with the blocks being read from disk, the higher processor utilization goes up and eventually I/O goes down. This is why the reader stated that they see about 2GB/s when Oracle is presumably doing “real queries” such as those which perform filtration, projection, joins, sorting, aggregation and so forth. The reader didn’t state processor utilization for the queries seemingly limited to 2GB/s, but it stands to reason they were more complex than the SELECT COUNT(*) test.

Double Negatives for Fun and Learning Purposes

You too can see what I’m talking about by running a select that ingests dispersed columns and all rows in a query after performing zero-effect filtration such as the following 16-column table example:

SELECT AVG(LENGTH(col1)), AVG(LENGTH(col8)), AVG(LENGTH(col16)) FROM TABX WHERE col1 NOT LIKE ‘%NEVER’ and col8 NOT LIKE ‘%NEVER’ and col16 NOT LIKE ‘%NEVER’;

This test presumes columns 1,8 and 16 never contain the value ‘NEVER’. Observe the processor utilization when running this sort of query and compare that to a simple select count(*) of the same table.

Other Factors?

Sure, the reader’s throughput difference between the SELECT COUNT(*) and Orion could be related to tuning issues (i.e., Parallel Query Option degree of parallelism). However, in my experience achieving about 83% of maximum theoretical I/O with SELECT COUNT(*) is pretty good. Further, the reader’s complex query achieved about 66% of maximum theoretical I/O throughput which is also quite good–when using conventional storage.

What Does This Have To Do With Exadata

Exadata offloads predicate filtering and column projection (amongst the many other value propositions). Even this silly example has processing that can be offloaded to Exadata such as filtration that filters out no rows and the cost of projecting columns 1,8 and 16. The database host spends no cycles with the filtration or projection. It just performs the work of the AVG() and LENGTH() functions.

I didn’t have the heart to point out to the reader that 3GB/s is the least amount of throughput available when using Exadata and Real Application Clusters (RAC). That is, with RAC the fewest number of Exadata Storage Servers supported is 3 and there’s no doubt that 3 Exadata Storage Servers do indeed offer 3GB/s query I/O throughput. In fact, as the Winter Corporation paper shows, Exadata is able to perform maximum theoretical I/O throughput even with complex, concurrent queries because there is 2/3rds of a Xeon 54XX “Harpertown” processor for each disk drive offloading processing from the database grid.

So, while Orion is indeed a “light eater”, Exadata is quite ravenous.

1 Response to “Oracle Database 11g Versus Orion. Orion Gets More Throughput! Death To Oracle Database 11g!”



  1. 1 A grand tour of Oracle Exadata, Part 2 | The Pythian Blog Trackback on July 27, 2010 at 2:48 pm

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




EMC Employee Disclaimer

The opinions and interests expressed on EMC employee blogs are the employees' own and do not necessarily represent EMC's positions, strategies or views. EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the EMC logo and content regarding EMC products and services, employee blogs are independent of EMC and EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.

This disclaimer was put into place on March 23, 2011.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,948 other followers

Oracle ACE Program Status

Click It

website metrics

Fond Memories

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2013. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

Follow

Get every new post delivered to your Inbox.

Join 1,948 other followers

%d bloggers like this: