Oracle Exadata Storage Server: 485x Faster Than…Oracle Exadata Storage Server. Part I.

Published December 10, 2008 oracle 29 Comments

I recently read an article by Curt Monash entitled Interpreting the results of data warehouse proofs-of-concept (POCs). Curt’s post touched on a topic that continually mystifies me. I’m not sure when the phenomenon started, but I’ve witnessed a growing trend towards complete lack of scrutiny when it comes to the performance claims made by most vendors in the data warehousing space. For example, Netezza makes a blanket claim that their appliance is 100-fold faster than Oracle. Full stop. Er, not full stop… Netezza doesn’t stop there. They claim:

While Netezza makes claims of 100x performance gains, it is not uncommon to see performance differences as large as 200x to even 400x or more when compared to existing Oracle systems.

100x Speed-up: Child’s Play

But, honestly, 100x is child’s play. Forget for a moment that there is no itemization of where that speedup would come from in Netezza’s high-level messaging. Such information would be technical marketing and I wouldn’t expect Netezza to disclose any sort of justification for where 100x speedup comes from. Lowered expectations. Shucks, these DW arms-race marketing claims remind me of that famous Saturday Night Live skit that seems to have served as the play book for these marketing guys-in more ways than one!

Ok, chuckles aside, Curt’s post on the topic included a link to a spreadsheet of recent Proof of Concept results where “the incumbent” was trounced to the tune of 335x in reporting tasks. Like I said, 100x is child’s play.

Intellectual Curiosity

Nobody should look at a claim such as 335x without wondering where in the world such a speedup comes from and shame on any vendor that isn’t willing to itemize the benefit. After all, without some knowledge of what produces such astounding speedup, how is the dutiful DW practitioner to expect the speedup to remain intact over time or, moreover, how to replicate the “magic” elsewhere. I’m more than willing to itemize to anyone any claim of Oracle Exadata Storage Server speed up on any query. Exadata is not “magic” so accounting for its benefit is very easy to do. But, back to the 335x for a moment. This is actually quite simple. To get 335x speedup one of the following is true:

The query was executed on a platform with 335x storage bandwidth
The query was executed on a platform with 335x processor bandwidth
The query manipulated 1/335^th the amount of data
Some combination of items 1 through 3

Number 3 in the list is achieved through things like partition elimination, indexing, materialized views, more efficient joins, and so forth. This is what Oracle refers to as the “Brainy Approach” to improved data warehouse query performance. Of course Oracle has, and retains all these “Brainy” optimization approaches, and more, when Exadata is in play. Exadata is a solution offering both “Brainy” and, most importantly, “Brawny” technology.

Let’s think this 335x thing through for a moment. Imagine that the 335x was a Netezza 10100 and the 335x was an improvement over a traditional Oracle incumbent (no Exadata). One of Netezza’s main value propositions is that they are able to utilize full bandwidth of all the disks in the system in parallel-just like Exadata. That’s the “Brawny” approach. As I point out in my post about “arcane” disk technology, this value proposition is the least we deserve, but because of typical storage provisioning most Oracle deployments don’t benefit from the aggregate bandwidth their drives could actually offer. So kudos to Netezza for that. What if this was Netezza and the 335x was due to the NPS 10100 “Brawny” disk bandwidth capability? Well, that chalks the win to item 1 in the list and therefore the incumbent system was configured with 1/335^th the amount of disk bandwidth of the NPS system. If I grant the NPS system 70 MB/s per disk drive I get roughly 7.5 GB/s (108 * 70MB). Does that mean the incumbent was ingesting only 22 MB/s (7.5 GB/335)? Would anyone care about that result? Would you be proud if you got more performance from 108 SATA drives than a single USB 2.0 drive? I shouldn’t think the 335x came solely from list item 1.

The NPS 10100 has 108 processors pounding on the data as it comes off the drives. Can we get 335x over our imaginary incumbent from sheer processing power? Sure, so long as the incumbent was running Oracle on a processor with 1/3^rd the bandwidth of a single PowerPC processor (the embedded CPU on a Netezza SPU). Would anyone be excited to beat 1/3^rd a CPU with 108 CPUs?

No, folks, the 335x was certainly the product of item 4 on the list-with a very heavy slant towards item 3-regardless of which appliance vendor it actually was.

A 335 Fold Improvement is Child’s Play? I want 485 Fold!

Humor me as I walk through a little exercise to elaborate more on this topic. In the following session I’ll demonstrate a query accessing precisely the same amount of data using the same SQL, in the same Oracle session, attached to the same Oracle database. You’ll see that I execute a host command to prove that within the scope of 15 seconds I am able to demonstrate a 485x speedup. You can choose to believe me or not, but the facts are as follows:

The amount of data in the table is the same in each case.
The data in every column of every row is the same.
The order of rows in the table is the same.
There is no compression involved at any point.
The table datatypes are the same.
The query plan is the same.
The Oracle Parallel Query Degree of Parallelism remains constant. That means equal CPUs attacking the data.
There are no indexes, materialized views, partitioning or any sort of data elimination.
The Oracle Results Cache feature was not used.
The data in each case resided on the same disks.

And, oh, before I forget to say so, this is Exadata. So, can Oracle market Exadata as 485x faster than Exadata without the use of any data elimination techniques? See for yourself and fill out a comment with your explanation for what I have shown here.

First, a listing of the “demo” script:

SQL> !cat demo.sql

set echo off
set timing off
col sum_sales format 999,999,999,999,999,999
host date

desc card_trans

set echo on
select count(*) from card_trans;
set timing on

select sum(purchase_amt) sum_sales from card_trans;
host date

In the following screen capture I’ll show that the query took 970 seconds to complete. I used the SUM aggregate against the 100+ million purchase_amt column values as a means to show I’m querying the same content in both cases.

SQL> @demo
Wed Dec 10 08:59:48 PST 2008

Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
CARD_NO                                   NOT NULL VARCHAR2(20)
CARD_TYPE                                          CHAR(20)
MCC                                       NOT NULL NUMBER(6)
PURCHASE_AMT                              NOT NULL NUMBER(6)
PURCHASE_DT                               NOT NULL DATE
MERCHANT_CODE                             NOT NULL NUMBER(7)
MERCHANT_CITY                             NOT NULL VARCHAR2(40)
MERCHANT_STATE                            NOT NULL CHAR(3)
MERCHANT_ZIP                              NOT NULL NUMBER(6)

SQL> select count(*) from card_trans;

COUNT(*)
----------
107389152

SQL>
SQL> set timing on
SQL>
SQL> select sum(purchase_amt) sum_sales from card_trans;

SUM_SALES
------------------------
6,443,502,770

Elapsed: 00:16:10.15
SQL>
SQL> host date
Wed Dec 10 09:32:08 PST 2008

The first pass of the script ended in the same session at 9:32:08 and 11 seconds later I executed the script again. The session capture shows that there was a 485x speed up (970 seconds down to 2 seconds). Like I said, “100x is childs play.” Well, at least it is when there is no accounting offered for the improvement. Pshaw, it seems I learned a lot from that “training” video I reference above.

SQL>  @demo
SQL>
SQL> set echo off
Wed Dec 10 09:32:19 PST 2008

Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
CARD_NO                                   NOT NULL VARCHAR2(20)
CARD_TYPE                                          CHAR(20)
MCC                                       NOT NULL NUMBER(6)
PURCHASE_AMT                              NOT NULL NUMBER(6)
PURCHASE_DT                               NOT NULL DATE
MERCHANT_CODE                             NOT NULL NUMBER(7)
MERCHANT_CITY                             NOT NULL VARCHAR2(40)
MERCHANT_STATE                            NOT NULL CHAR(3)
MERCHANT_ZIP                              NOT NULL NUMBER(6)

SQL> select count(*) from card_trans;

COUNT(*)
----------
107389152

SQL>
SQL> set timing on
SQL>
SQL> select sum(purchase_amt) sum_sales from card_trans;

SUM_SALES
------------------------
6,443,502,770

Elapsed: 00:00:01.96
SQL>
SQL> host date
Wed Dec 10 09:32:23 PST 2008

SQL> select count(*) from user_indexes ;

COUNT(*)
----------
0

Elapsed: 00:00:00.11

Part II in this series: click here.

29 Responses to “Oracle Exadata Storage Server: 485x Faster Than…Oracle Exadata Storage Server. Part I.”

Feed for this Entry Trackback Address

1 Justis December 11, 2008 at 1:55 am

RESULT_CACHE_MODE = FORCE ?

Reply
- 2 kevinclosson December 11, 2008 at 2:00 am
  
  Justis,
  
  No. The Results cache feature was not used. Thanks for taking a guess!
  
  Reply
3 Karl Reitschuster December 11, 2008 at 8:48 am

Hi – just some guesswork;-)

in you first example I see a complete runtime of about 32 minutes
for both statements (2x16minutes). Both SQL scanned the complete table and there was no caching benefit running the second sum… sql;

In the second Example both sql run a lot faster;-)
Not due to caching of data blocks.
Means to me Oracle must have known the Statements already run
and because of read consistency the result would be the same because no data was not modified in your current session.

would only work if you would not reconnect i think
just my 4 cents

Karl Reitschuster

Reply
4 Jeff Moss December 11, 2008 at 8:57 am

for 485x speed up, I can only assume that the second iteration didn’t actually do the same “work”…it didn’t actually go and scan the 100m+ rows on the second run and that’s why it’s orders of magnitude faster.

I don’t know Exadata (yet!) but somewhere you’re gaining from a cache effect I guess…You don’t mention the DOP of the PQ – and specifically you don’t therefore exclude it being SERIAL …in which case, if it was run serially, you’d be using buffer cache rather than the PQ slaves always reading from disk…in which case, the second run may be all in the buffer cache (if it’s big enough for your dataset), whilst the first one is all reads in from disk.

Alternatively, if it was PARALLEL, then perhaps there is there some kind of cache effect from Exadata itself such that it knows it doesn’t need to go to the disks to give the answer to the query.

How big is the segment?

Can we see the plans…or would that give the game away?

I’m intrigued now…more clues please sir!

Reply
5 Eric December 11, 2008 at 9:36 am

Buffer cache of the storage servers?

Reply
6 Jan December 11, 2008 at 12:47 pm

buffer cache ?

Reply
7 Jim December 11, 2008 at 2:13 pm

One difference is that in the first run Oracle had to do a hard parse of the query. In the 2nd case it didn’t. I wouldn’t expect that to contribute much – maybe a second – , but it would add some time. My expectation would be that the parsing difference (plan generation etc.) would at worst amount to 1 second of difference.

Reply
8 kevinclosson December 11, 2008 at 3:41 pm

Eric, Jan: There is no cache effect involved. Thanks for the guess.

Jim: A hard parse takes a fraction of a second. Our spread is 2->970.

Jeff: The session did trigger the PQO scanning of the same number of rows. In that regard it is the same work. The plans are the same.

Kerl: What you explain would be results cache and no result cache was involved. There is a physical scanning of the same number and size of rows in each case.

Reply
9 Timur Akhmadeev December 11, 2008 at 3:43 pm

Kevin,
it’s number 2 in your list, because other options are not fit to your own constraints 😉 Maybe the same trick as in this (https://kevinclosson.wordpress.com/2008/10/07/oracle-exadata-storage-server-a-black-box-with-no-statistics/) post.

Reply
10 kevinclosson December 11, 2008 at 5:05 pm

Timur,

It is number 4 in the list, but I’m waiting to see if anyone can finger which if items 1-3 were the biggest benefactor. Oh, wait, 4 potentially includes 2. There is no item 2 involved as the same CPUs were used in all cases. Also, I chose my words*very* carefully when I described the test scenario so there is room for list items 1 and 3 without turning me into a fibber…let’s see if anyone else can drill in on what I’ve done…since the post has only been viewed 388 times so far there is still hope 🙂

Reply
11 Timur Akhmadeev December 11, 2008 at 5:59 pm

OK, I just have no idea how to put 3rd item in game (“There are no … any sort of data elimination” – so how to manipulate 1/X amount of data with the same data and execution plan?..) For the 2nd – it could be offlined CPUs. Running first example with DOP=32 on a single core could be much slower than 32 * (time on 32 online CPUs) because of CPU starvation.

Reply
- 12 kevinclosson December 11, 2008 at 6:10 pm
  
  Timur,
  
  You’re being very patient and a good sport! Thanks. I’m choosing my words carefully. There was no data elimination. There were no CPUs offlined. In each case the same CPUs were in play.
  
  Reply
13 Timur Akhmadeev December 11, 2008 at 6:16 pm

Oh, wait! Is there any way to turn off Exadata’ feature of returning just wanted columns? If this can be done, then 3rd option can also be explained: “something” in configuration has been changed in such a way that all data from the storage returned, resulted in massive increase in manipulated data. But I’m not sure if the execution plan in this case will remain the same…

Reply
14 kevinclosson December 11, 2008 at 6:35 pm

Timur,

It is indeed possible to create a scenario where a Smart Scan will not be used…such a phenomenon does not change the query plan. Let’s say I did hobble Smart Scan just for the sake of discussion. I didn’t specify how much Exadata I used in this lesson, but it was in fact 14 Exadata Storage Servers or 14 GB/s scan throughput. Further, I didn’t specify how many RAC nodes I was using in the database grid. I was using 1. So can we get 485x by comparing the offloaded scan rate of 14 GB/s to the max single node RAC ingestion rate? Surely not. That would mean a DL360 with 20Gb IB can only ingest 14GBPS/485 or 29MB/s? Not so.

But, I said I used a combo of 1-3 in the list and I did in fact hobble Smart Scan. So Timur, you are partly there, just remember that 14GBPS/485 is only 29MB/s which is way under the threshold of my stipulation that I used all CPUs. That is, in each case there was 100% of a single DL360’s bandwidth in the database grid used in each case.

Are we having fun yet?

Reply
15 Toon Koppelaars December 11, 2008 at 6:38 pm

Dynamic sampling in the first case.
And not done in second case?

Reply
16 Toon Koppelaars December 11, 2008 at 6:39 pm

Delayed block cleanouts in first case?

Reply
17 kevinclosson December 11, 2008 at 6:43 pm

Ahhh, Toon, the first of my fellow Oaktable Network members to show up! Thanks for stopping by. Dynamic sampling is in no way involved. blocks remained clean during the test in both cases.

Reply
18 kurtvm December 11, 2008 at 7:18 pm

Block density perhaps ?

You didn’t mention that the number of records per block
was a constant. So it would be possible that in the first scenario you created a table with a low amount of records per block, resulting in a large segment, needing a lot of io’s. (you could have used 1 row/block for example)

While in the second scenario you could have used a high number of blocks per record, resulting in a smaller segment, and thus needing a lower amount of io’s to fulfill the query.

Reply
19 Riyaj Shamsudeen December 11, 2008 at 7:26 pm

Hi Kevin

I think that

1. you changed either parallel_max_servers or degree of parallelism for underlying table to a much higher value to use all available CPUs, more efficiently.

2. you changed exadata specific init.ora parameters so that all disks in exadata array are used more efficiently (concurrently).

Just a guess 🙂

Cheers
Riyaj

Reply
- 20 kevinclosson December 11, 2008 at 8:12 pm
  
  Hi Riyaj,
  
  I stipulated PQO parity so, nope, that’s not it … there are no init.ora parameters that boost Exadata 🙂
  
  Reply
21 Eric Grancher December 11, 2008 at 7:55 pm

Kevin,

thank you for your blog, not only informative (and a reference) but also a game now, cool!

Let’s try a guess…

– first pass has one Exadata offload feature disabled

– second part has the offload feature enabled

The offload feature could be the “Smart Scan Column Filtering” (in which case, the benefit of having the offload feature enabled could become very important in the case of large row length, projecting the only relevant column can provide a gain as large as wanted, even 486… add more columns / make them larger)?

this was a guess… I have played my token.

regards,
eric

Reply
22 Riyaj Shamsudeen December 11, 2008 at 8:53 pm

Hi Kevin
May be, you increased UNIX file system buffer cache parameters so much to avoid any disk I/O ?

Cheers
Riyaj

Reply
23 Riyaj Shamsudeen December 11, 2008 at 9:03 pm

Okay, come to think of it, it can’t be UNIX file system buffer cache since if that is the case then first statement counting would have been slower, not the second statement..

Ignore my earlier comment 😦

Reply
24 Timur Akhmadeev December 11, 2008 at 9:10 pm

And what about kurtvm comment (https://kevinclosson.wordpress.com/2008/12/10/oracle-exadata-storage-server-485x-faster-thanoracle-exadata-storage-server-part-i/#comment-33456) on density? Fits perfect, imho.

Reply
25 kevinclosson December 11, 2008 at 9:30 pm

Timur, Kurt, All…

Yes, Kurt got it. See Part II in the series.

Reply
26 Michael Norbert January 15, 2009 at 2:47 pm

I really enjoyed this post and the followup. Great blog. Mostly because I could actually follow it without hurting my brain.

Reply
- 27 kevinclosson January 15, 2009 at 4:49 pm
  
  Thank you for the kind words, Michael!
  
  Reply

	David Zheng on Announcing pgio (The SLOB Meth…
	Oracle redo log perf… on File Systems For A Database? C…
	Oracle redo log perf… on Yes, File Systems Still Need T…
	kevinclosson on Announcing SLOB 2.5.4
	pgio nutzen? - I/O W… on So pgio Does Not Accurately Re…

Kevin Closson's Blog: Platforms, Databases and Storage