Exadata Database Machine: The Data Sheets Are Inaccurate! Part – I.

Yes, the title of this blog entry is a come-on. I am ever-so-slightly apologetic (smiley face).

This post follows the longest dry spell in my blogging over the last five years. I haven’t posted since early January and thus I am quite overdue for the next installment in my series regarding the Oracle Database 11g Direct NFS clonedb feature. I set out to make the next installment yesterday but before doing so I visited the analytics for my blog readership to see what’s been happening. I discovered that essentially nobody comes to this blog through Exadata related search terms anymore. That surprised me. Indeed, for the first—what—two or so years after Exadata went into general availability the first page worth of Google search results always included some of my posts. I can’t find any of my Exadata posts in the first several pages Google spoon-feeds me now when I google “Exadata.” This isn’t a wounded-soul post. I do have a point to make. Humor me for a moment while I show the top twenty search terms that have directed readers to my blog since January 1, 2011.

kevin closson 417
oracle performance 320
oracle 11g 290
oracle linux 200
oracle on flash SSD 188
oracle nfs clonedb 182
intel numa 133
oracle on nfs 122
oracle fibre channel 115
huge pages allocated 104
oracle orion 99
real application clusters 92
automatic memory management 82
oracle xeon 80
oracle i/o 78
oracle file systems 75
oracle numa 73
_enable_NUMA_support 73
greenplum versus exadata 70
oracle exadata 69

So, as far as search terms go there seems to be a lack of traffic coming to this site for Exadata-related information. The page views for my Exadata posts are high, but the search terms are lightly weighted. This means folks generally read Exadata-related material here after arriving via an unrelated search term. Oh well. I’d ordinarily say, “so what.” However, it is unbelievable to me how many people ask me questions each and every day that would be unnecessary if not for a quick read of one of the entries I posted before Oracle Open World 2010. That post, entitled Seven Fundamentals Everyone Should Know Before Attending Openworld 2010 Sessions, might be better named You Really Need to Know This Little Bit About Exadata Before Anyone Else Tries to Tell You Anything About Exadata. Folks, if you get a moment and you care at all about Exadata, please do read that short blog entry. It will enhance your experience with your sales folks or any other such Exadata advocates. Indeed, who wants to be introduced to a technology solution by the folks trying to sell it to you? Now, don’t get me wrong. I’m not saying Exadata sales folks are prone to offering misinformation. What I’m trying to say is your interaction with sales folks will be enhanced if you aren’t starting from such remedial ground as the very definition of the product and its most basic fundamentals. That leads me to point out some of the folks who have taken the helm from me where Exadata blog content is concerned.


Oaktable Network Members Booting Up Exadata Blogging

Fellow Oaktable Network member Kerry Osborne blogs about Exadata, in addition to his current efforts to write a book on the topic. I’ve seen the content of his book in my role as Technical Editor. I think you will all find it a must-read regarding Exadata because it is shaping up to be a very, very good book. I have the utmost respect for fellow Oaktable Network members like Kerry. In addition to Kerry, Frits Hoogland (a recent addition to the Oaktable Network) is also producing helpful Exadata-related content. Oracle’s Uwe Hesse blogs frequently about Exadata-related matters as well. So, there, I’ve pointed out the places people graze for Exadata content these days. But I can’t stop there.

We Believe the Oracle Data Sheets
The content I’ve seen in blogs seems mostly to confirm the performance claims stated in Oracle Data Sheet materials. Let me put it another way. We all know the latest Exadata table/index scan rates (e.g., 25 GB/s HDD full rack or 70 GB/s combined Flash + HDD). We’ve seen the Data Sheets and we believe the cited throughput numbers. I have an idea—but first let me put on my sarcasm hat. I’m going to predict that the next person to blog about Exadata will start out by blogging something very close to the following:

My big_favorite_table has many columns and a bazillion rows. On disk it requires 200 gigabytes of storage but with mirroring it takes up 400 gigabytes. When I run the following query—even without Exadata Smart Flash Cache—it only takes eight seconds on my full-rack Exadata configuration to get the result:

 
SQL> select count(*) from big_favorite_table where pk_number < 0;

  COUNT(*)
----------
         0

Don’t get me wrong. It is important for folks to validate the Data Sheet numbers with their own personal testing. But folks, please, we believe the light-scan rates are what the marketing literature states. I’m probably not alone in my desire to see blogs on users’ experience in solving particularly complex analytical problems involving vast amounts of data stored in Exadata. That sort of blogging is where social networking truly adds value—you know, going “beyond the Data Sheet.”
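As a back-of-envelope check, a full scan at the data-sheet rates is trivial to price out. Here is a minimal sketch (the 200 GB table size is the hypothetical example above, and note that with ASM mirroring a scan still reads only one copy of the data):

```python
# Back-of-envelope scan-time estimate from the data-sheet throughput
# figures quoted above. The table size is the hypothetical 200 GB
# big_favorite_table; mirroring doubles storage consumed, but a scan
# reads only one copy, so 200 GB is what actually gets scanned.

def scan_seconds(table_gb, rate_gb_per_s):
    """Seconds to full-scan table_gb at a sustained rate_gb_per_s."""
    return table_gb / rate_gb_per_s

TABLE_GB = 200          # hypothetical big_favorite_table
HDD_RATE = 25           # GB/s, full-rack HDD scan rate per the Data Sheet
FLASH_HDD_RATE = 70     # GB/s, combined Flash + HDD per the Data Sheet

print(scan_seconds(TABLE_GB, HDD_RATE))        # 8.0 seconds, HDD alone
print(scan_seconds(TABLE_GB, FLASH_HDD_RATE))  # roughly 2.9 seconds
```

At 25 GB/s, scanning 200 GB takes exactly eight seconds, which is precisely why the hypothetical COUNT(*) blog entry above demonstrates nothing beyond what the Data Sheet already states.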

In Closing
So what does all this have to do with the infrequent nature of my blogging? Well, I’ll just have to leave that for a future entry. And, no, the Data Sheets on Exadata Database Machine are not inaccurate.

12 Responses to “Exadata Database Machine: The Data Sheets Are Inaccurate! Part – I.”


  1. Jonathan Lewis March 8, 2011 at 8:46 am

    My quest in (Oracle-related) life – how to get people to recognise the difference between a valid test and a misleading exercise.

    Let me guess about this hypothetical future blog: 8 seconds to NOT READ every megabyte except the megabyte that the storage index says might contain some data? (That’s assuming that the PK index hasn’t been created – otherwise 8 seconds to read 3 or 4 index blocks is just a teeny-weeny bit slow.)
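    [Editor’s note: for readers who haven’t met the mechanism Jonathan alludes to, Exadata storage indexes keep per-region minimum/maximum column values, so a predicate such as pk_number < 0 lets the storage cells skip every region whose minimum is non-negative. A toy sketch of the idea follows; the region size and data are invented for illustration, and this is not how the cells implement it internally.]

```python
# Toy illustration of storage-index pruning: per-region (min, max)
# values let the storage layer skip regions that cannot possibly
# satisfy "value < 0". Purely illustrative, not Exadata internals.

def build_region_index(rows, region_size):
    """Record (start_offset, min, max) for each fixed-size region."""
    regions = []
    for i in range(0, len(rows), region_size):
        chunk = rows[i:i + region_size]
        regions.append((i, min(chunk), max(chunk)))
    return regions

def count_lt_zero(rows, regions, region_size):
    """Count rows < 0, reading only regions whose minimum is negative."""
    total = 0
    regions_read = 0
    for start, lo, hi in regions:
        if lo >= 0:
            continue                      # "storage index" prunes region
        regions_read += 1
        total += sum(1 for v in rows[start:start + region_size] if v < 0)
    return total, regions_read

rows = list(range(1, 10_001))             # all positive, like a PK column
regions = build_region_index(rows, 1000)
count, read = count_lt_zero(rows, regions, 1000)
print(count, read)                        # 0 0 -- no region read at all
```

With an all-positive column, every region is pruned and the “scan” touches nothing, which is Jonathan’s point about such a test being a misleading exercise.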

  2. oradba March 9, 2011 at 9:46 pm

    You mention Kerry Osborne and Fritz Hoogland but forgot Tanel Poder?

    http://blog.tanelpoder.com/2010/10/25/count-stopkey-operation-the-where-rownum/

  3. Doug Burns March 11, 2011 at 6:46 pm

    Perhaps one of the reasons there haven’t been more Exadata posts is because some consultants are working for customers who would rather not share related information publicly?

    Just a theory.

  4. Mark W. Farnham March 11, 2011 at 9:03 pm

    Have them give me access to an Exadata and I’ll see whether I can still create tests relevant to the server purchaser!

    Perhaps I can collect relevant tests from fellow Oakies who also have no exadata box at their fingertips.

    I suppose we can even skip the data sheet proofs, although those seem like a good idea to test whether “all the wires” are plugged in correctly on a new box.

    mwf

  5. fsengonul March 28, 2011 at 3:16 pm

    You may check my blog and presentations in order to see exadata in action.
    […text deleted by Kevin..]

    • kevinclosson March 28, 2011 at 3:25 pm

      Hello Ferhat,

      In my blog post I was asking for folks to blog about specific performance attributes of Exadata–specifically highly-complex queries–as opposed to the standard SELECT COUNT(*) FROM EMP WHERE SAL < 0; I've quickly scanned your blog and don't see any such performance references. Can you provide links to such content specifically on your blog? I had to moderate out the blanket reference to your blog front page as it is not germane to this specific blog entry. I will gladly let through any blog references that meet the specific criteria I've spelled out.

      Thanks for stopping by.

      • fsengonul March 28, 2011 at 4:18 pm

        Hi Kevin,
        You’re right about the specific query issue.
        But, you may find the avg performance increase numbers in the Open World presentation: http://ferhatsengonul.wordpress.com/2010/10/18/exadata-presentation/

        and the effects of sorting on compression in http://ferhatsengonul.wordpress.com/2010/08/09/getting-the-most-from-hybrid-columnar-compression/

        Because of the security rules in my company I cannot share any query which gives a clue about the structure. But I have a 2-rack X2-2 system which has not become production yet.
        If you can provide me some queries on the swingbench data of Dominic Giles or from any other source, I can easily test them.

        Thanks.

        • kevinclosson March 28, 2011 at 4:53 pm

          Ferhat,

          You do realize, don’t you, that for the last four years I worked as a performance architect in Oracle’s Exadata development organization? I personally will not benefit from OOW slide sets with “avg performance increase” figures and I don’t think readers of this blog will either. The specific point of this blog entry is to encourage people who are fresh to Exadata-related blogging to stop blogging simple regurgitations of the data sheets. The blogging community would do well to start blogging about complex queries and analytics and so forth.

          I’ve let your URLs through so readers can click through to your blog if they’d like to.

          Thanks for stopping by.

