Will Oracle Exadata Database Machine Eventually Support Offload Processing for Everything?

BLOG UPDATE 24 SEP 2011: This blog entry has been viewed slightly more than 50 times per day, on average, since it was originally posted several months ago. At this point I’d like to update the post with these words to serve as a bit of a preface to the post itself. The final point made in this post offers a glimpse into one of the technical reasons I resigned my position as Performance Architect in Oracle’s Exadata development organization.

In my recent post entitled “Exadata Database Machine: The Data Sheets Are Inaccurate! Part – I,” I drew attention to the fact that there is increasing Exadata-related blog content produced by folks who know what they are talking about. I think that is a good thing since it would be a disaster if I were the only one providing Exadata-related blog content.

The other day I saw Tanel Poder blogging about objects that are suitable targets for Smart Scan. Tanel has added bitmap indexes to his list. Allow me to quickly interject that the list of what can and cannot be scanned with Smart Scan is not proprietary information. There are DBA views in every running Oracle Database 11g Release 2 instance that can be queried to obtain this information. Tanel’s blog entry breaks no taboo.

So, while Tanel is correct, I think it is also good to simply point out that the seven core Exadata fundamentals do in fact cover this topic. I’ll quote the relevant fundamentals:

Full Scan or Index Fast Full Scan.

  • The required access method chosen by the query optimizer in order to trigger a Smart Scan.

Direct Path Reads.

  • The required buffering model for a Smart Scan. The flow of data from a Smart Scan cannot be buffered in the SGA buffer pool. Direct path reads can be performed for both serial and parallel queries. Direct path reads are buffered in process PGA (heap).
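The two fundamentals quoted above can be read as a pair of necessary conditions. Here is a minimal sketch of that reading in Python. The names `PlanStep` and `meets_smart_scan_prerequisites` are hypothetical and exist only for illustration; nothing here is an Oracle API. Passing the check does not mean a Smart Scan will happen, only that these two prerequisites are satisfied.

```python
from dataclasses import dataclass

# Access methods the quoted fundamentals name as Smart Scan triggers.
SCAN_ACCESS_METHODS = {"FULL SCAN", "INDEX FAST FULL SCAN"}


@dataclass(frozen=True)
class PlanStep:
    """Hypothetical summary of one step in an execution plan."""
    access_method: str       # e.g. "FULL SCAN", "INDEX RANGE SCAN"
    direct_path_read: bool   # True if data bypasses the SGA buffer pool


def meets_smart_scan_prerequisites(step: PlanStep) -> bool:
    """Necessary (not sufficient) conditions taken from the two fundamentals."""
    is_scan = step.access_method.upper() in SCAN_ACCESS_METHODS
    return is_scan and step.direct_path_read
```

For example, an index fast full scan read through the buffer cache fails the check, as does an index range scan performed with direct path reads.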

So, another way Tanel could have gone about it would have been to ask, rhetorically, why wouldn’t Exadata perform a Smart Scan on a bitmap index if the plan chooses a full-scan access method? The answer would be simple: no reason. It is an index after all and can be scanned with an index fast full scan. So why am I blogging about this?

Can I Add Index Organized Tables To That List?
In a recent email exchange, Tanel asked me why Smart Scan cannot attack an index organized table (IOT). Before I go into the outcome of that email exchange I’d like to revert to a fundamental aspect of Exadata that eludes a lot of folks. It’s about the manner in which data is stored in the Exadata Storage Servers and how that relates to offload processing such as Smart Scan.

Data stored in cells is striped by Automatic Storage Management (ASM) across the cells with coarse-grain striping (granularity established by the ASM allocation unit size). With Exadata, the allocation unit size by default—and best-practice—is 4MB. Therefore, tables and indexes are scattered in 4MB chunks across all the cells’ disks.
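To make the scattering concrete, here is a small Python sketch that maps a byte offset in a segment to a 4MB allocation unit and a disk. The round-robin placement and the function name `locate` are assumptions made purely for illustration; actual ASM placement is more involved. The only figure taken from the text is the 4MB allocation unit size.

```python
AU_SIZE = 4 * 1024 * 1024  # 4MB ASM allocation unit, as stated in the post


def locate(offset_bytes: int, num_disks: int) -> tuple[int, int, int]:
    """Map a byte offset to (au_index, disk_index, offset_within_au).

    Assumes allocation units are handed out round-robin across disks,
    which is a simplification and not how ASM is guaranteed to behave.
    """
    if offset_bytes < 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0 and num_disks must be > 0")
    au_index, offset_within_au = divmod(offset_bytes, AU_SIZE)
    disk_index = au_index % num_disks
    return au_index, disk_index, offset_within_au
```

Under these assumptions, two offsets only 4MB apart land on different disks, which is the point: no single cell holds a discrete, contiguous segment.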

Smart Scan performs multiple, asynchronous 1MB reads for each allocation unit (thus four asynchronous 1MB reads for the four adjacent 1MB storage regions of a 4MB allocation unit). As the I/O operations complete, Smart Scan performs predicate operations (filtration) upon each 1MB storage region. If the data contained in a 1MB region references another portion of the database (e.g., a chained row), Smart Scan cannot completely process that storage region. The blocks that reference indirect data are sent to the database grid in standard block form (the same form as when reading an ASM disk on conventional storage). The database server then chases the indirection because only it has the code to map the block-level indirection to an ASM AU in some cell, somewhere. Cells cannot ask other cells for data because cells don’t know anything about each other. The storage grid of Exadata is shared-nothing.
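The flow just described can be sketched as a toy model in Python. This is not cell software: `Block`, `has_indirection`, and `smart_scan_region` are hypothetical names, and a real block is far more than a list of rows. The sketch only shows the split between rows the cell can filter on its own and blocks it must hand back whole.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Block:
    """Toy stand-in for a database block inside a 1MB storage region."""
    rows: list
    has_indirection: bool = False  # e.g. the block holds a chained row


def smart_scan_region(blocks: list, predicate: Callable) -> tuple[list, list]:
    """Return (filtered_rows, passthrough_blocks) for one storage region.

    Blocks with indirection are not filtered in the cell; they are passed
    through untouched so the database server can chase the reference.
    """
    filtered_rows, passthrough_blocks = [], []
    for block in blocks:
        if block.has_indirection:
            passthrough_blocks.append(block)
        else:
            filtered_rows.extend(row for row in block.rows if predicate(row))
    return filtered_rows, passthrough_blocks
```

In this model the more blocks carry indirection, the less filtering the cell does and the more raw block traffic flows to the database grid.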

Thus far, in this blog post, I’ve taken the recurring question of whether Smart Scan works on a certain type of object (in this case IOT) and broadened the discussion to focus on a fundamental aspect of Exadata. So what does this broadened scope have to do with Smart Scan on IOT? Well, when I read that email from Tanel I used logic based on the fundamentals and shot off an answer. Before that hasty reply to Tanel I recalled that IOT has the concept of an overflow tablespace. The concept of an overflow tablespace, in my mind, has “indirection” written all over it. Later I became more curious about IOT so I scanned through the Oracle source code (server side) and couldn’t find any hard barriers against Smart Scan on IOT. I was stumped (trust me, that aspect of the code is not all that straightforward) so I asked the developers who own that specific part of the server. I found out my logic was faulty. I was wrong. It turns out that Smart Scan for IOT is simply not implemented. I’m not insinuating that means “not implemented yet” either. That isn’t the point of this blog entry. Neither is admitting I was wrong in my original answer to Tanel. There is more to this train of thought.

Will The List Of Smart Scan Compatible Objects Keep Growing And Growing?
Neither confessing how I shot off a hasty answer to Tanel nor the specifics of IOT Smart Scan support is the central point of this blog entry. So, just what is my agenda? Primarily, I wanted to remind folks about the fundamental aspect of Exadata regarding indirection and Smart Scan (e.g., chained rows) and secondarily, I wanted to point out that the list of objects suitable for Smart Scan is limited for reasons other than feasibility. Time to market is important. I know that. If an object like IOT is not commonly used in the data warehousing use-case it is unnecessary work to implement support for Smart Scan. But therein lies the third, hidden agenda item for this post, which is to question our continual pondering over the list of objects that support Smart Scan.

Offload processing is a good thing. I wonder, is the goal to offload more and more? Some is good, so certainly more must be better in a scale-out solution. Could offload support grow to the point where Exadata nears a state of “total offload processing”? Would that be a bad thing? Well, “total offload processing” is, in fact, impossible since cells do not contain discrete segments of data but instead the scattering of data I wrote about above. However, more can be offloaded. The question is just how far does that go and what does it mean in architectural terms? Humor me for another moment in this “total offload processing” train of thought.

If, over time, “everything”—or even nearly “everything”—is offloaded to the Exadata Storage Servers there may be two problems. First, offloading more and more to the cells means the query-processing responsibility in the database grid is systematically reduced. What does that do to the architecture? Second, if the goal is to pursue offloading more and more, the eventual outcome gets dangerously close to “total offload processing.” But, is that really dangerous?

So let me ask: In this hypothetical state of “total offload processing” to Exadata Storage Servers (that do not share data by the way), isn’t the result a shared-nothing MPP?  Some time back I asked myself that very question and the answer I came up with put in motion a series of events leading to a significant change in my professional career. I’ll blog about that as soon as I can.

17 Responses to “Will Oracle Exadata Database Machine Eventually Support Offload Processing for Everything?”


  1. 1 Doug Burns March 20, 2011 at 9:50 pm

    Nice post Kevin. A couple of things I particularly liked.

    1) understanding fundamental workings will always trump memory of specifics. In this case Exadata, but many times I’ve been able to make an educated guess about how the Oracle RDBMS will handle a specific situation and then confirm it. Things change as well though.

    2) on offloading, I’ve pondered this a few times. If offloading eases one bottleneck, it probably ends up moving somewhere else. Then again, with processing power all over a potentially distributed system stack, there’s much more freedom to decide where to do things.

    I don’t profess to really understand the answers, but it’s a hell of a [lot] more complex and interesting IT world than I started in!

  2. 4 George March 24, 2011 at 8:58 am

    Been a while since you wrote such a thought provoking BLOG, with the changes in play looking forward to more.
    G

  3. 5 Luis Campos March 25, 2011 at 9:25 pm

    Keep it coming Kev. The world is waiting.

    LMC

  4. 6 Yibin Dong April 27, 2011 at 4:46 am

    Currently, from a common-practice point of view, cell offloading hurts some serial queries against the SYS schema. A query to get tablespace information that used to respond within 3 seconds will return in 30 seconds after unnecessary offloading. Oracle will probably develop some “cell smart offloading” algorithm to selectively offload queries in a way that optimizes overall system performance.

  5. 7 Donald April 27, 2011 at 11:29 pm

    Welcome to Teradata, Kevin. “Indirectly,” of course!

    • 8 kevinclosson April 28, 2011 at 3:14 pm

      Donald,

      Please elaborate.

      • 9 Donald April 28, 2011 at 11:17 pm

        “total offload processing,” as you refer to it in the MPP context, makes me think of Teradata as it existed in the early 1990’s. I should know, I’ve worked for Teradata for the last 11 years.

        Anyway, I was simply responding to your last paragraph, assuming the “significant change” in your professional career would inevitably lead you to Teradata.

        Hope I didn’t offend you by assuming too much. Anyway, I’m a new fan of yours. I appreciate your direct candor and on point discussion of the relevant issues.

        Keep it up, thanks, and I’ll learn from you!

        Donald

        • 10 kevinclosson April 29, 2011 at 1:10 am

          Wow, thanks for the kind words, Donald…

          That significant change put me in EMC as DCD Performance Architect… now..I need to get back to work…I’m analyzing IAS performance on a half-rack DCA at the moment 🙂

  6. 11 Uday May 11, 2011 at 2:13 pm

    From Shared Everything to Shared Nothing — Congrats Kevin

    One question: People seem to think that the data returned to the PGA can also be shared by other QUERIES?

    • 12 kevinclosson May 11, 2011 at 5:14 pm

      Hi Uday,

      The difference between shared-disk and shared-nothing is not the principal difference. The principal difference is that Greenplum is a Symmetrical MPP and Exadata is an Asymmetrical MPP. I won’t be throwing FUD-flavored flame bombs at Exadata or any such craziness. I just want people to understand the differences and what it means to data flow.

      Data buffered in the PGA is not visible to any other process. PGA is private address space.

  7. 13 Amir Riaz May 18, 2011 at 6:55 pm

    I am quite interested in the findings that led you to switch to Greenplum

    • 14 kevinclosson May 18, 2011 at 8:22 pm

      Hello Amir,

      I will be blogging as soon as I can about the technical reasons I wanted to vacate my post in Exadata product development to work on a product like Greenplum (and, indeed, for a company like EMC). Between now and then I’ll also point out that there were “softer” reasons I left Oracle to join EMC. The most important of these “soft” reasons centers on company culture/values (customer focus, partners, etc). I think Rob Enderle is one of the best of the very few mainstream IT press out there raising the eyebrow over Oracle’s moves (starting with the takeover of Sun). I recommend:

      http://www.itbusinessedge.com/cm/blogs/enderle/compare-and-contrast-emc-versus-oracle/?cs=46867

  8. 15 Amir Riaz May 18, 2011 at 8:53 pm

    Kevin,

    You are a good and hardworking person and I always like and respect you.


  1. 1 The Strategic Platform for ALL Database Workloads « flashdba Trackback on June 13, 2012 at 3:15 am

DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.