Oracle Announces the World’s Second OLTP Machine. Public Disclosure Of Exadata Futures With Write-Back Flash Cache. That’s a Sneak Peek At OOW 2012 Big News.

The Enkitec Extreme Exadata Expo is well underway, having just brought day 1 to a close with a keynote from Oracle’s Senior Vice President of Database Server Technology, Andy Mendelsohn.

Mr. Mendelsohn ended his presentation with a slide of Exadata futures. Frits Hoogland tweeted the slide here.

I too was listening in on the presentation as a virtual attendee. I heard Mr. Mendelsohn state that the features fall into the “within 12 months” category. I suppose that means coinciding with Oracle Database 12.2 (note, dot-2). I could certainly be wrong on that matter though. Perhaps 12.1. We’ll see.

Of the items on the list I’d say the most interesting was the “Flash for all writes” item. The bullet point is enhanced with a pledge of 10x more writes for write-heavy OLTP. I knew of this feature and its eventual release, but now the information is public.

As my many posts on the matter attest, I have been critical of Oracle referring to Exadata as the “World’s First OLTP Machine” simply because it has a read cache. OLTP requires the scaling of writes along with reads. However, according to Oracle’s Exadata datasheets, the “World’s First OLTP Machine” is currently capable of 1.5 million read IOPS in a full-rack X2 configuration but only 50,000 gross random write IOPS. With normal ASM redundancy (two copies of every write) the gross figure is reduced to a net WIOPS capacity of approximately 25,000, or a read:write ratio of 60:1. Many users feel compelled to use ASM high redundancy for reasons outside the scope of this post.

A 10x increase would be 500,000 gross WIOPS which, this being a write-back cache, will have to be de-staged back to spinning media at some point.
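The back-of-envelope arithmetic is worth spelling out. A sketch follows; the 1.5M/50K figures come from the datasheet discussion above, and the redundancy multipliers are simply ASM write amplification (two mirror copies for normal redundancy, three for high):

```python
# Back-of-envelope Exadata X2 write-IOPS arithmetic.
# Figures are the datasheet numbers cited in this post; the multipliers
# are standard ASM write amplification (2 copies normal, 3 copies high).
DATASHEET_READ_IOPS = 1_500_000   # full-rack X2 flash read IOPS
DATASHEET_WRITE_IOPS = 50_000     # gross random write IOPS

def net_wiops(gross, asm_copies):
    """Each logical write is mirrored asm_copies times, so net = gross / copies."""
    return gross // asm_copies

normal = net_wiops(DATASHEET_WRITE_IOPS, 2)   # normal redundancy
print(normal, DATASHEET_READ_IOPS // normal)  # 25000 net WIOPS, 60:1 read:write

high = net_wiops(DATASHEET_WRITE_IOPS, 3)     # high redundancy, even worse
print(high)

# The pledged 10x applies to the gross figure:
boosted = net_wiops(DATASHEET_WRITE_IOPS * 10, 2)
print(boosted, DATASHEET_READ_IOPS // boosted)  # 250000 net WIOPS, 6:1
```

Under normal redundancy the promised 10x works out to roughly 250,000 net WIOPS, which is where the 6:1 ratio discussed below comes from.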

It will be interesting to see how this feature plays out. At first glance it would appear the goal is to support a read:write ratio of 6:1, which is drastically better than the World’s First OLTP Machine. If the World’s Second OLTP Machine can handle the de-staging from cache back to spinning media, I’ll say kudos. Until we know, we can only guess.
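The de-staging concern can be made concrete with a toy model: flash absorbs write bursts, but sustained write throughput is ultimately bounded by the rate at which the cache can drain to disk. Every number below is hypothetical, purely for illustration:

```python
# Toy write-back cache model: flash absorbs bursts, but once the inbound
# write rate exceeds the destage rate for long enough, the cache fills
# and throughput collapses to spinning-disk speed. Numbers are hypothetical.
def seconds_until_cache_full(write_iops_in, destage_iops, cache_capacity_ops, horizon):
    """Return the second at which the cache overflows, or None if it never does."""
    backlog = 0
    for t in range(horizon):
        backlog += write_iops_in               # ops absorbed by flash this second
        backlog -= min(backlog, destage_iops)  # ops drained to spinning media
        if backlog > cache_capacity_ops:
            return t
    return None

# A burst above the destage rate is fine only until the cache fills:
print(seconds_until_cache_full(100_000, 60_000, 1_000_000, 3600))  # -> 25
# A workload at or below the destage rate can run indefinitely:
print(seconds_until_cache_full(50_000, 60_000, 1_000_000, 3600))   # -> None
```

The point of the sketch is simply that a write-back cache changes burst behavior, not sustained behavior: the steady-state write ceiling is still set by the spinning media behind it.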

Personally, I’m putting my bets on completely non-mechanical approaches to solving write-intensive OLTP problems. That’s another way of saying all-flash storage. However, I’d also stack up auto-tiering (e.g., EMC FAST) against this approach because the de-staging requirement is reduced as blocks are promoted to Enterprise Flash Drives. I have not performed any benchmark to back that viewpoint…but then I suspect a feature intended to see the light of day “within 12 months” isn’t getting much benchmark action within Oracle development either. Then again, I could be wrong because I no longer work there.

What Are People Buying This Stuff For?
Since the vast majority of Exadata sales are quarter-rack configurations deployed for applications other than Data Warehousing, I think Oracle is focusing on the correct weaknesses.

If you’re interested, I spoke on closely related topics in my recent interview in the August edition of the Northern California Oracle Users Group journal.

Finally, if you haven’t seen my video presentations on the fundamental architectural reasons Exadata is inferior to other solutions in the marketplace for the DW/BI/Analytics use case, I recommend you give them a viewing. Doing so might help you understand why Oracle sells more Exadata for use cases other than DW/BI and, in turn, why they focus on features that have nothing to do with that use case, like write-back flash cache.

26 Responses to “Oracle Announces the World’s Second OLTP Machine. Public Disclosure Of Exadata Futures With Write-Back Flash Cache. That’s a Sneak Peek At OOW 2012 Big News.”


  1. hrishy August 13, 2012 at 8:51 pm

    Hi Kevin

    I was reading your comments in Expert Oracle Exadata on page 513,
    and I quote:
    “There are some systems that just bang away at single-row inserts. These systems are often limited by the speed at which commits can be done, which often depends on the speed with which writes to the log files can be accomplished. This is one area where Exadata competes with other platforms on a fairly even playing field.”

    So has Flash For All Writes already tilted the field in favor of Exadata, or, for the workloads you mentioned, are in-memory databases like Coherence or TimesTen still the way to go?

    • kevinclosson August 13, 2012 at 9:51 pm

      Very good question. Actually, if you take a close look you’ll notice that specific quote is sandwiched between two “Kevin Says.” I’ll address your question and tweet the fact so that perhaps Tanel or Kerry can follow up.

      What this specific bit in the book is actually saying is that (at the time the book was written) there was no enhancement whatsoever in Exadata to improve redo log writes. Redo writes are simple, large sequential writes and, honestly, 15K RPM SAS drives are quite adept at handling that I/O profile. That did not stop Oracle from releasing the Exadata Smart Flash Log feature, which may or may not add much value since the redo writes are sent to both spinning media and flash concurrently.

      The “Flash For All Writes” Andy Mendelsohn spoke of today at E4 has to do with addressing DBWR flushes.

      The only “tilting of the field in Exadata’s favor” is the great speed at which it can perform light scans (e.g., count(*) without predicates) on data that fits in the Exadata Smart Flash Cache (5TB on a full-rack X2 configuration). If one routinely executes queries that search for non-existent data (as in fraud detection, perhaps) then the datasheet scan rates will be realized. Complex query and/or concurrent query, on the other hand, throttles Exadata scan rates down to a fraction of the datasheet number. I elaborated on that point in my Critical Thinking Video. And by “a fraction of the datasheet” scan throughput I mean Exadata gets throttled down to the sort of physical scan I/O that is quite easy to satisfy with modern conventional storage. See the Critical Thinking Videos on that matter.

      In summary, if Oracle implements a write-back cache it will likely not perform as well as an all-flash storage solution; or, as I mention in this post, even auto-tiering such as EMC FAST is a better solution than fronting disks with cache…because, after all, the write-back has to occur sometime.

      Since the feature doesn’t exist yet this is all just a conversation about concepts really…

  2. George August 13, 2012 at 11:06 pm

    I’m interested in the new feature listed, Virtualization, especially for hosting companies or companies with major “my sandbox, your sandbox” departments where teams don’t work together, since this will provide a better multi-tenancy solution. 12 months, wow, that’s before OOW 2013.

    G

    • kevinclosson August 14, 2012 at 8:02 am

      I agree. The mention of virtualization is intriguing. We’ll have to wait to see what that word actually means though.

      • flashdba August 17, 2012 at 2:49 am

        This is of course wild speculation but I wonder if there will be any consideration of running OVM on the compute nodes? Larry Ellison made a lot of disparaging remarks about the Salesforce.com security model being multi-tenancy and therefore being a “roach motel”.

        According to Larry, with the Oracle On-Demand approach “your data’s in a separate database because it’s virtualized”, whereas Salesforce “puts your data at risk by commingling it with others”.

        This always struck me as being disingenuous given that Oracle’s only “strategic” database product doesn’t support any level of virtualization.

        Meanwhile, Oracle is getting absolutely tanked in the virtualization market by VMware – and now, with the release of products like EMC’s vFabric Data Director, time is running out for Oracle to build a market around Oracle VM.

  3. Ofir August 14, 2012 at 11:22 pm

    The fun pre-OW guessing game…
    As far as I remember, Oracle always uses a 12-month granularity when talking about future releases. So Andy can’t go on record saying the next release of Exadata will be next quarter; he must say “next 12 months” (he is allowed to wink, though, if it is imminent). So you should assume anything he has a slide for is 12gR1…
    Also, since OW is next month, this is obviously from his OW slide deck (you saw the public rehearsal); it makes no sense to steal the thunder from the 12g announcements for a 12gR2 feature.
    Anyway, if he did hint that a specific feature is not imminent, that probably means the feature has slipped to the first patchset of 12gR1.
    Of course, this is my wild interpretation of a session I didn’t even attend 🙂
    BTW – it makes a lot of sense for Oracle to shift its Exadata attention to OLTP. First, this is where most systems are. Second, they believe it is a marketing advantage (“one ring to rule them all”) instead of building a dedicated, pure-flash appliance for OLTP systems (which are mostly pretty small on disk but high on CPU and IOPS). Third, they have a new competitor that came out of the blue and aims to have the #2 market share in five years… And I think they should take Hana seriously (although I don’t expect them to admit it for a while). So they have a year before Hana is ready for R/3 systems; they should use this window of opportunity before it closes.

  4. George August 15, 2012 at 9:32 pm

    Hi Kev
    I’m keen to see if they’re going to up the processing power in the Exadata storage servers for ED; of course this might happen as a silent upgrade/improvement, like what happened with the 5670 -> 5675 on the compute nodes a while back.

  5. vnmaster August 17, 2012 at 5:56 am

    Hi Kevin, will flash logging and Write-Back Flash Cache help mixed environments (OLTP + DW)?

  6. Anantha August 19, 2012 at 6:49 pm

    @Kevin, I’ll digress here for a moment to share our experience with destaging. We’ve been using the ZFS Storage Appliance for running VMware. We have over 1,500 VMs banging away at the appliance, and even a very small (18GB x 4) write-optimized SSD (SLC) logzilla works extremely well. We routinely push 800MB/s of I/O and 50,000-100,000 NFS v3 OPS; lots of writes, since many of the VMs are running databases. Our NFS service times are <40ms for 99.9% of the workload, <15ms for 99%, and <2ms for 95%. I very seldom observe more than 500MB-1GB of the logzilla in use. So the destaging works very well on the appliance, and we have 160 SATA 7,200 RPM disks.

    The ZFS destaging is different from ASM/database. I hope the engineering teams have leveraged each other's experience to build a good solution.

    I'm very excited about 'all writes to flash'; I've heard about it from others as well. I'll wait to believe the 10x improvement, but I won't be surprised if it delivers; based on our ZFS appliance experience, I know it is possible. If it even gets me 250,000 IOPS (a 50% derating) then I'm in good shape.

    • kevinclosson August 20, 2012 at 8:56 am

      Anantha : Interesting perspective. Would you share your read:write ratio and redundancy level in this ZFS environment?

      • Anantha August 20, 2012 at 2:41 pm

        Like most VM environments, our workload is very write intensive. I just took a look at the Analytics for the last 24 hours and it is >80% write. I could attach a 24-hour chart but don’t know how to in the comments section.

        We are running a mirrored ZFS pool. The 4 x 18GB write-optimized SSD logzillas on average sustain >10x the IOPS of my SATA drives. I’ve observed as much as 10,000 ops, in aggregate, over the 4 logzillas. The SATA drives seldom leave their comfort zone of 40-65 IOPS. Most NFS operations are 128KB.

        • kevinclosson August 20, 2012 at 2:48 pm

          Anantha,

          Interesting. So your 160 SATA drives are sustaining write-back destaging at a rate of about 52 WIOPS per drive.

          You are right that this sort of write-back destaging is different from the problem to be solved in Exadata. We’ll see how all that turns out.

  7. Anantha August 20, 2012 at 5:35 pm

    @Kevin, for whatever reason there isn’t a ‘reply’ button on your last comment. Anyhow, remember the destaging happens every 5 seconds. During destaging all the disks spike up to around 300+ IOPS and then calm back down to 50.

    The truism is: at the end of the day, data has to make its way down to the spinning disk. One can always have, or imagine, workloads that will overflow the flash, no matter its size. If they can deliver a 10x improvement, then it is outstanding.

    In 40 days we’ll know.

    • kevinclosson August 21, 2012 at 1:11 pm

      @Anantha : Yep… you are right on all counts! Hold it, 300 IOPS to a SATA HDD? Your 160 SATA drives must be nearly empty (short-stroked)?

      • Anantha August 21, 2012 at 7:18 pm

        It is not short-stroked at all; in fact there are no tunables in ZFS that allow for that. Our pool utilization right now is around 40%. The reason the SATA drives can do 300 IOPS (I’ve even seen >500 IOPS) is that it is a streaming write. Remember, ZFS is a COW (copy-on-write) filesystem, so all writes can be streaming as long as there is contiguous free space. That’s where Exadata storage is different: the Oracle database is not COW in its space management. Hence my reluctance to believe the 10x until I see it.

  8. Fabrizio October 2, 2012 at 1:03 am

    Hi Kevin, is the write-back SSD flash what they spoke about yesterday at OpenWorld?
    Is this what was anticipated, or is it something different? Have you had the opportunity to look into it?

    Thank you so much
    Fabrizio


  1. Exadata Roadmap Preview « flashdba Trackback on August 17, 2012 at 10:14 am
  2. Exadata Flash Write-back – Sooner Than We Think? « Oracle-Ninja.com Trackback on August 25, 2012 at 6:58 pm





DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.