I Can See Clearly Now. Exadata Is Better Than EMC Storage! I Have Seen The Slides! Part II. SuperCluster Storage?

BLOG UPDATE 2012.01.28: A lot has changed since this blog post so I need to point out that my mention herein about iDB ports to SPARC is clearly outdated. The production manifestation of the SPARC SuperCluster offers 6 Exadata Storage Servers in the full-rack configuration connected to the T4-4 hosts via the Exadata iDB protocol.

Preface
This blog entry is too long.

From Oracle Storage Strategy Update To TPC-C And Back (I Hope)
My recent blog entry entitled I Can See Clearly Now. Exadata Is Better Than EMC Storage! I Have Seen The Slides! Part I was pretty heavily read (over 7,000 views). I was concerned that blogging about something that happened two weeks ago might not be all that interesting. But, since my analysis (opinions) about the June 30, 2011 Oracle Storage Strategy webcast seems to resonate I thought I’d put out this installment.

What Do Transaction Processing Performance Council Benchmarks Have To Do With The Oracle Storage Strategy Update?
I’ve been eagerly anticipating which of IBM or HP would be first to audit a TPC-C with the Xeon E7 (formerly Westmere EX) processor. These vendors have value-add systems componentry that properly extend the vanilla Xeon E7 + QPI capabilities to include scalable 8-socket and very large memory support.

IBM’s x3850 with MAX5 supports 96 32GB low-voltage DIMMs for a total of 3TB RAM with just 4 sockets. IBM proved the strength of the x3850 several months ago with a 4-socket Nehalem EX (Xeon 7500) result of a little over 2.3 million TpmC. So, part of me was not all that surprised to find that they were able to stay with the recipe and publish a result of just over 3 million TpmC with the Xeon E7 processor and MAX5 chipset (July 11, 2011). But that has nothing to do with the Oracle Storage Strategy webcast and, in fact, since it was a DB2 number with Linux it has very little to do with Oracle. So why am I blogging this?

While the 3 million TpmC result represents a roughly 30% improvement over the Nehalem EX-based result for IBM, I’m saddened the entry was not an 8-socket result. Why? Well, I’ll put it this way. If IBM and HP can’t seem to make 8-socket Xeon boxes able to scale contentious workloads (like TPC-C) then it’s quite likely nobody can. It looks like 8-socket Xeon scalability is still out of reach for us. That is just too bad. But that has nothing to do with the Oracle Storage Strategy webcast. So why am I blogging this? I’m getting to it, trust me.

While perusing the main TPC-C all-results page I noticed three interesting things and one of them actually has to do with the Oracle Storage Strategy webcast!

The three things that caught my eye were:

  1. There are non-clustered Xeon results in the top ten! Sure, the prior IBM x3850 result was in the top ten but when it was published I didn’t catch on to that fact. It wasn’t too long ago that non-clustered x86 boxes were so far down the list as to not matter.
  2. In the ranks of the top-ten results there are two submissions that are less than $1.00/TpmC. I think that is quite significant when you compare to historical costs. Top ten TPC-C results with Xeon at < $1.00/TpmC—wow.
  3. None of the products mentioned in the Oracle Storage Strategy webcast appear in the top ten TPC-C results, or the top ten TPC-H results for that matter. The last Oracle TPC-H result was a 3TB scale M9000 result with Sun Storage 6000 (Sun Storage 6000 is LSI Engenio hardware and the Engenio brand is now owned by NetApp for what it’s worth).

So, obviously, point 3 in the list is what brings me back to the Oracle Storage Strategy Update June 29, 2011 (slides). If one publishes an industry benchmark that performs 3x over the closest competitor, as Oracle did with the SuperCluster 30 million TpmC result, wouldn’t the system (including storage) used to do so be considered a premier system offering? One would think so, especially when the workload is an I/O intensive workload! But no, generally speaking the configurations used in TPC benchmarks are not to be confused with systems intended for production.

Concept Car or Production Car
The difference between TPC configurations and production configurations is a lot like the difference between a concept car and a car offered by the same manufacturer that is actually sitting on a lot with a price sticker on it. The concept car and the production car have a lot in common, but the differences are usually pretty obvious as well. We shouldn’t have a problem with this. I still think TPC benchmarks are good for certain purposes. An example of one such purpose is to see just how thin the line is getting between the “concept car” and the “production car.”

SuperCluster Storage or Oracle Storage Strategy Line-up?
No, the “SuperCluster Storage” that was used for the 30 million TpmC result is not in the Storage Strategy line-up. So then what was the 30 million TpmC “concept car” storage? Take a peek at this link or let me summarize. The SuperCluster storage consisted of the following main ingredients:

  1. 97 Sun X4270M2 servers with one Intel Xeon removed. The X4270 servers ran Solaris and COMSTAR. As such, the servers played the role of “array heads” in order to perform protocol exchange between SAS and Fibre Channel. Why? Because the storage networking was Fibre Channel: 108 8GFC Fibre Channel HBAs connected the 27 Real Application Clusters nodes (4 HBAs each) to the COMSTAR heads, and SAS connected the COMSTAR heads to the storage.
  2. 138 Sun Storage F5100 Flash Array devices. That bit was $22,000,000. Remember the analogy about the concept car.

So a high-level schematic of the flow of data was F5100 SAS -> COMSTAR head (SAS to FC) -> FC switches -> Sun T3-4 Servers. Don’t be alarmed by that many “hops” because they don’t really matter. Indeed, the 30 million TpmC SuperCluster delivered an average New Order response time of 0.35s, which is 69% faster than the IBM p780 result of 1.14 seconds. That’s a point Oracle marketing pushes vigorously. Oracle marketing doesn’t, however, seem to push the fact that while HP was still Oracle’s premier hardware partner they teamed with HP to deliver what was, at the time, a world record TPC-C using the recently-shunned Itanium processor. Moreover, they most certainly don’t push the fact that the circa-2007 Itanium TPC-C with Oracle10g delivered New Order average service times of 0.24s, which was 32% faster than the SuperCluster! Fine details matter.
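
The two percentages above are easy to sanity-check. Here is a minimal sketch, assuming “X% faster” means the relative reduction in average New Order response time, and using the .353s SuperCluster figure quoted later in this post rather than the rounded 0.35s. The function and variable names are mine for illustration; they are not taken from any TPC report.

```python
def percent_faster(slower_s: float, faster_s: float) -> float:
    """Relative reduction in response time, as a percentage of the slower time."""
    return (slower_s - faster_s) / slower_s * 100.0


# Average New Order response times quoted in this post, in seconds.
ibm_p780 = 1.14
supercluster = 0.353
itanium_2007 = 0.24

print(f"SuperCluster vs. IBM p780: {percent_faster(ibm_p780, supercluster):.0f}% faster")
print(f"Itanium vs. SuperCluster:  {percent_faster(supercluster, itanium_2007):.0f}% faster")
```

Both results round to the 69% and 32% quoted above, so the two marketing-friendly and marketing-unfriendly numbers are at least computed the same way.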

Concept Car to Oracle Storage Strategy Update
No, there is no evolution from concept to reality where the COMSTAR+F5100 approach is concerned. In fact, Oracle spelled out quite clearly how the storage recipe for these SuperClusters will be “Sun ZFS Storage 7420” which means either FC, iSCSI or NFS, but no Exadata since there is no port of Exadata iDB to SPARC (as of the publish date of this article). I think the ZFS Storage Appliance is a reasonable product but I wouldn’t want to stick my arm in the unified storage meat-grinder with the likes of EMC VNX and NetApp.

So, no, the storage used for the SuperCluster TPC-C shows no promise at this time of evolving from concept to production. However, Oracle customers should be glad because yet another addition to the storage strategy would be all too confusing in my opinion.

Final Words About That IBM x3850 Xeon E7 TPC-C Result
The Oracle SuperCluster result of 30 million TpmC (.353s average New Order service time) didn’t beat out the service times of the ancient Itanium 2 based SuperDome New Order transactions, but at least it also failed to beat the IBM x3850 average service times!

The IBM x3850 pumped out over 3 million TpmC with average New Order service times of .272s and all that for $.59/TpmC. How? Well, the storage wasn’t a concept. The lion’s share of the I/O was serviced by 136 SFF SAS SSDs! That’s about 1/50th the cost for storage for 1/10th the transaction throughput when compared to the SuperCluster. And faster transaction service times too.
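
To put that last comparison in per-transaction terms, here is a rough sketch. It assumes the round figures used in this post: $22,000,000 of F5100 storage behind 30 million TpmC, and IBM storage at about 1/50th of that cost behind 3 million TpmC. The implied IBM storage cost is back-computed from the 1/50th ratio; it is not a number taken from the published report.

```python
# Round figures from this post; not exact values from the full disclosure reports.
supercluster_storage_usd = 22_000_000
supercluster_tpmc = 30_000_000
ibm_tpmc = 3_000_000

# Assumption: "about 1/50th the cost for storage" taken literally.
ibm_storage_usd = supercluster_storage_usd / 50

# Storage dollars spent per unit of throughput on each system.
supercluster_storage_per_tpmc = supercluster_storage_usd / supercluster_tpmc
ibm_storage_per_tpmc = ibm_storage_usd / ibm_tpmc
ratio = supercluster_storage_per_tpmc / ibm_storage_per_tpmc

print(f"SuperCluster storage:        ${supercluster_storage_per_tpmc:.2f} per TpmC")
print(f"IBM x3850 storage (implied): ${ibm_storage_per_tpmc:.2f} per TpmC")
print(f"SuperCluster pays about {ratio:.0f}x more for storage per TpmC")
```

Under those assumptions the concept storage costs roughly five times as much per unit of throughput as the SSD-based configuration.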

Intel Xeon is my concept car of choice—and you can run about any software you so choose on it so that makes it even better. And regardless of what software I chose to run I would rather it not be stored in “concept storage.”

Summary
This blog entry was too long.

15 Responses to “I Can See Clearly Now. Exadata Is Better Than EMC Storage! I Have Seen The Slides! Part II. SuperCluster Storage?”


  1. 1 DuncanE July 27, 2011 at 7:44 am

    >> I’m saddened the entry was not an 8-socket result

    I’m pretty sure you are aware of the scuttlebutt that there _was_ an 8-socket Xeon TPC-C result run by HP (on Nehalem though, not E7), but Oracle wouldn’t let HP publish it… I guess this might not be the whole story, but an interesting data point:

    http://www.theregister.co.uk/2011/03/11/oracle_allegedly_stifles_hp_oracle_tpc_benchmark/

  2. 3 Steve Shaw July 28, 2011 at 9:43 am

    Hi Kevin,

    There is 8-socket Xeon OLTP data over at TPC-E with SQL Server on Nehalem and Westmere-EX. For example, the Fujitsu Primergy RX900 S2 with the E7-8870s at 4555.54 tpsE. There is also 4-socket data here, such as the IBM system x3860 X5 with the E7-4870s at 2862.61 tpsE. So this gives published TPC data of 4 to 8 socket scalability of 1.59X to start with.

    Cheers,

    Steve

  3. 5 Steve Shaw August 3, 2011 at 1:49 pm

    Kevin,

    I wouldn’t necessarily agree, for the first data available this is already a figure that would beat most 4 x 2 socket or 2 x 4 socket clustered alternatives to 8 socket. As you know you also need the OS and database to scale as well – there is some database software out there that struggles to scale going from 2 to 4 socket irrespective of platform and OS. So my view is this is not too bad at all, it shows that the OS and database does scale and I’m sure we’ll see more 8 socket data going forward.

    Cheers,

    Steve

  4. 7 George November 24, 2011 at 11:38 am

    Kevin,
    hmm, am I missing something, ain’t the Sun Fire x4800 an 8 socket platform?
    G

  5. 11 Dmitri January 27, 2012 at 2:18 pm

    You mention Violin – I’ve been looking at the blogs of respected database performance experts but nobody seems to mention the consequences of the latest generation of flash storage arrays. Some (perhaps ill-advised) claims on the Violin website indicate that data can be read from “disk” faster than over the cluster interconnect on a RAC system. Whether that is true I don’t know, but if physical I/O can be an order of magnitude faster at sustained rates surely the world of database performance tuning must be turned upside down?
    I also note that Violin are making a big play on how they can compete with Exadata, EMC and IBM. For a small company their marketing department seems just as “committed” as the legendary Oracle marketing people (Unbreakable Linux anyone?)

    • 12 kevinclosson January 28, 2012 at 9:25 am

      Hi Dmitri,

      I have friends at Violin. They are good, honest people. I’m not aware of the comparison between a RAC cr-send and a “go read it yourself physical I/O” but the topic of such comparison is interesting. We know that the RAC inter-node communications library (skgxp) is implemented over several network protocols and physical networking technology ranging from 1GbE with UDP to RDS RDMA over Infiniband. Both a read from Violin and a cr-receive from a RAC instance measure in the microseconds but I sort of don’t care about that because the comparison is a bit moot. Violin accelerates both writes *and* reads so the mention of RAC cr-sends sort of falls through the cracks. In the same vein, EMC FAST accelerates *both* writes and reads.

      I don’t know when the trend started, but the rampant ignorance about the importance of writes in an OLTP/ERP environment makes me dizzy. If a platform can’t scale writes along with reads what good is it? This is one of my long-time issues with Exadata marketing and Oracle executive chest-thumping in particular. While they rightfully tout the fact that a full-rack Exadata X2 configuration can sustain about 1.5 million random single block reads (e.g., db file sequential read) they shamefully overlook the importance of the fact that it can only sustain 50,000 write IOPS. The 50,000 WIOPS is a gross capacity. ASM shaves off 50% of that with normal redundancy.

      So, if your application exhibits a read:write ratio of 60:1 then the nose-bleed Exadata read IOPS capacity is something you’ll benefit from. If your application exhibits a more realistic ratio of reads to writes like, say, 5:1 then the Exadata WIOPS capacity acts like a shackle on the ankle and holds the application back to 125,000 RIOPS. I appreciate the value of headroom, but the massive disparity between read and write capacity makes the hype about 1.5 million IOPS intellectually dishonest.
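
The arithmetic behind that 125,000 figure can be laid out as a small sketch. It assumes the numbers quoted above (50,000 gross write IOPS and about 1.5 million read IOPS) and that normal redundancy means each write lands twice. The function and parameter names are only for illustration.

```python
def sustainable_read_iops(gross_write_iops: int, write_copies: int,
                          reads_per_write: int, read_iops_ceiling: int) -> int:
    """Read IOPS an application can drive before writes become the limit."""
    # Each application write costs `write_copies` physical writes.
    net_write_iops = gross_write_iops // write_copies
    # Reads arrive in a fixed ratio to writes, so the write cap bounds the reads.
    return min(read_iops_ceiling, net_write_iops * reads_per_write)


for ratio in (60, 5):
    riops = sustainable_read_iops(50_000, 2, ratio, 1_500_000)
    print(f"{ratio}:1 read:write -> {riops:,} RIOPS")
```

At 60:1 the write cap and the read ceiling meet at 1.5 million, which is why that ratio is the one where the headline read number is actually reachable; at 5:1 the writes run out at 125,000 reads per second.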

      DISCLAIMER: Oracle Legal: Please don’t confuse yourselves over my mention of skgxp() and the Oracle wait event “db file sequential read” in this blog comment. Neither knowledge of the purpose libskgxp serves nor the definition of “db file sequential read” constitutes disclosure of confidential information.


  1. 1 Exadata: It’s The World’s Fastest Database Machine And The Best For Oracle Database – Part I. Do As I Say, Not As I Do! « Ukrainian Oracle User Group Trackback on August 6, 2011 at 11:14 am
  2. 2 Logical I/O Evolution « Ukrainian Oracle User Group Trackback on August 13, 2011 at 1:28 am
  3. 3 Oracle Executives Underestimate SPARC SuperCluster I/O Capability–By More Than 90 Percent! « Ukrainian Oracle User Group Trackback on November 9, 2011 at 3:05 pm

EMC Employee Disclaimer

The opinions and interests expressed on EMC employee blogs are the employees' own and do not necessarily represent EMC's positions, strategies or views. EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the EMC logo and content regarding EMC products and services, employee blogs are independent of EMC and EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.

This disclaimer was put into place on March 23, 2011.


Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2013. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.
