Recent SPARC T4-4 TPC-H Results Prove Oracle Can Do Better Than…Oracle! Part II.

My recent post entitled Recent SPARC T4-4 TPC-H Benchmark Results. Proving Bandwidth! But What Storage? provoked the following comment/question from a reader:

Does this summarize your point(s)?

TPC-H produces a number which is a reflection of (hourly?!?) system throughput.

System throughput may not be indicative of system “performance” to its users b/c users are typically most interested in response time. Thus, TPC-H is an easily mis-used benchmark for comparing real world performance.

Oracle is using our misunderstanding of throughput as performance to Market systems which are excellent throughput machines as excellent performance machines, when in fact their performance may be less than desirable.

I hope the reader took the time to read yesterday’s post entitled Recent SPARC T4-4 TPC-H Results Prove Oracle Can Do Better Than…Oracle as I think it goes a long way to address his comment/question. However, I do think the reader’s question deserves proper handling and thus I’m making this blog entry. So, dear reader, the following is my response to your comment/questions, but first I need to clear the air as it were.

There Is No Evil Lurking In This Thread
Let me first state categorically that Oracle is not “using our misunderstanding […] to Market systems […]” They are not doing anything underhanded with these TPC-H results. They are, however, conveniently failing to compare their results to their own prior results. I only brought up the HP Proliant DL980 SQL Server results because Oracle did so in their press release.

Comparisons
I really do not like to compare TPC-H results across database vendor lines. The benchmark is too tricked out, it uses a 3rd normal form schema, and many other things about it make it just a goofy benchmark if you have data warehousing in mind. Nevertheless, comparisons among a single database vendor’s own results are useful for many purposes, such as suiting my ulterior motive, which is to suggest that Oracle runs better on platforms other than their very own (recently acquired) SPARC processors.

Before I continue I’d like to interject a proclamation. In fact, I’ll quote myself if you’ll suffer me to do so:

Lack of published TPC-H results does not in any way disqualify any technology offering in the data warehousing space. There are no Oracle Exadata Database Machine TPC-H results and that does not amount to a hill of beans. There are also no Teradata, EMC Greenplum or IBM Netezza results, and none of those beans form a hill.

– Kevin Closson

The point truly is that TPC-H does not reflect DW/BI/Big Data Analytics reality. However, if a vendor like Oracle chooses to publish results then by all means I’m going to use those results to make my point, but only by comparing Oracle’s own results. That’s precisely what I did in my post entitled Recent SPARC T4-4 TPC-H Results Prove Oracle Can Do Better Than…Oracle.

Now, on to address the reader’s questions.

Throughput is a performance metric and a valid one indeed. However, throughput is generally derived from a concurrent workload of individual units of work that are individually measurable. Consider disk throughput. If I tell you I have a storage configuration that satisfies, say, 500,000 I/O operations per second (IOPS) but don’t tell you the average service times, I’m leaving out a critical piece of information.

How is the IOPS metric calculated? One samples I/O completions for a given period of time and then divides by the number of seconds sampled. It’s only measuring completions. If I sustain a tremendous number of I/O operations in flight concurrently, I can get 500,000 IOPS even if the average completion time is 1 second. They overlap. The same goes for query workloads.
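The overlap argument is Little's Law: operations in flight = completion rate × average time per operation. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the service times other than the 1 second case are illustrative values, not measurements from any storage array.

```python
def measured_iops(completions: int, sample_seconds: float) -> float:
    """IOPS as described above: completions divided by seconds sampled."""
    return completions / sample_seconds


def required_in_flight(iops: float, avg_service_time_s: float) -> float:
    """Little's Law: mean concurrency = throughput x mean time per operation."""
    return iops * avg_service_time_s


# The same 500,000 IOPS figure at very different (illustrative) service times.
for service_time_s in (0.0005, 0.005, 1.0):
    depth = required_in_flight(500_000, service_time_s)
    print(f"{service_time_s * 1000:7.1f} ms average -> {depth:>9,.0f} I/Os in flight")
```

The IOPS number is identical in every row. Only the number of overlapping operations changes, which is why the service time has to be reported alongside it.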

If you submit a continual, large stream of a variety of long running queries you get throughput.  Simply run such a hypothetical workload for a long time, sum up the completions and divide by sample period (time) and you get queries per unit of time.  Simple.

For example, if I have 10,000 concurrent queries that each require, on average, 61 minutes, and I monitor the system for 2 hours, I’ll get 10,000 completions, or 5,000 queries per hour. So long as that meets my service requirement I’m fine. However, if even one of my users mandates a 20 minute completion time I’m not going to impress with hand-waving over the great 5,000 QpH throughput I’m pushing through the system. Users really don’t care about how much work the system is doing on behalf of others. Do they?
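To make that arithmetic concrete, here is a small Python sketch. The function names are hypothetical, and the 20 minute figure is the one user's requirement from the example, not anything defined by TPC-H.

```python
def queries_per_hour(completions: int, window_minutes: float) -> float:
    """Throughput: completions divided by the sample period in hours."""
    return completions / (window_minutes / 60)


def meets_requirement(avg_response_minutes: float, required_minutes: float) -> bool:
    """A response-time requirement is judged per query, not per hour."""
    return avg_response_minutes <= required_minutes


qph = queries_per_hour(completions=10_000, window_minutes=120)
ok = meets_requirement(avg_response_minutes=61, required_minutes=20)

print(f"throughput: {qph:,.0f} queries per hour")
print(f"20 minute requirement met: {ok}")
```

The throughput figure and the response-time check are computed from different inputs, so a healthy value for one says nothing about the other.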

So, to continue in this three-part series I’ll have to refer once again to the TPC-H disclosures (cited below).

I’ll refer again to the SPARC T4-4 result. If you glance at the report you’ll see that when submitted serially the geometric mean of query completion times is about 20 seconds on the SPARC T4. On the other hand, when we look at the HP BladeSystem result of over 3 years ago (still with Oracle Database 11g) we see that the geometric mean of serially submitted queries is nearly indiscernible…a mere blip. Of course the astute reader will point out that these comparisons, while both Oracle Database 11g, pit in-memory against disk-based processing (since the HP BladeSystem result was an In-Memory Parallel Query result). To that I would reply that it is foremost an old, tired Harpertown Xeon (5400) result with front-side bus technology compared to a state of the art, modern CPU (SPARC T4). And let’s not forget that the SPARC T4 server was connected to solid state storage!
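Since the serially submitted queries are summarized as a geometric mean, it helps to recall what that statistic does. The sketch below uses made-up query times, not numbers from any disclosure report, together with Python's standard `statistics.geometric_mean`.

```python
from statistics import geometric_mean, mean

# Hypothetical serial query completion times in seconds (illustrative only).
query_seconds = [5, 10, 20, 40, 80]

geo = geometric_mean(query_seconds)  # nth root of the product of n values
avg = mean(query_seconds)            # plain arithmetic average

print(f"geometric mean:  {geo:.1f} s")
print(f"arithmetic mean: {avg:.1f} s")
```

For positive values the geometric mean is never larger than the arithmetic mean, so a single long-running query moves it less than it would move a plain average. That is worth keeping in mind when reading the per-query timings in the reports.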

It’s Not Fair Comparing Oracle In-Memory Parallel Query To Flash Storage
Really? Even considering how primitive a Harpertown Xeon was compared to a modern processor like SPARC T4? OK, fine. We can also harken back further, nearly 5 years, to a result achieved by the now-defunct systems vendor called PANTA Systems. The PANTA System configuration, at the same 1TB scale, carried the following baggage:

  • Oracle Database 10g (with Real Application Clusters). So, old software.
  • Really, really old AMD Opteron 8000’s (very, very slow by today’s standards).
  • DDR400 DIMMs.

In spite of this aged bio, the configuration produced a geometric mean of 49 seconds for the serially submitted query stream compared to the 20 second result for the SPARC T4.

That’s a vintage 5 year old system, 10g versus 11g, AMD 8000 versus SPARC T4, DDR400 (not even DDR2) versus DDR3 memory and, lest I forget, the PANTA System memory controller was located across a front-side bus compared to the on-die SPARC memory controller. Tally up all of those contrasting system attributes and the resultant benefit to SPARC T4-4 is about a 2.5-fold improvement in the geometric mean of query response times (serial). And, yes, time and technology did bring a 7x increase in the throughput metric…but…once again, I encourage you to look at the disclosures I link to below and see how the completion times stack up in the throughput tests. If you do so then we will have come full circle.

No, Oracle is not misleading anyone with these recent SPARC T4 results.

http://tpc.org/results/individual_results/Oracle/Oracle_T4-4_1TB_TPCH_ES_092611.pdf

http://tpc.org/results/individual_results/HP/HP_BladeSystem128P_090603_TPCH_ES_v2.pdf

http://tpc.org/results/individual_results/PANTA/PANTAmatrix_tpch_1TB_061019_es.pdf

Recent SPARC T4-4 TPC-H Results Prove Oracle Can Do Better Than…Oracle!

I made a blog entry yesterday entitled Recent SPARC T4-4 TPC-H Benchmark Results. Proving Bandwidth! But What Storage? wherein I discussed some recent Oracle SPARC T4 TPC-H benchmark results. I pointed out in the post that the T4-4 is an extremely high-bandwidth server, as evidenced by how close it comes, with only half the processors (sockets), to a recent HP Proliant DL980 result on the same benchmark. I then glued in some screen shots of the disclosure reports to elaborate on the point of bandwidth versus latency. You can push a lot of work through a SPARC T4-4, but that doesn’t mean each individual unit of work is all that fast, relatively speaking. This was even more so the case with the T3 platform before it.

Single stream Oracle workloads were horrible on the T3 platform, but as one scaled up the workload one could find near-parity between T3 and even Nehalem EP (as per my personal testing). That parity, mind you, is on a socket-for-socket basis.

Lest anyone think I’m being flamboyant regarding my comments on single-stream T3 Oracle performance, just talk to anyone that has ever run the Oracle imp command to import data into a database on a SPARC T3 system.  Miserable, and only one example of the sort of single-stream workloads that didn’t shine on the T3.  But that isn’t what I’m blogging about.

Reader Feedback
I received several emails from readers asking for small clarifications regarding yesterday’s blog entry. They were pretty light questions so I answered them. I also got an email with what I refer to as a passively aggressive interrogative assertion:

Can’t you make valid comparisons?

The answer to that would be, yes, of course. That’s what I did. The comparison I made was between HP Proliant DL980 with SQL Server and the Sun SPARC T4 with, of course, Oracle Database 11g at the same TPC-H scale factor. That’s a valid comparison. I’d ordinarily just reply to such an email with a convenient URL to the tpc.org website because the information is all there. However, I gave it some thought and decided I should post a follow-up so regular readers don’t think I’m grasping at straws on a comparison.

So, please put your sarcasm meter on when you read the next sentence. Maybe I should show a comparison between two relatively similar results. The similarities are:

  1. Both SPARC
  2. Both Solaris 10
  3. Both Oracle Database 11g (the same bits)
  4. The same scale
  5. The same storage!
  6. The same calendar year (within close to 3 months of each other)
  7. Within 4% in QphH terms

The following screen shots are SPARC Enterprise M8000 versus SPARC T4-4:

SPARC T4-4:

M8000:

The SPARC results are quite similar. Maybe that’s just how Oracle Database behaves at the 1TB scale? No, it’s not.

Can Oracle Do Better Than…Oracle? Yes.
We can harken back a couple of years to find an Oracle Database 11g result that looks dramatically different. I’m referring to the last audited TPC benchmark conducted with Oracle in partnership with HP. The benchmark was a large blade cluster at the 1TB scale with Oracle Database 11g In-Memory Parallel Query. Sure, the configuration was much larger and costlier but did it perform accordingly? Yes.

The following is a link to the disclosure. http://tpc.org/tpch/results/tpch_result_detail.asp?id=109060301

The cost of the system was about 7x more than the recent 1TB SPARC T4 (with all flash storage) result and delivered just short of 6x the throughput. When you glance at the following screen shot characterizing the query completion times you’ll understand when I suggest that, yes, Oracle probably can do better than…Oracle (SPARC that is).

Recent SPARC T4-4 TPC-H Benchmark Results. Proving Bandwidth! But What Storage?

On 30 November, 2011 Oracle published the second result in a recent series of TPC-H benchmarks. The prior result was a 1000GB scale result with a single SPARC T4-4 connected to 4 Sun Storage F5100 Flash Arrays configured as direct attached storage (DAS).  We can ascertain the DAS aspect by reading the disclosure report where we see there were 16 SAS host bus adaptors in the T4-4. As an aside, I’d like to point out that the F5100 is “headless” which means in order to provision Real Application Clusters storage one must “front” the device with a protocol head (e.g., COMSTAR) such as Oracle does when running TPC-C with the SPARC SuperCluster. I wrote about that style of storage presentation in one of my recent posts about SPARC SuperCluster. It’s a complex approach, is not a product, but it works.

The more recent result, published on 30 November, was a 3000GB scale result with a single SPARC T4-4 server and, again, the storage was DAS. However, this particular benchmark used Sun Storage 2540-M2 (OEMed storage from LSI or NetApp?) attached with Fibre Channel. As per the disclosure report there were 12 8GFC FC HBAs (dual port) for a maximum read bandwidth of 19.2GB/s (24 x 800MB/s). The gross capacity of the storage was 45,600GB, which fit entirely in a single 42U rack.

So What Is My Take On All This?

Shortly after this 3TB result went public I got an email from a reader wondering if I intended to blog about the fact that Oracle did not use Exadata in this benchmark. I replied that I am not going to blog that point because while TPC-H is an interesting workload it is not a proper DW/BI workload. I’ve blogged about that fact many times in the past. The lack of Exadata TPC benchmarks is in itself a non-story.

What I do appreciate gleaning from these results is information about the configurations and, when offered, any public statements about I/O bandwidth achieved by the configuration. Oracle’s press release on the benchmark specifically called out the bandwidth achieved by the SPARC T4-4 as it scanned the conventional storage via 24 8GFC paths. As the following screen shot of the press release shows, Oracle states that the single rack of conventional storage achieved 17 GB/s.

Oracle Press Release: 17 GB/s Conventional Storage Bandwidth.

I could be wrong on the matter, but I don’t believe the Sun Storage 2540 supports 16GFC Fibre Channel yet. If it did, the T4-4 could have gotten away with as few as 6 dual-port HBAs. It is my opinion that 24 paths is a bit cumbersome. However, since it wasn’t a Real Application Clusters configuration, the storage network topology even with 24 paths would be doable by mere mortals. But, again, I’d rather have a single rack of storage with a measly 12 FC paths for 17 GB/s, and since 16GFC is state of the art, that is likely how a fresh IT deployment of similar technology would transpire.
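The path arithmetic can be checked with a few lines of Python. The 800MB/s per 8GFC port and the 17 GB/s figure are the ones quoted from the disclosure report and press release. The 1600MB/s per 16GFC port is my assumption that 16GFC simply doubles the per-port rate, and the function name is hypothetical.

```python
def max_read_bandwidth_gbs(hbas: int, ports_per_hba: int, mb_per_s_per_port: int) -> float:
    """Aggregate read bandwidth in GB/s across all Fibre Channel ports."""
    return hbas * ports_per_hba * mb_per_s_per_port / 1000


MB_S_PER_8GFC_PORT = 800    # per-port figure quoted from the disclosure report
MB_S_PER_16GFC_PORT = 1600  # assumption: 16GFC doubles the 8GFC per-port rate

as_tested = max_read_bandwidth_gbs(12, 2, MB_S_PER_8GFC_PORT)    # 24 paths
with_16gfc = max_read_bandwidth_gbs(6, 2, MB_S_PER_16GFC_PORT)   # 12 paths

# Oracle's stated 17 GB/s as a fraction of the 24-path ceiling.
utilization = 17 / as_tested

print(f"12 dual-port 8GFC HBAs: {as_tested:.1f} GB/s ceiling")
print(f"6 dual-port 16GFC HBAs: {with_16gfc:.1f} GB/s ceiling")
print(f"17 GB/s uses {utilization:.0%} of the ceiling")
```

Under that assumption, half the HBAs and half the cabling give the same theoretical ceiling, which is the whole case for 12 paths over 24.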

SPARC T4-4 Bandwidth

I do not doubt Oracle’s 17GB/s measurement in the 3TB result. The fact is, I am quite astounded that the T4-4 has the internal bandwidth to deal with 17GB/s data flow. That’s 4.25GB/s of application data flow per socket. Simply put, the T4-4 is a very high-bandwidth server. In fact, when we consider the recent 1TB result the T4-4 came within about 8% of the HP Proliant DL980 G7 with 8 Xeon E7 sockets and their PREMA chipset. Yes, within 8% (QphH) of 8 Xeon E7 sockets with just 4 T4 sockets. But is bandwidth everything?

The T4 architecture favors highly-threaded workloads just like the T3 before it. This attribute of the T4 is evident in the disclosure reports as well. Consider, for instance, that the 1TB SPARC T4 test was conducted with 128 query streams whereas the HP Proliant DL980 case used 7. The disparity in query response times between these two configurations running the same scale test is quite dramatic as the following screen shots of the disclosure reports show. With the HP DL980, only query 18 required more than 300 seconds of processing whereas not a single query on the SPARC T4 finished in less than 1200 seconds.

DL980:

SPARC T4:

Summary

These recent SPARC T4-4 TPC-H results proved several things:

1.    Conventional Storage Is Not Dead. Achieving 17GB/s from storage with limited cabling is nothing to sneeze at.

2.    Modern servers have a lot of bandwidth.

3.    There is a vast difference between a big machine and a fast machine. The SPARC T4 is a big (bandwidth) system.

Finally, I did not blog about the fact that the SPARC T4 TPC-H benchmarks do not leverage Exadata storage. Why? Because it simply doesn’t matter. TPC-H is not a suitable test for a system like Exadata. Feel free to Google the matter…you’ll likely find some of my other writings stating the same.

EMC Employee Disclaimer

The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.

This disclaimer was put into place on March 23, 2011.
