Marketing Efforts Prove SunFire T2000 Is Not Fit For Oracle.

I’ll try not to make a habit of referencing a comment on my blog as subject matter for a new post, but this one is worth it. One of my blog readers posted a comment asking what performance effect the Sun CoolThreads architecture would have on DSS or OLTP workloads, given that it has a single FPU shared by 8 cores. The comment was:

I have heard, though, that the CoolThreads processors are not always great at supporting databases because they only have a single floating point processor. Would you see this as a problem in either an OLTP or DSS environment that doesn’t have any requirement for calculations involving floating point?

Since this is an Oracle blog, I’ll address this with an Oracle-oriented answer. I think most of you know how much I dislike red herring marketing techniques, so I’ll point out that there has been a good deal of web FUD about the fact that all 8 cores in the CoolThreads package share a single floating point unit (FPU). Who cares?

Oracle and Floating Point
The core Oracle kernel does not, by and large, use floating point operations. There are some floating point ops in layers that don’t execute at high frequency and are therefore not of interest. Let’s stick to the things that happen thousands or tens of thousands of times per second (e.g., buffer gets, latching, etc.). And yes, there are a couple of new 10g native float datatypes (e.g., BINARY_FLOAT, BINARY_DOUBLE), but how arithmetic operations are performed on them is a porting decision. That is, the team that ports Oracle to a given architecture must choose whether these datatypes are handled with floating point instructions or not. Oracle documentation on the matter states:

The BINARY_FLOAT and BINARY_DOUBLE types can use native hardware arithmetic instructions…

Having a background in port-level engineering of Oracle, I’ll point out that the word “can” in this context is very important. I have a query out about whether the Solaris ports do indeed do this, but what is the real impact either way?

At first glance one might expect an operation like select sum(amt_sold) to benefit significantly if the amt_sold column were defined as BINARY_FLOAT or BINARY_DOUBLE, but that is just not so. Oracle documentation is right to point out that machine floating point types are, uh, not the best option for financial data. The documentation reads further:

These types do not always represent fractional values precisely, and handle rounding differently than the NUMBER types. These types are less suitable for financial code where accuracy is critical.
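
To make that concrete, here is a minimal sketch (the table and column names are hypothetical) of the drift the documentation warns about. Summing a thousand rows of 0.1 stored as BINARY_FLOAT generally does not yield exactly 100, because 0.1 has no exact binary representation, while NUMBER’s scaled-decimal storage keeps it exact:

-- Hypothetical demo table: the same value stored as NUMBER and BINARY_FLOAT
CREATE TABLE float_demo (
  n  NUMBER(10,2),
  bf BINARY_FLOAT
);

-- Load 1000 rows of 0.1 into both columns
INSERT INTO float_demo
  SELECT 0.1, 0.1f FROM dual CONNECT BY level <= 1000;
COMMIT;

-- SUM(n) is exactly 100; SUM(bf) is typically slightly off,
-- since 0.1 cannot be represented exactly in binary floating point
SELECT SUM(n) AS exact_sum, SUM(bf) AS float_sum FROM float_demo;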

So those folks out there trying to market against CoolThreads based largely on its lack of FPU muscle can forget the angle of poor database performance. It is a red herring. Well, OK, maybe there is an application out there that is not financial and would like to benefit from the fact that a BINARY_FLOAT takes 4 bytes of storage whereas a NUMBER can take up to 21 bytes. But there again I would have to see real numbers from a real test to believe there is any benefit. Why? Remember that access to a row with a BINARY_FLOAT column is prefaced with quite a bit of SGA code that is entirely integer. Not to mention the fact that it is unlikely a table would contain only that column. All the other adjacent columns add overhead in the caching and fetching of this nice, new, small BINARY_FLOAT column. All the layers of code to parse the query, construct the plan, allocate heaps and so on are mostly integer operations. Then, accessing each row piece in each block entails cache gets/misses (logical I/O) and any necessary physical I/O. For each potential hardware FPU operation on a BINARY_FLOAT column there are orders of magnitude more integer operations.
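
If you want to check that storage claim yourself, VSIZE reports the internal byte count of a value. A quick sketch against DUAL (nothing application-specific here):

-- BINARY_FLOAT is a fixed 4 bytes; NUMBER is variable-length,
-- from 1 byte up to 21 bytes depending on the value's precision
SELECT VSIZE(TO_BINARY_FLOAT(123.45)) AS bf_bytes,
       VSIZE(CAST(123.45 AS NUMBER))  AS num_bytes
FROM dual;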

All that “theory” aside, it is entirely possible to actually measure before we mangle, as the cliché goes. Once again, thanks to my old friend Glenn Fawcett for a pointer to a toolkit for measuring floating point operations.

Why the Passion?
I remember the FUD marketing that competitors tried to use against Sequent when the infamous Pentium FDIV bug was found. That bug had no effect on the Sequent port of Oracle. That subtle fact seems to have been lost on the marketing personnel working for our competitors, because they went wild with it. See, at the time Sequent was an Oracle server powerhouse with systems built around Intel processors at their core (envision a 9 square foot board loaded with ASICs, PALs and other goodies with a little Pentium dot in the middle). Sequent was the development platform for Unix Oracle Parallel Server and intra-node Parallel Query. Oracle ran their entire datacenter on Sequent Symmetry systems at the time (picture 100+ refrigerator-sized chassis lined up in rows at Redwood Shores), and Oracle Server Technologies ran their nightly regression testing against Sequent Symmetry systems as well. Boring, I know. But I was in Oracle Advanced Engineering at the time and I didn’t appreciate the FUD marketing that our competitors (whose systems were RISC based) tried to play up with the supposed impact of that bug on Oracle performance on Sequent gear. I do not like FUD marketing. If you are a regular reader of my blog, I bet you know what other current FUD marketing I particularly dislike.

More to the point of CoolThreads, I’ve seen web content from companies using what I consider to be red herring marketing against the SunFire T[12]000 family of servers. I am probably one of the biggest proponents of fair play out there, and suggesting CoolThreads technology is not fit for Oracle due to poor FPU support is just not right. Now, does that mean I’d choose a SunFire over an industry standard server? Well, that would be another blog entry.

27 Responses to “Marketing Efforts Prove SunFire T2000 Is Not Fit For Oracle.”


  1. amit poddar November 30, 2006 at 10:06 pm

    Hi,

    This will sound stupid.

    If Oracle does not do any floating point operations,
    then how does Oracle do operations on NUMBER(10,2) columns?

    thanks
    amit

  2. kevinclosson November 30, 2006 at 10:26 pm

    No, not stupid at all…

    BCD… pretty much like all the other RDBMS engines …
    http://download-west.oracle.com/docs/cd/B19306_01/appdev.102/b14261/datatypes.htm
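
    For the curious, DUMP exposes that encoding. Oracle stores a NUMBER as a scaled, base-100 decimal (an exponent byte followed by base-100 digits), so arithmetic on NUMBER(10,2) columns is integer work with no FPU involved. A quick sketch:

    -- DUMP shows the internal representation of a value.
    -- Typ=2 is NUMBER; the bytes are an exponent followed by base-100
    -- digits, not an IEEE floating point bit pattern.
    SELECT DUMP(123.45) AS internal_form FROM dual;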

  3. Glenn Fawcett December 1, 2006 at 9:55 pm

    Nice post 🙂

    Most customers that I run into running Oracle on the T2000 absolutely love it. You get 8 cores and 32 threads with the throughput of an E10K in a pizza box… AND you only have to pay for 2 Oracle licenses…. No, I didn’t stutter. Massive throughput and only 2 Oracle licenses.

  4. UnixGuru December 11, 2006 at 6:02 pm

    As a consultant, I have seen this little T2000 box do well with OLTP workloads on Oracle. It’s designed for doing lots of things at the same time, like Oracle, Apache, and many of the J2EE app servers out there. Not to mention you pay Oracle less $$ for licenses.

    I have seen many of these take hold in large companies with large web presences.

  5. Jit Biswas January 17, 2007 at 8:17 pm

    Good blog! To all those people trying to adopt Linux: the T2000 is a very good choice for both application server and database server. You pay a lot less in hardware and licensing costs too. Recently I was at a customer site (a very big tax software company!) and they are replacing almost everything they have, including bigger boxes, with T2000s (except for the really big ERP database, which resides on a Sun F15K), and they’re happy with all the benchmarking they have done!

  6. Leonid January 21, 2007 at 3:49 am

    For those who claim Oracle does not perform on these machines:
    A Sun E6500 with 28x400MHz USII (8M L2 cache) and 28GB of RAM running the E-Business Suite payroll module process (Generate Run Balances) takes 35% more time compared to a T2000 with UST1 (1GHz, 8 cores) and 16GB of RAM. Everything else is the same (EMC CLARiiON 4700 with 40 drives). For those who don’t know what an E6500 is and what size it is, here is an answer: go to the kitchen, find your biggest appliance there, and make it a 12,000 BTU/hr heat-producing appliance instead of a cooling appliance.

  7. Alison April 13, 2007 at 1:47 pm

    I just want to say thanks for taking the time to post this, I found this very interesting indeed.

  8. Jimbob May 24, 2007 at 12:39 pm

    This is good stuff. Any more contractors out there with anecdotes one way or the other? Empirical evidence helps us sleep at night!

  9. mbar June 4, 2007 at 8:57 am

    Love the idea of a T2000 running Oracle, but our experiences are somewhat disappointing. Does anyone have any hints/tips/links that give a good account of how to tune Oracle on T2000s?

  10. Glenn Fawcett June 4, 2007 at 7:45 pm

    Below are some links to resources and an email alias designed to help Sun customers of the “try-and-buy” program. Keep in mind that the T2000 is a throughput-based machine, as this blog points out. If your application has too much serial work, like a single-threaded batch job, then performance will not match traditional architectures. This box shines when you load it up.

    http://www.sun.com/servers/coolthreads/tnb/applications_oracle.jsp
    http://www.solarisinternals.com/wiki/index.php/Solaris_Internals_and_Performance_FAQ

    Finally, feel free to drop me a note at “Glenn.Fawcett@Sun.com” or use our T2000 experts alias “External_T2000_Experts@sun.com”

    take care,
    Glenn

  11. Tim August 21, 2007 at 10:05 pm

    Interesting.
    I have deployed a large J2EE provisioning app on a T2000, using a 10g backend, also on T2000s.
    Performance is pretty poor, as this is essentially a sequential process engine. Yes, I can run many of these in parallel without upsetting the T2Ks, but we’re primarily interested in end-to-end response time, not the number we can run concurrently.
    I shifted all this (db and app) to two Sun AMD64 boxes running good old SuSE 9, and saw a 5-fold performance improvement.
    Still more investigation to do (e.g., there could be an issue with the T2000’s SAN connection), but simple indicators such as JBoss boot time (40 secs on AMD vs. 3:30 on T2000) suggest not all is right with the T2K.

  12. kevinclosson August 21, 2007 at 10:13 pm

    Tim,

    Very interesting feedback. I don’t know enough about the J2EE layer to know how demanding it is on floating point, but if it is, that would serialize at least that layer of your stack… the database really should not perform that poorly on a T2000… maybe Glenn Fawcett will chime in… Glenn?

  13. Glenn Fawcett August 21, 2007 at 11:03 pm

    Yep, JBoss “boot” time will be better on an AMD chip since it is a single-threaded process running on a 3x faster clock with a large CPU cache. If ALL you want to do is boot the system and do ONE thing at a time fast, then the AMD64 box is a better choice.

    The T2000 is about throughput. After you get past the boot, how did it perform? How many threads? What was the TPS? I recently wrote an entry on my blog regarding the throughput of the T2000.

    http://blogs.sun.com/glennf/entry/getting_past_go_with_sparc

    Hope this helps,
    Glenn

  14. Tim September 21, 2007 at 3:27 am

    Hi Guys

    Yes, I agree, the T2000 is all about throughput. The whitepapers are very silent on the simple fact that although you can run 1000 things in parallel and get amazing throughput, sadly each request will take an appallingly long time to complete ;-).

    I’ve come across the T2K at a couple of customer sites now, and in real-world (vs. test-lab or hypothetical) scenarios, the T2Ks appear to be struggling to meet the expectations of the business.
    Whether customers have been led to expect V8xx performance for a V2xx price (which indeed appears to be the case) will doubtless come out in the fullness of time.

    My personal belief is that, for this kind of architecture to be of any use, vendors and developers would have to be convinced there was a groundswell of movement in this direction and fundamentally change their coding practices to suit.

    The reality appears to be that the groundswell is in the other direction, towards cheaper and faster alternatives…

  15. Bruce November 2, 2007 at 4:25 pm

    Tim,

    I too have seen several people disappointed with T2000 performance, at least initially.

    Most of the time there is no need to change their coding practices though… it is usually a matter of changing their testing methodology.

    The T2000’s CPU was created because most real-world loads have high clock rate (Intel, AMD, Power, SPARC) CPUs spending only 20% of their time running full speed and 80% of their time waiting for memory.

    Many people benchmark machines with small data sets that tend to be representative of real-world loads only when comparing traditional architecture (high clock rate) chips. If your benchmark, even heavily multithreaded, has the CPUs operating largely from within cache, traditional architecture CPUs are going to outperform the T2000 by potentially large margins due to higher clock rates and deeper pipelines. With no cache misses, the T2000 operates like 8 slow traditional architecture cores, but with a high cache miss rate, it operates as well as, or sometimes better than, 32 fast traditional architecture cores.

    I’ve seen the likes of a Java benchmark that showed 8 Opteron cores outperforming an 8-core (32 thread) T2000 by 25% when using their benchmark data set of 500MB; when told their real-world data size was 20GB, we asked them to try that. They ended up trying a 2GB data size as a compromise, and now the T2000 was *outperforming* those same 8 Opteron cores by 30%. I’ve seen JBoss app server benchmarks where a 200-user simulated load shows the T2000 to be 10% slower than an 8 x 1.8GHz UltraSPARC IV+ (V490) machine… but the same benchmark with 400 simulated users shows the T2000 to be 25% faster.

    You do have to load the T2000 up to see it shine. The good news is most multithreaded server applications these days are perfectly capable of taking advantage of the architecture with *no* coding changes. Just beware of artificially small data set benchmarks that do not properly reflect the capabilities of the T2000.

    Though the T2000 starts out far behind traditional architecture multiprocessor systems on single-threaded, small data set loads, as the concurrency and data sizes both increase (as is inevitable in most real-world installations), the T2000 degrades an order of magnitude “more gracefully” than traditional CPU architectures.

    Cheers,

    Bruce

  16. Joe Foobar March 11, 2009 at 9:32 pm

    The T2000 is a nightmare.
    Let’s compare a 4-CPU (1.6GHz) 440 vs. a 4-core T5220 (1.1GHz) and a single-CPU X4150 (all running Solaris, same release of Oracle), and compare apples to apples for a moment.
    Jumpstart Solaris on a Netra 440, T5220 and X4150, and you’ll see the difference.
    Install Oracle 10 on the same platforms… same thing.
    Database creation (tablespace time is negligible): N440 ~30 minutes, T5220 ~1 hour, X4150 ~10 minutes.

    Create a table (make it 100 chars in length, I/O subsystem being the same), 1 PK, no FKs. Write a simple script that can simulate parallel jobs; a sketch follows below.
    Up to 4 jobs, the 440 (1600 inserts per second) is blowing away the 5220 at 615; the Intel box is at 5000.
    The 440 maxes out after 4 parallel jobs (expected); you will need 12 jobs on the 5220 to match the 440. The Intel box maxes out at 34 jobs, for an aggregate throughput of 13,000 inserts per second.
    T5220 – $12,000
    X4150 – $4,000
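
    For reference, here is a minimal sketch of that kind of insert driver (the table, sequence, and row counts are hypothetical); run several copies of the anonymous block concurrently and sum the per-job rates:

    -- Hypothetical 100-char row table with a PK and no FKs, per the test above
    CREATE TABLE ins_test (id NUMBER PRIMARY KEY, pad VARCHAR2(100));
    CREATE SEQUENCE ins_test_seq;

    -- One "job": time 100,000 single-row inserts and report inserts/second
    DECLARE
      v_start NUMBER := DBMS_UTILITY.GET_TIME;  -- elapsed time in centiseconds
      v_elap  NUMBER;
    BEGIN
      FOR i IN 1 .. 100000 LOOP
        INSERT INTO ins_test VALUES (ins_test_seq.NEXTVAL, RPAD('x', 100, 'x'));
        COMMIT;
      END LOOP;
      v_elap := GREATEST(DBMS_UTILITY.GET_TIME - v_start, 1) / 100;
      DBMS_OUTPUT.PUT_LINE('inserts/sec: ' || ROUND(100000 / v_elap));
    END;
    /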

  20. Glenn Fawcett March 12, 2009 at 3:20 am

    As was stated earlier, these machines are all about throughput. Response time will be longer on compute-intensive jobs vs. traditional architectures. It comes down to your SLAs.

    If your transaction response time increases from 10ms to 30ms, who cares if the SLA is for less than 1 second response time? If your batch time increases from 1 hour to 3 hours, tuning might be in order.

    I have talked quite a bit about this on my blog and co-presented with Andrew Holdsworth at OOW on the topic. Some of that material might be useful.

    take care,
    Glenn

  21. Ayse Abacioglu December 3, 2009 at 3:59 pm

    Hi,

    I have a question about Oracle memory usage.

    Oracle memory usage on the T5220 is more than 2.5x that of the N240, but other processes only increased a little.

    The SGA is the same on both systems.

    N240: 369M; T5220: 1038M

    Thanks,

    Ayse

  23. oracle dba May 17, 2010 at 12:38 pm

    My experience with the T2000 is not very positive.

    Since our developers found that MySQL InnoDB does not run at a usable speed on the T2000 (single threaded), they changed to Oracle.

    After 2 months of tuning and testing with parallel query and partitioning we got the application working.

    The speed is much worse than on an old Linux development system, even though everything runs massively parallel.
    From time to time the CPU load queues up and the machine hangs for 30 min.

    Even though I/O is pretty fast, there are not many Oracle applications that fit the T2000, in my opinion.

    • 24. Sandeep Raja Rao October 13, 2010 at 8:53 am

      Might be a late reply, but what version of MySQL did you use?

      5.1.??, and did you use the built-in version of InnoDB or the plugin?

      What optimizations did you try?

      innodb_thread_concurrency, innodb_buffer_pool_size, etc.?


  1. Application Server Benchmark Proves PostgreSQL Is The Best Enterprise Database Server. New SPECjAppServer2004 Cost Metric Introduced Too! « Kevin Closson’s Oracle Blog: Platform, Storage & Clustering Topics Related to Oracle Databases Trackback on July 18, 2007 at 8:47 pm
  2. Multi-Core Processors? Lots, and Lots of Cores! « Kevin Closson’s Oracle Blog: Platform, Storage & Clustering Topics Related to Oracle Databases Trackback on August 21, 2007 at 4:41 pm
  3. Oracle Performance on Sun’s “Rock” Processors and Oracle Scalability « Kevin Closson’s Oracle Blog: Platform, Storage & Clustering Topics Related to Oracle Databases Trackback on June 9, 2008 at 9:38 pm
