Hard Drives Are Arcane Technology. So Why Can’t I Realize Their Full Bandwidth Potential?

We are all at different skill and technical sophistication levels so this post will look like fluff to some readers. This is just a quick post to show a clear depiction of a SAN configuration that wouldn’t do anyone any favors in an Oracle environment.

I showed some snapshots of the lab gear at my disposal in this blog entry. There are a lot of SANs in the lab here, but I can’t say which brand of SAN array I’m blogging about today, and honestly, this problem is not necessarily the architecture of the particular SAN array controller in question. Yes, this is a low-end array, but the concepts I’m blogging about are relevant through the high-end. The following vmstat(1) output shows one of the problems I’m blogging about:

# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 3  3      0 31577944  81948 653352    0    0    96   597   13    34  0  7 89  4
 5  4      0 31550968  82204 653096    0    0 73732 89913 2919 57992  0 10 60 30
 0  4      0 31576760  82332 652968    0    0 74555 98056 2942 55904  0 10 60 30
 1  6      0 31553824  82460 652840    0    0 80289 85870 2831 61435  0 10 63 28
 0  4      0 31576968  82460 652324    0    0 79466 85397 2860 60537  0 10 62 28
 8  0      0 31573368  82588 654260    0    0 41279 46733 1917 36035  0 32 53 15
 8  0      0 31572896  82588 653744    0    0 11613 13344 1239 13387  0 49 46  5
 5  0      0 31572528  82588 654260    0    0  8875  9039 1193 10727  0 47 49  4
 7  0      0 31572280  82588 653744    0    0  6559  7057 1146  9823  0 47 49  4
 6  0      0 31572176  82588 653744    0    0  6556  7048 1156  9121  0 50 47  3
 4  0      0 31572072  82588 654260    0    0  5736  5956 1136  7955  0 47 50  3
 6  0      0 31571576  82588 654260    0    0  4205  5529 1126  7851  0 47 51  2
 8  0      0 31571584  82588 654260    0    0  5960  5264 1132  7982  0 48 49  3
 7  0      0 31570896  82652 654712    0    0  4239  5167 1119  8086  0 47 51  2
11  0      0 31569968  82652 654712    0    0  5116  4995 1124  7437  0 55 43  2
 7  0      0 31570184  82684 653648    0    0  4782  5082 1125  7880  0 54 44  2

This vmstat(1) output shows that Oracle was getting upwards of 165MB/s of combined read/write throughput (via a single 2Gb FCP path) until the array cache reached its saturation point. At that point, the throughput of the array was limited to the bandwidth of the spindles being accessed.
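For readers who want to check those numbers, here is a minimal sketch that sums the bi and bo columns of a vmstat(1) sample. It assumes the default Linux column order shown above and that bi/bo are reported in 1024-byte blocks per second, and it treats 1MB as 1024KB. The function name is just something I made up for illustration.

```python
def io_throughput_mb_s(vmstat_line, block_bytes=1024):
    """Return combined read+write throughput in MB/s for one vmstat data line.

    Assumes the default Linux vmstat layout, where bi and bo are the
    9th and 10th whitespace-separated fields, counted in blocks/s.
    """
    fields = vmstat_line.split()
    bi, bo = int(fields[8]), int(fields[9])
    return (bi + bo) * block_bytes / (1024 * 1024)


# Two samples copied from the output above: one while the array cache
# was absorbing the load, one after it saturated.
cached = " 0  4      0 31576760  82332 652968    0    0 74555 98056 2942 55904  0 10 60 30"
saturated = " 7  0      0 31570184  82684 653648    0    0  4782  5082 1125  7880  0 54 44  2"

print(f"cache absorbing load: {io_throughput_mb_s(cached):.1f} MB/s")
print(f"cache saturated:      {io_throughput_mb_s(saturated):.1f} MB/s")
```

The first sample works out to roughly 169MB/s and the last to a bit under 10MB/s, which is the drop from cache speed to spindle speed.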

It’s All About Balance
I have a pretty simple view of storage and it is worth quoting myself:

Physical I/O is a necessary evil. Bottlenecking downwind of the SAN array cache is silly. Bottlenecking above the disks is foolishness.

Bottlenecking The Array (Silliness)
Configuring LUNs with insufficient spindle count to handle the cache miss (read-through) and write-back overhead is what I call bottlenecking the SAN array. This sort of bottleneck is strictly a configuration issue and therefore falls into the silliness category. The vmstat(1) output above is an example of bottlenecking the array. That is, the array could certainly deliver more throughput, but the spindle count was holding it back. Again, I’m not going to talk about which vendor’s SAN array controller it was because, as it turns out, any of them will act this way under these circumstances. I had a LUN of some 500GB that consisted of very few spindles so it was no surprise to me that the throughput was lousy. This LUN had to be configured this way due to other constraints on the array’s usage (read: Kevin had to share some hardware). I got to live the pain I blogged about recently in this post about capacity versus spindle count. I could configure a more reasonable number of drives and get around this performance problem, but only up to the point where the tables turn and the array starts to bottleneck the disks.

Bottlenecking The Disks (Foolishness)
What do I mean by this? Well, most modern SAN arrays will bottleneck long before your application realizes the full bandwidth potential of all the drives the array can support. I call this bottlenecking the disks and it falls into the foolishness category. This is basic foolishness on the SAN array vendor’s part. Why build storage arrays with horrible imbalance between capacity and performance?
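The two categories boil down to asking which stage caps sustained throughput once the cache is out of the picture. Here is a rough sketch of that idea. The function, its parameters, and the numbers fed to it are all hypothetical and are not measurements from any array; the model also ignores cache hits, RAID overhead and random I/O.

```python
def limiting_stage(spindles, mb_s_per_spindle, array_mb_s):
    """Return (stage, MB/s) for whichever stage caps sustained throughput.

    "spindles" means too few drives behind the controller
    (bottlenecking the array, the silliness case).
    "array" means the controller tops out before the drives do
    (bottlenecking the disks, the foolishness case).
    """
    disk_mb_s = spindles * mb_s_per_spindle
    if disk_mb_s < array_mb_s:
        return "spindles", disk_mb_s
    return "array", array_mb_s


# Hypothetical controller good for 400 MB/s, hypothetical 50 MB/s drives.
print(limiting_stage(spindles=4, mb_s_per_spindle=50, array_mb_s=400))
print(limiting_stage(spindles=16, mb_s_per_spindle=50, array_mb_s=400))
```

With 4 drives the spindles are the limit; with 16 the controller is, and adding more drives past that point buys capacity but no bandwidth.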

Let me go over a typical case of a SAN array that bottlenecks the disks. Here is a link to some technical detail of a high end array. This particular array supports roughly 140TB of capacity when configured with the maximum spindle count (1024) of 146GB drives. On page 3 of the document, the vendor states the first figure related to bandwidth by citing the internal bandwidth of 15GB/s. What does that mean to the servers connected to this array? Reading further (pg. 15) the document states that there is some 10.6GB/s of aggregate data bandwidth and 5.3GB/s bandwidth for control. This means that if you get all the king’s horses and all the king’s men to work it out, you could feed 10.6GB/s of data to your servers. Indeed, 10.6GB/s is a great deal of bandwidth. So what am I talking about? Taking another look at page 15 we see the vendor’s claim that a single 146GB drive can deliver 99.9MB/s (max theoretical). If you wanted to drive all these spindles at full throttle, they would theoretically deliver 99.9GB/s (1024 disks X 99.9MB/s), which is much more than the maximum theoretical data bandwidth of the array. In fact, if you drove “only” about 108 of these spindles at full throttle you’d saturate the array. I quoted the word only because 108 is a lot of disks, but that is only about 10% of what the array supports.
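The arithmetic is easy to reproduce. This is a small sketch using only the vendor figures quoted above; the one assumption is that a GB is 1024MB, which is what makes 1024 drives at 99.9MB/s come out to exactly 99.9GB/s.

```python
MB_PER_GB = 1024  # assumption: binary units, which reproduces the 99.9GB/s figure

spindles = 1024           # maximum spindle count quoted for the array
mb_s_per_spindle = 99.9   # vendor's max theoretical per-drive rate
array_data_gb_s = 10.6    # vendor's aggregate data bandwidth

# What the drives could deliver if every one ran flat out.
aggregate_disk_gb_s = spindles * mb_s_per_spindle / MB_PER_GB

# How many flat-out drives it takes to fill the array's data bandwidth.
spindles_to_saturate = array_data_gb_s * MB_PER_GB / mb_s_per_spindle
fraction_of_spindles = spindles_to_saturate / spindles

print(f"all drives at full throttle: {aggregate_disk_gb_s:.1f} GB/s")
print(f"drives needed to saturate the array: {spindles_to_saturate:.1f}")
print(f"fraction of supported drives: {fraction_of_spindles:.1%}")
```

The drives could collectively push more than nine times what the array can move, and it takes only about 108 of them, a little over 10% of the supported count, to hit the array's ceiling.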

If Oracle needs to do a physical I/O, let’s not bottleneck somewhere in the pipe! Think about it this way: in the technology stack I’m discussing (Oracle, high-end SAN array, etc.), hard drive technology represents the simplest and least advanced component. That is, while hard drives are faster than they were 10 years ago, they have not fundamentally changed. They are still round and brown and they spin. Wouldn’t you think homely old disk drives would be the bane of performance? They aren’t. If we could drive all our disks at their maximum throughput, we’d be in a much better place performance-wise.

Chewing On Crushed Glass
This technical write-up about data warehousing experiences on a high-end SAN array is a perfect example of what must certainly be about as enjoyable as chewing on crushed glass. Ignore the many notation errors (e.g., using ‘b’ where ‘B’ is really intended). The write-up details the pain so I think it is important.

Summary
Hard drives are miserably low-tech necessary evils. Will we ever get a storage architecture where the more sophisticated components don’t make matters worse? Yes, we will. I’ll tell you all more as time passes. In the meantime, I bet dollars to donuts that the paltry 128 drives used in the TPC-H benchmark I blogged about were being driven at or near full bandwidth. That, my dear readers, is cool.

5 Responses to “Hard Drives Are Arcane Technology. So Why Can’t I Realize Their Full Bandwidth Potential?”


  1. 1 Mark J. Bobak September 7, 2007 at 5:15 pm

    Hi Kevin,

    Minor nitpick:
    In the paragraph titled “Bottlenecking the Array”, the third sentence is “I call this bottlenecking the disks and it falls into the foolishness category.”

    Shouldn’t that read “…bottlenecking the array…”?

    -Mark

  2. 2 billy bathgates March 11, 2008 at 6:28 pm

    I think the idea is that if you are doing any amount of random I/O, the per disk throughput drops enormously, and you will not bottleneck the disks in a typical configuration.

    Even if it’s sequential, you will be driving many many servers at the data rates you are talking about. Maybe there is an element of randomness introduced by ‘context switching’ on the disks with that many streams, if that makes any sense?
