In Part I of this series I discussed the concept of “Mindless I/O” and contrasted Orion I/O testing to SLOB.
Some time has passed since that post and a lot of folks have been studying their systems using SLOB. Others have been motivated to prove that the differences between SLOB and Orion are moot. I’m OK with that because I use both of them and a large array of other I/O tools (fio, bonnie, and so on).
Test Single Block Random Reads With SLOB And Call It A Day
There is something I’d like to quickly point out about SLOB.
SLOB has 4 models for testing. Since SLOB runs in a real Oracle instance with a real SGA, it can be used for testing the platform’s ability to handle SGA-cached workloads. Think about it. If your platform can’t scale up cached reads you’ve got a problem. This is what a lot of us SLOB users refer to as Logical I/O testing, where the metric is LIOPS (logical I/O per second). Another model is, of course, the simple random read model. There are also two models that stress genuine DBWR and LGWR functionality, and those models cannot be mimicked, accurately or otherwise, with Orion. Orion is mindless I/O. That fact matters to me and perhaps will matter to some of you as well.
Click here for more on the different SLOB models.
Cool Idea: Let’s use Asynchronous I/O Libraries To Determine Platform Fitness For High-Rate Synchronous I/O
Another thing to be aware of with Orion (as I pointed out 6 years ago) is the fact that Orion actually uses the wrong library for performing random single block reads (db file sequential read). Yes, it uses VOS routines (skgf*), but the wrong ones–at least for the random single block read tests. The following screen shot shows Orion attempting to measure platform suitability for Oracle single block random reads (the db file sequential read model). Oracle instances do not use io_submit()/io_getevents() for db file sequential read:
A real Oracle process (such as a SLOB session) uses synchronous I/O (specifically libC pread()) to service db file sequential read. Yes, I know this varies by file storage type, dNFS, Exadata and other variations but I’m on blisteringly fast storage accessed via direct I/O through an XFS file system–a perfectly supported storage provisioning approach. Nonetheless, in all cases Oracle instances perform single-block I/O when performing single-block blocking I/O–not some FrankenSynchronousIO(tm) such as a single-block request through the asynchronous API followed by a group-reaping of outstanding requests as Orion is doing in the above screenshot. The screen shot shows a single Linux process (Orion) reaping 128 previously-submitted asynchronous I/Os whilst aiming to simulate db file sequential read.
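For contrast with the batched reaping described above, here is a minimal Python sketch of the synchronous shape: one blocking pread() per block, nothing queued and nothing reaped. It is not SLOB and it is not Oracle code. The function name, the 8192-byte block size and the throwaway temp file are all assumptions made for illustration, and the file is opened without O_DIRECT, so these reads may well be served from the page cache.

```python
import os
import random
import tempfile

BLOCK_SIZE = 8192  # assumed database block size; use whatever your instance uses


def random_single_block_reads(path, count, block_size=BLOCK_SIZE, seed=0):
    """Issue `count` blocking single-block reads at random block-aligned offsets.

    Each read is one os.pread() call: the process waits for that block to
    come back before it issues the next one. Nothing is batched or reaped.
    Returns the number of bytes each call returned.
    """
    rng = random.Random(seed)
    # O_DIRECT is deliberately omitted here: it requires aligned buffers.
    fd = os.open(path, os.O_RDONLY)
    try:
        blocks = os.fstat(fd).st_size // block_size
        if blocks == 0:
            raise ValueError("file is smaller than one block")
        sizes = []
        for _ in range(count):
            offset = rng.randrange(blocks) * block_size
            sizes.append(len(os.pread(fd, block_size, offset)))
        return sizes
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Throwaway 64-block file standing in for a datafile (hypothetical).
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(64 * BLOCK_SIZE))
    try:
        sizes = random_single_block_reads(f.name, count=1000)
        print(len(sizes), "reads issued, one system call each")
    finally:
        os.unlink(f.name)
```

The point of the sketch is only the call pattern: the number of system calls equals the number of blocks read, and the process can do nothing else while each one is outstanding.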
So, if you happen to be testing modern, non-mechanical storage with Orion at a rate of, say, 20,000 IOPS/core you are missing out on the joy of analyzing the associated cost of 20,000 context switches/sec. Most people think a system call is a context switch–it isn’t. A context switch is the stopping of a process, saving its state, executing the scheduler code and switching to the context of another process. When a scheduler gets to make 20,000 affinity (NUMA, L2 cache, etc) decisions per second you get the opportunity of learning whether you like the choices it is making. I prefer knowing to placing blind faith in the OS. A platform is more than the sum of all its components. It all needs to come together.
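If you want to see this on your own box, the kernel already keeps the score. The following is a rough Python sketch, assuming a Unix-like system where the resource module is available. The helper names are made up for illustration, and time.sleep() is only a stand-in for a read that really waits on storage. The arithmetic is simple: at 20,000 IOPS/core, each read gets a budget of 50 microseconds.

```python
import resource
import time


def per_io_budget_us(iops_per_core):
    """Microseconds available per I/O at a given per-core IOPS rate."""
    return 1_000_000 / iops_per_core


def context_switches_during(fn):
    """Run fn() and return the (voluntary, involuntary) context switches
    charged to this process while it ran, as reported by getrusage()."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    fn()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def blocking_waits(n=200, seconds=0.0005):
    """Stand-in for n blocking reads: each sleep gives up the CPU, much as
    a process does while it waits for a block to arrive from storage."""
    for _ in range(n):
        time.sleep(seconds)


if __name__ == "__main__":
    print("budget per I/O (us):", per_io_budget_us(20_000))
    voluntary, involuntary = context_switches_during(blocking_waits)
    print("voluntary:", voluntary, "involuntary:", involuntary)
```

Swap blocking_waits for your own read loop to count the switches a real workload costs. Keep in mind that a read satisfied from cache never sleeps, so it will not show up as a voluntary context switch.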
Another problem with using libaio in place of synchronous pread() is the fact that I/Os submitted in the asynchronous path are flat-out handled differently–and different is not the same as same. This is the case on all operating systems I know of. When requests are submitted through the asynchronous interfaces there are opportunities for the OS to potentially optimize the strategy for servicing the I/O. After all, the I/Os have been submitted from a single process context, whereas the same number of I/O requests funneling through the synchronous interface means associating the requests with a potentially vast number of discrete processes. It’s just different, not “slightly less-same”, but different. And different matters–at least to me.
It’s Just Different–And That’s All That Matters
There are some folks in the blogosphere putting in good SLOB testing. Some good folks at Pythian are doing the heavy lifting of proving to you that SLOB and Orion are on par or even, perhaps, Orion is somehow better for testing random reads. I should point out that random reads testing is just a minuscule portion of platform performance analysis. The fact that SLOB and Orion are different is all that matters to me. I need more tools than just a hammer. And when I need a hammer I don’t want a screwdriver that feels serendipitously hammer-like.
Once again I want to thank Yury for testing SLOB. Even if I don’t agree with his conclusions I very much appreciate his work with the kit. Healthy disagreements are healthy. We cultivate these threads and we all learn (I presume).
Different In So Many Ways
So if you are reading this, and are about to begin SLOB testing please be mindful of the following “feature overview” vis a vis SLOB:
- SLOB is an Oracle instance. It doesn’t toil along trying to behave like an instance nor challenge your faith in accepting the differences as unimportant.
- SLOB is not mindless I/O. You get the chance to learn more about your platform when the CPUs are actually doing something. I/O is no longer a problem with modern platforms. It’s all about CPU–always has been, but that was obscured by artificial storage plumbing problems.
- SLOB uses the right routines. If you want to study the fitness of a given platform to handle high-rate db file sequential read you might as well do db file sequential read–or at least use a tool that calls the proper system calls. Libaio and LibC share letters of the alphabet but they are really different–and different in this specific regard matters. Actually, different always matters to me.
- SLOB will allow you to test / study DBWR and LGWR. If you can get Orion to aid you in the study of log file sync or DBWR flushing capacity please drop me a note because I missed the memo :-) .
About Part III
I discussed “Mindless I/O” in Part I. I needed to inject this installment to point out the little bits about Orion using libaio when you want to test synchronous reads. When I get to Part III I’ll get back to the Mindless I/O topic.