DBWR Multiblock Writes? Yes, Indeed!

 

Learning Something New
Learning Oracle is a never ending effort. “OK, tell me something I didn’t know”, you say? You may know this bit I’m about to blog about, but I sure didn’t. I don’t know when the Database Writer I/O profile changed, but it has—somewhere along the way.

I have a simple test of 10gR2 (10.2.0.1) on RHEL4 x86_64 using filesystem files. I was using strace(1) on the single DBWR process I have configured for this particular instance. The database uses an 8KB block size and there are no variable block sizes anywhere (pools are not even configured). The workload I’m running while monitoring DBWR is quite simple. I have a tablespace called TEST created using Oracle Managed Files (OMF)—thus the peculiar filenames you’ll see in the screen shots below. I have 2 tables in the TEST tablespace and am simply looping INSERT INTO SELECT * FROM statements between the 2 tables as the stimulus to get DBWR busy.

In the following screen shot you’ll see that I took a look at the file descriptors DBWR is using by listing a few of them out in /proc/<DBWR PID>/fd. The interesting file descriptors for this topic are:

  • FD 18 – System Tablespace
  • FD 19 – UNDO Tablespace
  • FD 20 – SYSAUX Tablespace
  • FD 23 – The TEST Tablespace

NOTE: Please right click the image to open in a viewer. Some of you readers have reported a problem, but we’ve found that it is as simple as clicking it. I need to investigate  if that is something that wordpress is doing.

p1

I have purposefully set up this test to not use libaio, thus filesystemio_options was not set in the parameter file. In the next screen shot I use grep to pull all occurrences of the pwrite(1) system calls that DBWR is making that are not 8KB in size. Historically there should be none since DBWR’s job is to clean scattered SGA buffers by writing single blocks to random file offsets. That has always been DBWR’s lot in life.

p3

So, as strace(1) is showing, these days DBWR is exhibiting a variation of its traditional I/O profile. In this synchronous I/O case, on this port of Oracle, DBWR is performing synchronous multi-block writes to sequential blocks on disk! That may seem like trivial pursuit, but it really isn’t. First, where are the buffers? The pwrite system call does not flush scattered buffers as do such routines as writev(),lio_listio() or odm_io()—it is not a gathered write. So if DBWR’s job is to build write batches by walking LRUs and setting up write-lists by LRU age, how is it magically finding SGA buffers that are adjacent in memory and bound for sequential offsets in the same file? Where is the Twilight Zone soundtrack when you need it? The only way DBWR can issue these system calls requires the buffers to be contiguous in memory. That would surely require a memory copy from the SGA buffers to a contiguous memory location of equal size in either the PGA or some non-buffer pool heap in the SGA.

Of course DBWR also behaved in a more “predictable” manner during this test as well as the following screen shot shows:

p2

Is This A Big Problem?
No, I don’t think so—unless you’re like me and have had DBWR’s I/O profile cast in stone dating back to Version 6 of Oracle. All this means is that when you are counting DBWR write calls, you can’t presume they are always single block. Now that is something new.

Is This A Direct Path Write?
I don’t yet know if this is accounted for as a direct path write in Oracle statistics. I think it should be, because it look, smells and feels just like a direct path write, but I need to look into it.

Porting
I often point out that Microsoft has always had it quite easy with SQL Server. I’m not doing that pitiful Microsoft bashing that I despise so much, but think about it. SQL Server started with a fully functional version of Sybase (I was a Sybase fan) and brought it to market on 1 Operating System and only 1 platform (x86). Since then their “porting” effort as “exploded” into x86, x86_64 and IA64. That is the simple life.

This DBWR “issue” may only be relevant to this Oracle revision and this specific port. I can say with great certainty that if Sequent were still alive, the Sequent port would have used one of the gathered write calls we had at our disposal in this particular case. With Oracle, so much of the runtime is determined by porting decisions. So the $64,000 dollar question is, why isn’t DBWR just using writev(1) which would nicely clear all those SGA buffers from their memory-scattered locations?

Things That Make You Go Hmmmmm

 

 

13 Responses to “DBWR Multiblock Writes? Yes, Indeed!”


  1. 1 David Aldridge December 1, 2006 at 9:57 pm

    Hmmm, very interesting. It does seem like direct path write, but without the APPEND hint or parallel DML I don’t see how that could occur.

    I couldn’t open the attached images though, Kevin. Some link problem I think.

  2. 2 kevinclosson December 1, 2006 at 10:03 pm

    Hi David,

    I’m not happy with the fact that I have to attach screen shots, but on the other hand I have a terrible time cutting and pasting from an xterm into wordpress’ editor…it doesn’t look anything like what I was seeing in the xterm…just a mess… have you tried right clicking? I’ve had other folks here look at this blog post with firefox Lin and Win as well as IE6 and they have no problems…but then I also had another person report a problem…aargh!

    The reason the screen shot is SO critical is it shows that I was stracing DBWR so messing with parallel DML wont shange his activity. If this was a PQ Parallel DML, stracing DBWR would be very boring. It is really wierd. This is DBWR coalescing and doing a sychronous write, but not a vectored write (ala writev) …just wierd

  3. 3 Doug Burns December 1, 2006 at 10:24 pm

    Yeah, I thought I couldn’t view the images at first because I just tried clicking them but a right-click/view image does the trick.

    Interesting.

  4. 4 Jared Still December 1, 2006 at 10:34 pm

    I concur with Doug Burns on this – right click
    and ‘view image’ does the trick for viewing the
    screen shots.

  5. 5 kevinclosson December 1, 2006 at 10:36 pm

    thanks guys … I will be posting a LOT to this blog with vt100/xterm stuff … I’ll be working out how to cut and paste, but in the meantime it is simpler for me to go this route…the important thing is that I find something quickly that works for everyone that reads my blog

  6. 6 David Aldridge December 1, 2006 at 10:45 pm

    Yup works .. cheers fellas. Put me down as “baffled by technology”.

  7. 7 Alexander Fatkulin December 2, 2006 at 3:50 am

    Not about DBWR but about new things.

    About few months ago i discovered that on 10.2 index range scans and index full scans can use multiblock reads of eight blocks (then possible) at a time.

    Rows Row Source Operation
    ——- —————————————————
    200 TABLE ACCESS BY INDEX ROWID TEST (cr=206 pr=69 pw=0 time=654773 us)
    200 INDEX FULL SCAN I_TEST (cr=4 pr=3 pw=0 time=25281 us)(object id 3996531)

    Elapsed times include waiting on following events:
    Event waited on Times Max. Wait Total Waited
    —————————————- Waited ———- ————
    SQL*Net message to client 3 0.00 0.00
    SQL*Net message from client 3 0.00 0.00
    db file sequential read 5 0.01 0.04
    db file scattered read 8 0.01 0.07

  8. 8 Alex Gorbachev December 2, 2006 at 11:00 pm

    Well, direct path writes are going directly from shadow processes’ PGA to disk. Obviously, DBWR role is to flush dirty buffers from SGA buffer cache so it just can’t be direct path writes.

    By the way, when I I traced DBWR with 10046 with async IO enabled – it didn’t post any event to the trace file and I was scratching my head how is that DBWR was idle during checkpoints… until I attached to it with strace. It was on 10.2 (I think 10.2.0.1) on x86 32bit platform.

  9. 9 kevinclosson December 3, 2006 at 4:37 am

    Yes, Alex, you are correct in your definition of direct path writes. I have not yet looked at the accounting of these DBWR I/Os, so there is still some guessing going on…

    Your assertion that DBWR’s role is to flush from the SGA to disk is no longer entirely correct. That is why I posted this whole thing. DBWR is now flushing with the traditional I/O profile of single block SGA buffers to disk and this coalesced (copied) set of buffers when it deems fit. I’ll be posting more info as soon as possible.

  10. 10 Frits Hoogland December 3, 2006 at 6:13 pm

    Because of your blog about this, I looked at it and noticed an alteration in the statistics regarding reads&writes in v$sysstat.
    If you search for ‘physical write total%’, you’ll notice ‘physical write total multi block requests’. I have no evidence this truly reflects dbwr making multiblock writes, but the name suggests it.

  11. 11 David Aldridge January 4, 2007 at 6:32 pm

    I noticed while browsing the 10.2 documentation the following comment:

    http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14220/process.htm#i7259

    “In all cases, DBWn performs batched (multiblock) writes to improve efficiency. The number of blocks written in a multiblock write varies by operating system.”

  12. 12 kevinclosson January 4, 2007 at 7:22 pm

    Thanks, David… good find. The wording of that document, however, cannot reflect reality. It says, “In all cases”, which we all know to be preposterous. It is actually not the norm to have 2 cold adjacent buffered blocks to be written on the aged end of an LRU every time DBWR is posted to flush. In fact, I hope the server doesn’t spend much effort in determining the co-locality of buffers. I’d love to be able to benchmark with and without this feature. In fact, if I have 50 dollars to spare, I’d bet TPC-Cs are run with this feature compiled out because we all know that TPC-C is a random thrash of customer and stock.

    Thanks for stopping by, David.


  1. 1 Analyzing Asynchronous I/O Support with Oracle10g « Kevin Closson’s Oracle Blog: Platform, Storage & Clustering Topics Related to Oracle Databases Trackback on May 27, 2010 at 3:32 am

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




EMC Employee Disclaimer

The opinions and interests expressed on EMC employee blogs are the employees' own and do not necessarily represent EMC's positions, strategies or views. EMC makes no representation or warranties about employee blogs or the accuracy or reliability of such blogs. When you access employee blogs, even though they may contain the EMC logo and content regarding EMC products and services, employee blogs are independent of EMC and EMC does not control their content or operation. In addition, a link to a blog does not mean that EMC endorses that blog or has responsibility for its content or use.

This disclaimer was put into place on March 23, 2011.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,719 other followers

Oracle ACE Program Status

Click It

website metrics

Fond Memories

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2013. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.

Follow

Get every new post delivered to your Inbox.

Join 1,719 other followers

%d bloggers like this: