Pardon Me, Where Is That Flash Cache? Part II.

In Part I of this series about Database Smart Flash Cache I laid a bit of groundwork on the topic of Database Flash Cache and how it differs from Exadata Smart Flash Cache. I’d like to continue this series by discussing some of the mechanisms associated with Database Smart Flash Cache. But first, I’d like to offer a little more background in case the architecture of Database Smart Flash Cache is still confusing.

Fast Forward To The Past
During the mid-to-late 1990s, and earlier part of this decade,  DBAs would routinely find themselves managing 32-bit Oracle databases on systems with very large physical memory and a 64-bit Operating System. These systems would support a system buffer cache limited only by physical memory. In order to help reduce physical I/O administrators would sometimes use buffered file system files for Oracle. Oracle I/Os were therefore DMAed in the system buffer cache and then copied in kernel mode into the address space of the calling Oracle process. Re-reads from the system buffer cache eliminated physical I/Os. This model came with significant penalties such as the fact that all data in the SGA buffer cache was also present in the system buffer cache (waste of memory). The memory copy operations from the system buffer cache into the SGA buffers (for every physical I/O) was quite a load on systems of the time period as well.

While working in Sequent Database Engineering in the 1990s I was one of the folks pushing for a model quite similar to how Database Flash Cache works to address Oracle caching needs.  Instead of files on Flash, large “buffer cache files” were to reside in main memory. The proposal was such that when a buffer aged from the SGA buffer pool it would be written  to the “buffer cache file.” Disk reads would occur directly into the SGA. If data happened to be in the “buffer cache file” it would copied from RAM (with a mapping operation) thus eliminating a physical I/O. That would have been much better than the double-buffering effect of using buffered file system files. As it turns out, it would have also been quite similar to Database Smart Flash Cache architecture.

None of that ever came to pass, however, because we devised and implemented Indirect Data Buffers as a much better approach than leveraging system buffer cache or any other such external caching. However, that didn’t keep me from pursuing NUMA-optimized disk caching technology as my NUMA-optimized external cache patent sort of suggests.

What does this have to do with Database Smart Flash Cache?

Is that a Logical Read or a Physical Read? Yes!
The real goal of Database Smart Flash Cache is to reduce physical disk reads. It can’t do anything for writes. Most OLTP/ERP applications do more reading than writing so I think this is OK. The interesting turn on this is that physical disk reads from Database Smart Flash Cache occur in the logical I/O code path.

A logical I/O that hits a buffer previously aged to the Flash Cache requires a physical read to bring the buffer back into the address space of the process. Say what? This technology puts a physical read in the logical read code path? Yes.

Think of it this way. If a process doesn’t find the block it needs in the buffer cache, in a non-Database Smart Flash Cache scenario, it has to perform a read from disk. In the case of Database Smart Flash Cache there is a miss in the first level SGA buffer pool (aka L1 SGA) followed by a hit in the second level SGA buffer pool and at that point there is a physical disk read from Flash. The read from Flash will be quite fast and in the case of a PCI Flash card there won’t even be interaction with a central disk controller since each PCI Flash card has its own built-in disk controller.

In the next installment in this series of blog posts I will cover a performance comparison of Database Flash Cache. However, I need to end this installment with a quick look into how the instance accesses the Database Smart Flash Cache file.

It’s Cache, But It’s A File
In the following box I’ll show the initialization parameters and DBWR open files for an instance that is using a Database Smart Flash Cache file.

NOTE: If you hover over the upper right hand side of the box you’ll see a widget that will allow you to view the box in wide screen

SQL> show parameter db_flash

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_flash_cache_file                  string      /home/cache/scratchfile
db_flash_cache_size                  big integer 31G

# ps -ef | grep dbw
oracle    6466     1  0 Dec07 ?        00:00:35 ora_dbw0_OLTP1
oracle    6468     1  0 Dec07 ?        00:00:39 ora_dbw1_OLTP1
oracle   10428     1  0 Oct20 ?        00:02:59 asm_dbw0_+ASM1
oracle   11733 26379  0 08:17 pts/12   00:00:00 grep dbw
# ls -l /proc/6466/fd | grep file
lrwx------ 1 oracle oinstall 64 Dec  7 19:47 22 -> /home/cache/scratchfile
lrwx------ 1 oracle oinstall 64 Dec  7 19:47 33 -> /home/cache/scratchfile

OK, so that showed us the the DBWR process for the OLTP1 instance has two file descriptors open on the Database Smart Flash Cache file. After starting an OLTP workload I straced DBWR to see how it was interacting with the Flash Cache file:


strace -p 6466 2>&1 | head -10
Process 6466 attached - interrupt to quit
pwrite(33, "\6\242\0\0\345]\0\0HP\316\252\0\0\2\4\371\376\0\0\1\0\0\0\345E\1\0\226L\316\252"..., 8192, 29339877376) = 8192
pwrite(33, "\6\242\0\0004o\0\0\247l\316\252\0\0\2\4\2404\0\0\2\0\0\0\351E\1\0005l\316\252"..., 8192, 29339811840) = 8192
pwrite(33, "\6\242\0\0\344\27\0\0_M\316\252\0\0\2\4t\262\0\0\1\0\0\0\345E\1\0\226L\316\252"..., 8192, 29339795456) = 8192
pwrite(33, "\6\242\0\0\2202\3\0\221R\316\252\0\0\2\4\330\244\0\0\1\0\0\0\340E\1\0_L\316\252"..., 8192, 29338681344) = 8192
pwrite(33, "\6\242\0\0\310S\0\0\17O\316\252\0\0\2\4\262z\0\0\1\0\0\0\342E\1\0\345K\316\252"..., 8192, 29338583040) = 8192
pwrite(33, "\6\242\0\0[\236\7\0GU\316\252\0\0\2\4.\301\0\0\1\0\0\0\340E\1\0_L\316\252"..., 8192, 29338492928) = 8192
pwrite(33, "\6\242\0\0\371\277\2\0\364]\316\252\0\0\2\4\371\315\0\0\1\0\0\0\342E\1\0\345K\316\252"..., 8192, 29336969216) = 8192
pwrite(33, "\6\242\0\0\271\353\0\0\332e\316\252\0\0\2\4\220\262\0\0\2\0\0\0\350E\1\0\247e\316\252"..., 8192, 29336190976) = 8192
pwrite(33, "\6\242\0\0003F\v\0\227[\316\252\0\0\2\4~*\0\0\1\0\0\0\340E\1\0_L\316\252"..., 8192, 31513272320) = 8192

Well, that’s not good! DBWR is performing synchronous writes to the Flash Cache file. No mystery, I forgot to set filesystemio_options=setall. After doing so, DBWR uses libaio calls to spill into the Flash Cache file (the 5th word in each iocb is the file descriptor). The following snippet shows a blast of 165 writes to the Flash Cache file:


io_submit(47959748861952, 165, {{0x2b9e7fb02630, 0, 1, 0, 33}, {0x2b9e7faa3c70, 0, 1, 0, 33}, {0x2b9e7fa545f8, 0, 1, 0, 33}, {0x2b9e7fb91a70, 0, 1, 0, 33},

{0x2b9e7fa276f0, 0, 1, 0, 33}, {0x2b9e7faf68f8, 0, 1, 0, 33}, {0x2b9e7fafcfb0, 0, 1, 0, 33}, {0x2b9e7fc09f10, 0, 1, 0, 33}, {0x2b9e7fb8b920, 0, 1, 0, 33},

{0x2b9e7fa91880, 0, 1, 0, 33}, {0x2b9e7fb05170, 0, 1, 0, 33}, {0x2b9e7fb80c20, 0, 1, 0, 33}, {0x2b9e7faa6d18, 0, 1, 0, 33}, {0x2b9e7faba140, 0, 1, 0, 33},

[ .. many lines deleted ...]

{0x2b9e7fbb6d20, 0, 1, 0, 33}, {0x2b9e7faa1c00, 0, 1, 0, 33}, {0x2b9e7faeabc0, 0, 1, 0, 33}, {0x2b9e7fa7f490, 0, 1, 0, 33}, {0x2b9e7fb8a380, 0, 1, 0, 33},

{0x2b9e7fad5c90, 0, 1, 0, 33}}) = 165

So we’ve seen the mechanisms used by DBWR to spill into the Flash Cache file. What about foreground processes? The following shows a shadow process performing 8KB synchronous reads from the Flash Cache file. These pread() calls are physical I/O and all, are actually in the extended code path of  logical I/O. We are talking about cache hits after all and the is an SGA LIO.

$ ls -l /proc/32719/fd | grep scratch
lrwx------ 1 oracle oinstall 64 Dec 10 09:19 14 -> /home/cache/scratchfile
[oracle@dscgif05 dbs]$ strace -p 32719 2>&1 | grep pread | head -20
pread(14, "\6\242\0\0\256\215\1\0\337e\316\252\0\0\2\4\304\34\0\0\2\0\0\0\350E\1\0\316e\316\252"..., 8192, 23248191488) = 8192
pread(14, "\6\242\0\0\204\31\6\0\366\367\\\253\0\0\1\4z\321\0\0\1\0\0\0\340E\1\0\366\367\\\253"..., 8192, 17962737664) = 8192
pread(14, "\6\242\0\0\202:\1\0\366S\316\252\0\0\2\4\25\301\0\0\1\0\0\0\342E\1\0\345K\316\252"..., 8192, 32084803584) = 8192
pread(14, "\6\242\0\0\204V\5\0\346\272\\\253\0\0\1\4\207\247\0\0\1\0\0\0\340E\1\0\346\272\\\253"..., 8192, 27747041280) = 8192
pread(14, "\6\242\0\0\337\305\0\0\337e\316\252\0\0\2\4ey\0\0\2\0\0\0\350E\1\0ne\316\252"..., 8192, 30495244288) = 8192
pread(14, "\6\242\0\0= \5\0\2035h\253\0\0\1\4)\223\0\0\1\0\0\0\340E\1\0\2035h\253"..., 8192, 20019929088) = 8192
pread(14, "\6\242\0\0\223\v\1\0,T\337\252\0\0\1\6\316\207\0\0\1\0\0\0\342E\1\0+T\337\252"..., 8192, 28805251072) = 8192
pread(14, "\6\242\0\0\214\256\0\0\332e\316\252\0\0\2\4\5a\0\0\2\0\0\0\350E\1\0ne\316\252"..., 8192, 31322808320) = 8192
pread(14, "\6\242\0\0\241w\4\0#\232m\253\0\0\1\4\334A\0\0\1\0\0\0\340E\1\0#\232m\253"..., 8192, 31416205312) = 8192
pread(14, "\6\242\0\0\304<\1\0\374S\316\252\0\0\2\4>\257\0\0\1\0\0\0\342E\1\0\345K\316\252"..., 8192, 24838955008) = 8192
pread(14, "\6\242\0\0`\201\1\0\332e\316\252\0\0\2\4=*\0\0\2\0\0\0\350E\1\0\316e\316\252"..., 8192, 21697290240) = 8192
pread(14, "\6\242\0\0z\300\5\0\341Tl\253\0\0\1\4\365\315\0\0\1\0\0\0\340E\1\0\341Tl\253"..., 8192, 21481119744) = 8192
pread(14, "\6\242\0\0O\243\1\0\365e\316\252\0\0\2\4\261!\0\0\2\0\0\0\350E\1\0Ue\316\252"..., 8192, 22630391808) = 8192
pread(14, "\6\242\0\0\221Z\r\0\240B]\253\0\0\1\4zK\0\0\1\0\0\0\340E\1\0\240B]\253"..., 8192, 32498204672) = 8192
pread(14, "\6\242\0\0\222Z\r\0\t\335g\253\0\0\1\4$\316\0\0\1\0\0\0\340E\1\0\t\335g\253"..., 8192, 25871138816) = 8192
pread(14, "\6\242\0\0o\245\1\0\365e\316\252\0\0\2\4s\272\0\0\2\0\0\0\350E\1\0Ue\316\252"..., 8192, 30909734912) = 8192
pread(14, "\6\242\0\0\373i\r\0\211\10b\253\0\0\1\4.q\0\0\1\0\0\0\30E\1\0\211\10b\253"..., 8192, 27650859008) = 8192
pread(14, "\6\242\0\0\374i\r\0kTc\253\0\0\1\4v0\0\0\1\0\0\0\340E\1\0kTc\253"..., 8192, 18883878912) = 8192
pread(14, "\6\242\0\0t\357\2\0\2678\344\252\0\0\1\6R3\0\0\1\0\0\0\342E\1\0\345K\316\252"..., 8192, 23501406208) = 8192
pread(14, "\6\242\0\0003\247\1\0\370e\316\252\0\0\2\4\314\35\0\0\2\0\0\0\350E\1\0Ue\316\252"..., 8192, 27493490688) = 8192
$ ps -ef | grep 32719 | grep -v grep
oracle   32719 32700  4 09:17 ?        00:00:05 oracleOLTP1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

Summary
This post in the series aimed to show some of the mechanisms involved when processes interact with the Database Smart Flash Cache file. In Part III I’ll be sharing some performance numbers.

16 Responses to “Pardon Me, Where Is That Flash Cache? Part II.”


  1. 1 Timur Akhmadeev December 15, 2009 at 8:37 am

    Hi Kevin,

    thank you for this series. I have some questions about FC (or should we call it DBFC?) that bothers me and I hope you’ll cover them in the upcoming posts:

    1) how the flash cache file is logically organized and how flash cache data structures consistency is supported? This isn’t “just a file”?

    >A logical I/O that hits a buffer previously aged to the Flash Cache requires a physical read to bring the buffer back into the address space of the process.
    2) AFAIU this means “a physical read of the block to the buffer cache”. Am I correct? If so, then how flash cache differs from file system cache in terms of CPU usage?

    3) is an algorithm for aging out buffers from buffer cache to the flash cache the same as usual (i.e. age out cold dirty buffers)

    Thanks in advance.

  2. 2 Artem Gorbyk January 3, 2010 at 1:08 pm

    Hi, Kevin, thanks for the article.

    Another interesting question is how this new feature works in RAC environment (documentation says its supported). Is cache fusion aware of data blocks that resides in flash caches of multiple nodes (whis theis statuses) and can deliver them via interconnect to the buffer cache of the node that requests given block? Are there new global cache statistics that counts and shows such events?
    Or everything is much simpler and nodes can work only with their own flash caches?

  3. 3 jametong February 2, 2010 at 4:57 am

    When can you release part III of this series.

  4. 5 Surachart Opun February 24, 2010 at 3:36 pm

    Flash Cache Feature on 11gR2, Can I use it on Redhat?

    I’m testinf RHEL 5.3 + Oracle 11gR2 + Patch:8974084 + Storage F5100 Flash Array.

    I’d like to test Flash Cache Feature, But I still error…

    ORA-00439: feature not enabled: Server Flash Cache
    ORA-01078: failure in processing system parameters

    Any Idea?

    • 6 kevinclosson February 24, 2010 at 6:57 pm

      Hello Surachart,

      I believe the documentation specifies OEL but that technically hs nothing to do with the error you are seeing. See if a re-read of my posts on the matter don’t refer to an Oracle patch that enables Database Flash Cache. That is the issue.

      • 7 Surachart Opun February 24, 2010 at 7:15 pm

        Thank You for reply. I patched 8974084, but still error.

        Oracle Docs told to use Solaris or OEL (Only).

        I’m not sure, But I’ll attempt with RHEL before Install OEL.
        And re-read your posts.

        -)

  5. 8 JMIZE March 18, 2011 at 9:00 pm

    Hi Kevin and Surachart –

    Curious to see if anyone has since tried using smart flash on non-Oracle Linux / non-Solaris. I do not have access to RHEL/SuSE but am curious to see if Oracle software actually complains if db_flash_cache_file/_size are set and then if it actually uses it when necessary. The question came up when speaking with a few enterprise customers looking to migrate from RHEL –> Oracle.

  6. 10 Amin Adatia October 5, 2011 at 5:20 pm

    In the ExaData v2 we have the values for
    db_flash_cache_file is null
    db_flash_cache_size is zero

    Do these parameters have to be set after the fact?


  1. 1 Blogroll Report 11/12/2009-18/12/2009 « Coskan’s Approach to Oracle Trackback on January 2, 2010 at 5:50 pm
  2. 2 Flash cache, exadata et 11GR2 « Trackback on January 8, 2010 at 5:24 pm
  3. 3 Kerry Osborne’s Oracle Blog » Blog Archive Exadata - Take 2 - Flash Cache - Kerry Osborne’s Oracle Blog Trackback on May 17, 2010 at 3:02 pm
  4. 4 Pardon Me, Where Is That Flash Cache? Part I. | Kevin Closson's Blog: Platforms, Databases and Storage Trackback on March 11, 2013 at 11:48 am
  5. 5 Flash cache, exadata et 11GR2 | EASYTEAM Trackback on May 29, 2013 at 8:19 am

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.




DISCLAIMER

I work for Amazon Web Services. The opinions I share in this blog are my own. I'm *not* communicating as a spokesperson for Amazon. In other words, I work at Amazon, but this is my own opinion.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 743 other subscribers
Oracle ACE Program Status

Click It

website metrics

Fond Memories

Copyright

All content is © Kevin Closson and "Kevin Closson's Blog: Platforms, Databases, and Storage", 2006-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Kevin Closson and Kevin Closson's Blog: Platforms, Databases, and Storage with appropriate and specific direction to the original content.