Staging Data For ETL/ELT? Flat Files Appear Magically! No, Load Time Starts With Transfer Time.

In my recent post entitled Something to Ponder? What Sort of Powerful Offering Could a Filesystem in Userspace Be?, I threw out what may have seemed to be a totally hypothetical Teaser Post™. However, as our HP Oracle Exadata Storage Server and HP Oracle Database Machine customers know based on their latest software upgrade, a FUSE-based Oracle-backed file system is a reality. It is called Oracle Database File System (DBFS). DBFS is one of the corner stones of data loading infrastructure in the HP Oracle Database Machine environment. For the time being it is a staging area for flat files to be accessed as external tables. The “back-end”, as it were, is not totally new. See, DBFS is built upon Oracle SecureFiles. FUSE is the presentation layer that makes for mountable file systems. Mixing FUSE with DBFS results in a distributed, coherent file system scalable due to Real Application Clusters. This is a file system that is completely managed by Database Administrators.

So, I’m sure some folks’ eyes are rolling back in their head wondering why we need YAFS (Yet Another File System). Well, as time progresses I think Oracle enthusiasts will come to see just how feature rich something like DBFS really is.

If it performs, is feature rich and incremental to existing technology, it sounds awfully good to me!

I’ll be discussing DBFS in Oracle Exadata Technical Deep Dive – Part IV session tomorrow.

Here is a quick snippet of DBFS in action. In the following box you’ll see a DBFS mount of type FUSE on /data and the listing of a file called all_card_trans.ul


$ mount | grep fuse
dbfs on /data type fuse (rw,nosuid,nodev,max_read=1048576,default_permissions,allow_other,user=oracle)
$ pwd
/data/FS1/stage1
$ ls -l all_card_trans.ul
-rw-r--r-- 1 oracle dba 34034910300 Jun 15 15:30 all_card_trans.ul

In the next box you’ll see ssh jumping to 4 of the servers in an HP Oracle Database Machine to list the contents of the DBFS file system and md5sum output to validate that it is the same file.


$ for n in r1 r2 r3 r4
> do
> ssh $n md5sum `pwd`/all_card_trans.ul &
> done
[5] 3943
[6] 3945
[7] 3946
[8] 3947
$ 1adbff1a36a42253c453c22dd031b48b  /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b  /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b  /data/FS1/stage1/all_card_trans.ul
1adbff1a36a42253c453c22dd031b48b  /data/FS1/stage1/all_card_trans.ul
[5]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[6]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[7]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul
[8]   Done                    ssh $n md5sum `pwd`/all_card_trans.ul

In the next box you’ll see concurrent multi-node throughput. I’ll use one dd process on each of 4 servers in the HP Oracle Database Machine each sequentially reading the contents of the same DBFS-based file and achieving 876 MB/s aggregate throughput. And, no, there is no cache involved.


$ for n in r1 r2 r3 r4; do ssh $n time dd if=`pwd`/all_card_trans.ul of=/dev/null bs=1M &; done
[5] 13325
[6] 13326
[7] 13327
[8] 13328
$ 32458+1 records in
32458+1 records
34034910300 bytes (34 GB) copied, 154.117 seconds, 221 MB/s

real    2m34.127s
user    0m0.014s
sys     0m3.073s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.113 seconds, 219 MB/s

real    2m35.123s
user    0m0.020s
sys     0m3.127s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.813 seconds, 218 MB/s

real    2m35.821s
user    0m0.026s
sys     0m3.210s
32458+1 records in
32458+1 records out
34034910300 bytes (34 GB) copied, 155.89 seconds, 218 MB/s

real    2m35.901s
user    0m0.017s
sys     0m3.039s

With Exadata in mind, the idea is to offer a comprehensive solution for data warehousing. All too often I see data loading claims that start with the flat files sort of magically appearing ready to be loaded. Oh no, we don’t think that way. The data is outside on a provider system somewhere and has to be staged in advance of ETL/ELT. Since DBFS exploits the insane bandwidth of Exadata, it is an extremely good data staging solution. The data has to be rapidly ingested into the staging area and then rapidly loaded. A bottleneck on either part of that equation will be your weakest link.

Just think, no external systems required for data staging. No additional storage connectivity, administration, tuning, etc.

And, yes, it can do more than a single dd process on each node! Much more.

Exciting stuff.

6 Responses to “Staging Data For ETL/ELT? Flat Files Appear Magically! No, Load Time Starts With Transfer Time.”


  1. 1 Ofir June 18, 2009 at 7:47 am

    wow! that is very exciting news… linux clustered file system on top of SecureFiles… that could be an extremely powerful tool. I can actually think of several projects I’ve seen that could have improved their architecture with such magic. We sure live in interesting times :)
    Too bad the web cast will be in nine in the evening here – I have previous engagements…

    • 2 kevinclosson June 18, 2009 at 2:45 pm

      Ofir,

      Have you noticed I don’t call it a “linux clustered file system?” It is distributed, scalable and fully coherent in a clustered environment, but I’m not calling it a CFS.

  2. 3 Peter June 18, 2009 at 10:34 am

    Hi,

    You said:
    “…,as our HP Oracle Exadata Storage Server and HP Oracle Database Machine customers know based on their latest software upgrade, a FUSE-based Oracle-backed file system is a reality.”

    Upgrade of what software? Exadata?

    Thanks for the clarification

    regds
    /Peter

  3. 5 Philip Papadopoulos June 18, 2009 at 1:36 pm

    Absolutely fantastic! Just get buy-in from the storage admins, the server admins, and those DW types that think in-side the box and like slow by slow.

    • 6 kevinclosson June 18, 2009 at 2:43 pm

      Philip,

      I don’t understand the “slow by slow” part of your comment. As for buy-in, there should be none required since the storage admins don’t have to lift a finger when Exadata is involved and server admins aren’t involved with DBFS either. After installation, DBFS is fully administered by the DBA. That includes space provisioning, file system creation/deletion, mounting/unmounting, etc.


Leave a Reply




Disclaimer

The views expressed on this blog are my own and do not reflect the views of Oracle Corporation. The views and opinions expressed by visitors on this blog are theirs, not mine.
All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.
website metrics