I speak to a lot of customers, prospects and co-workers about Exadata. Even though Exadata has been in production for two years, I still do not presume everyone has a grasp of some of its more important fundamentals. I routinely get asked how very large SGA buffering can enhance Exadata Smart Scan, or how Storage Indexes might improve OLTP workloads, and other such non sequiturs.
There are a lot of sessions about Exadata being offered at Oracle OpenWorld 2010 and for good reason. Exadata is exciting technology! It dawns on me, however, that a few words explaining some of the more fundamental aspects of Exadata might help folks absorb more of what they are hearing in the sessions they attend next week.
I consider the following seven terms and definitions utterly important for folks to know before sitting through an Exadata presentation. In fact, there may even be some sessions offered by presenters who could also benefit from the following 242 words?
- Cell Offload Processing.
- Work performed by the Storage Servers that would otherwise have to be executed in the database grid. Includes functionality like Smart Scan, datafile initialization, RMAN offload and Hybrid Columnar Compression (HCC) decompression.
- Smart Scan.
- Most relevant Cell Offload Processing for improving Data Warehouse / Business Intelligence query performance. Smart Scan is the agent for offloading filtration, projection, Storage Index exploitation and HCC decompression.
- Full Scan or Index Fast Full Scan.
- The required access method chosen by the query optimizer in order to trigger a Smart Scan.
- Direct Path Reads.
- Required buffering model for a Smart Scan. The flow of data from a Smart Scan cannot be buffered in the SGA buffer pool. Direct path reads can be performed for both serial and parallel queries. Direct path reads are buffered in process PGA (heap).
- Result Set.
- Data returned by the SQL processing layer. The SQL processing layer is in the Oracle Database. The data flowing from a Smart Scan is not a result set.
- Exadata Smart Flash Cache.
- Flash Cache in each of the Storage Servers. Not to be confused with Database Flash Cache, which is Flash in the database grid and not compatible with Exadata. Smart Scan aggressively scans both HDD and Flash media concurrently. When data is present in the flash cache, scan rates of 50 GB/s on Exadata Version 2 hardware are the norm for full rack configurations. Maximum theoretical scan rates (a.k.a. datasheet scan rates) for Exadata are *only* possible for fully offloaded scans. A fully offloaded scan is generated by a SQL query that finds no rows. Blog Update: Please consider viewing the following two-minute YouTube video with a demonstration of how complex SQL processing throttles Exadata Smart Scan to roughly 10% of maximum theoretical scan rates: http://www.youtube.com/watch?v=JuWVjSp42yM
- Storage Index.
- Dynamic, in-memory indexes. The role of Storage Index technology is not to aid in locating data faster but instead to eliminate I/O. With Storage Indexes the Exadata Storage Server software can determine whether or not a given storage region contains rows relevant to the query and decide to not read the storage region. Storage Indexes are only examined during a Smart Scan.
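To make the Smart Scan and Storage Index definitions a little more concrete, here is a toy Python sketch. It is not how the Exadata Storage Server software is implemented. The `StorageRegion` class, the `toy_smart_scan` function, the sample rows and the per-column min/max summary are all hypothetical names and assumptions chosen for illustration. The sketch shows only the idea: a region is skipped, with no read at all, when its summary proves no row can match; regions that are read have filtration and projection applied before anything flows back.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]


@dataclass
class StorageRegion:
    """Hypothetical chunk of rows on a storage cell (must be non-empty)."""
    rows: List[Row]
    # Assumed summary for illustration: (min, max) per column, kept in memory.
    summary: Dict[str, Tuple[int, int]] = field(init=False)

    def __post_init__(self) -> None:
        columns = self.rows[0].keys()
        self.summary = {
            c: (min(r[c] for r in self.rows), max(r[c] for r in self.rows))
            for c in columns
        }


def toy_smart_scan(
    regions: Sequence[StorageRegion],
    column: str,
    low: int,
    high: int,
    projection: Sequence[str],
) -> Tuple[List[Row], int]:
    """Scan for low <= column <= high; return (rows sent back, regions read)."""
    sent_back: List[Row] = []
    regions_read = 0
    for region in regions:
        region_min, region_max = region.summary[column]
        # Storage Index role: eliminate the I/O entirely, not find rows faster.
        if region_max < low or region_min > high:
            continue
        regions_read += 1
        for row in region.rows:
            if low <= row[column] <= high:  # filtration
                sent_back.append({c: row[c] for c in projection})  # projection
    return sent_back, regions_read


regions = [
    StorageRegion([{"id": 1, "amount": 10, "store": 1},
                   {"id": 2, "amount": 20, "store": 1}]),
    StorageRegion([{"id": 3, "amount": 500, "store": 2},
                   {"id": 4, "amount": 700, "store": 2}]),
    StorageRegion([{"id": 5, "amount": 30, "store": 3},
                   {"id": 6, "amount": 600, "store": 3}]),
]

rows, regions_read = toy_smart_scan(regions, "amount", 400, 1000, ["id", "amount"])
print(regions_read, rows)
```

In this sample the first region is never read, and what comes back from the scan is filtered, projected row data rather than a result set. Any joins, sorts or aggregation would still be the job of the SQL processing layer in the database.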
I hope you’ll find this helpful.
Indeed these are very valuable points, at least to me, as I am new to Exadata.
Can you show your blog posts in the RSS feed as full-text articles? Or could you just post a vote, as Jonathan Lewis does?
http://jonathanlewis.wordpress.com/2010/08/30/subscribers/
yep
I got it. Thank you.
Kevin, thanks for the above. Could you clarify one point, please? All 7 concepts seem to revolve around smart scan, and smart scan is defined as “the most relevant offloading process for improving *DW/BI* query performance”.
Is there a good reason you specify “DW/BI query”, or can “IO-intensive query” be substituted? Based on the descriptions of each technology, I would assume yes, but I’m just wondering if I’m missing something, especially in light of the history of Exadata as particularly relevant for DW-type processing.
Thanks!
Hi Daniel,
Quite simple. At this time Offload Processing is not optimized for transactional workloads. Transactional workloads generally get rows by ROWID or do very short scans of small tables, neither of which gets a boost from offload processing.
I do suppose I/O-intensive would be an acceptable substitution, so long as the access method is FULL and the buffering is direct (so, not scattered reads). Am I still clear as mud? 😦
Hi, Kevin, thanks – that makes sense. I’m not sure what magic I was hoping for, but I guess something like the optimizer, when realizing that lots of reads are gonna happen, starts a smart scan. I guess there is some hope with the FFS…
Hi Daniel,
Remember that the product of a smart scan cannot go into the SGA. People (everywhere!) routinely forget that fundamental concept. If you consider the SGA critical to your OLTP/ERP then put all that in perspective 🙂
I would like to subscribe to your blog.
Hi Kevin,
You mentioned above:
“Smart Scan aggressively scans both HDD and Flash media concurrently”… but I think Smart Scan is anti-Flash. :)
Smart Scan ignores Flash and scans only disk. Only if the object is created or altered with CELL_FLASH_CACHE=KEEP will Smart Scan use both flash and disk.
Please correct me if I am wrong here. I know I am putting this question to someone who is a master of this, and we are all still learning by reading your blog and your comments on others’ blogs. :)
There must be some kind of RSS feed I can use to follow your blog and the comments you leave on others’ blogs.
Regards,
Sunil Bhola
Yes, one must KEEP the object to get max theoretical scan rates. If the scanned object doesn’t fit in the aggregate of flash cache then don’t KEEP it. When scanning only HDD (High performance drives) full-rack scan throughput drops from 100GB/s to 25GB/s as per the datasheet.