I see fellow OakTable Network member Tanel Poder has blogged that Oracle Database 11g Release 2 has switched to offer columnar store technology. Or at least one could infer that from the post.
I left a comment on Tanel’s blog but would like to make a quick entry here on the topic as well. Oracle Database 11g Release 2 does not offer column-store technology as thought of in the Vertica (for example) sense. The technology, available only with Exadata storage, is called Hybrid Columnar Compression. The word hybrid is important.
Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.
So, “hybrid” is the word. But, none of that matters as much as the effectiveness. This form of compression is extremely effective.
Kevin
Is the restriction of HCC to Exadata due to technical reasons or is it a marketing ploy? if it’s the latter, seems like a bit of a gamble to me…
Regards
Andrew
It is technical for reasons.
Kevin,
the marketing guys at Oracle did exactly what you never liked in marketing guys (of other companies). They claim: DWH query performance is 10-fold with Exadata Storage Server and Database Machine. Full stop. 😉
You can see it on http://www.oracledatabase11g.com/Main/Questionnaire/EducationQuiz.html in the first question
Nevertheless, we all know that Oracle Exadata Storage Server is 485x Faster Than…Oracle Exadata Storage Server.
Regards
Dani
Hi Kevin,
Thanks for the correction! However the new features overview doc is somewhat misleading in this case..
“Hybrid columnar compression is a new method for organizing how data is stored. Instead of storing the data in traditional rows, the data is grouped, ordered and stored one column at a time.”
The Exadata documentation is much more clear. I don’t think anything is misleading. It’s just not perfectly clear. I see no problem with that though because HCC is not generically available as it is specific to Exadata. To that end, the Exadata documentation is the best place to go into the topic deeply. That’s just my opinion.
The “switch” to column storage is eluded to at the Structured Data bloag as well…
http://structureddata.org/2009/09/01/oracle-database-11g-release-2-is-now-generally-available/
Never a war between Greg and I …too good friends … there will be terminology missteps
@Darryl
Kevin and I are not in disagreement; it’s just my way of describing what is in the documentation (which I have quoted in my blog post).
Gents,
I wasn’t inferring any disagreement. I was just pointing out this topic was mentioned on another blog.
Thanks
Kevin,
I’m quite confused by your post. You say:
“Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.”
What is “they” referring to in “they are stored in an object…”? Is it rows? If so, then what does it mean that “like values” are stored in the compression unit? “Like values” would seem to indicate values from the same column.
In short, it is unclear from your post if a compression unit contains multiple rows, but each column is stored separately inside the compression unit, or if it is just a bunch of data from one particular column.
Cheers,
Daniel Abadi
Assistant Professor, Yale University
Hi Daniel,
The hybrid columnar compression feature is documented in the Exadata Storage Server product documentation. However, I’m not permitted to expose the contents of that documentation here. My main objective in following on to my friend Tanel Poder’s post was to ensure folks don’t run off with the idea that Oracle Database 11g Release 2 for Exadata is a columnar store database.
I can answer some of your questions safely though. A compression unit is a collection of data blocks. Multiple rows are stored in a compression unit. Columns are stored separately within the compression unit. Likeness amongst the column values within a compression unit yields the space savings. There are still rowids (that change when a row is updated by the way) and row locks. It is a hybrid.
So, in short, no a compression unit does not have values from only one column.
I apologize for the confusing information. There are times I find myself trying to make bricks without straw as it were.
Thanks Kevin, that’s what I wanted to know.
I’m also curious if Exadata performs selection predicates directly on the HCC data, or if it is decompressed into regular tuples before the predicate is evaluated. However, if this is restricted information, I understand.
Cheers,
Daniel Abadi
Exadata performs filtration and projection (Smart Scan) on compressed data in Oracle Database 11g Release 2 just as it did in 11gR1. With the columnar compression, Smart Scans send uncompressed data to the database grid though.
Hello:
Just one clarification..
So with HCC, not only will we have compression (and grouping) within the blocks, also there will be compression units that will have another grouping of
data based on the *likeness* of the column values among blocks?
Can you please explain?
Thanks
Ganesh
Well, “we” of course means Exadata customers. I do understand the curiosity. Think of the blocks in a compression unit as buckets for column values. Highly compressible columns wont take up much space in the “buckets” but non-compressible columns will.
Kevin,
Thanks for your responses. The information you shared was enough for me to take a stab at placing the Oracle scheme in my taxonomy of hybrid row/column storage schemes. Please see: http://dbmsmusings.blogspot.com/2009/09/tour-through-hybrid-columnrow-oriented.html
If you disagree with anything I said, please let me know.
Cheers,
Daniel Abadi
Why have you compared with Vertica instead of Sybase IQ? As far as I know Sybase IQ is much mature and also there are many great references!
Sebastian,
It was a contrast, not a comparison and I’m not here to prop one or the other of those two companies. I’m not on either of their teams so I don’t care which of those win their piece of the pie.
Kevin,
What is the size of the compression unit ? Is it the size of a physical I/O typically 1 MB ?
Thanks
Rajan
Rajan,
CU size varies. It is either 32KB or 64KB depending on the type of compression and other factors.
Column Store is getting very critical in DWH environments and its all about comressing data. Please have a look on the below article
[ link removed ]
@Guarav : Columnar orientation is not “getting critical” it is critical. I’m in R&D with EMC’s Greenplum I do indeed know the value of columnar vis a vis compression efficacy. Note, I removed the link to your post. If you’d like feel free to send a trackback.