Oracle Switches To Columnar Store Technology With Oracle Database 11g Release 2?

Published September 1, 2009 oracle 32 Comments

I see fellow OakTable Network member Tanel Poder has blogged that Oracle Database 11g Release 2 has switched to offer columnar store technology. Or at least one could infer that from the post.

I left a comment on Tanel’s blog but would like to make a quick entry here on the topic as well. Oracle Database 11g Release 2 does not offer column-store technology as thought of in the Vertica (for example) sense. The technology, available only with Exadata storage, is called Hybrid Columnar Compression. The word hybrid is important.

Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.

So, “hybrid” is the word. But, none of that matters as much as the effectiveness. This form of compression is extremely effective.

32 Responses to “Oracle Switches To Columnar Store Technology With Oracle Database 11g Release 2?”

Feed for this Entry Trackback Address

1 Andrew Gregovich September 2, 2009 at 4:24 am

Kevin

Is the restriction of HCC to Exadata due to technical reasons or is it a marketing ploy? if it’s the latter, seems like a bit of a gamble to me…

Regards

Andrew

Reply
- 2 kevinclosson September 2, 2009 at 4:31 am
  
  It is technical for reasons.
  
  Reply
3 Dani Rey September 2, 2009 at 7:02 am

Kevin,

the marketing guys at Oracle did exactly what you never liked in marketing guys (of other companies). They claim: DWH query performance is 10-fold with Exadata Storage Server and Database Machine. Full stop. 😉

You can see it on http://www.oracledatabase11g.com/Main/Questionnaire/EducationQuiz.html in the first question

Nevertheless, we all know that Oracle Exadata Storage Server is 485x Faster Than…Oracle Exadata Storage Server.

Regards
Dani

Reply
4 Tanel Poder September 2, 2009 at 1:33 pm

Hi Kevin,

Thanks for the correction! However the new features overview doc is somewhat misleading in this case..

“Hybrid columnar compression is a new method for organizing how data is stored. Instead of storing the data in traditional rows, the data is grouped, ordered and stored one column at a time.”

Reply
- 5 kevinclosson September 2, 2009 at 3:09 pm
  
  The Exadata documentation is much more clear. I don’t think anything is misleading. It’s just not perfectly clear. I see no problem with that though because HCC is not generically available as it is specific to Exadata. To that end, the Exadata documentation is the best place to go into the topic deeply. That’s just my opinion.
  
  Reply
6 Darryl September 2, 2009 at 3:32 pm

The “switch” to column storage is eluded to at the Structured Data bloag as well…

http://structureddata.org/2009/09/01/oracle-database-11g-release-2-is-now-generally-available/

Reply
- 7 kevinclosson September 2, 2009 at 5:41 pm
  
  Never a war between Greg and I …too good friends … there will be terminology missteps
  
  Reply
- 8 Greg Rahn September 3, 2009 at 8:11 am
  
  @Darryl
  
  Kevin and I are not in disagreement; it’s just my way of describing what is in the documentation (which I have quoted in my blog post).
  
  Reply
9 Darryl September 3, 2009 at 3:00 pm

Gents,
I wasn’t inferring any disagreement. I was just pointing out this topic was mentioned on another blog.
Thanks

Reply
10 Daniel Abadi September 3, 2009 at 3:10 pm

Kevin,

I’m quite confused by your post. You say:

“Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.”

What is “they” referring to in “they are stored in an object…”? Is it rows? If so, then what does it mean that “like values” are stored in the compression unit? “Like values” would seem to indicate values from the same column.

In short, it is unclear from your post if a compression unit contains multiple rows, but each column is stored separately inside the compression unit, or if it is just a bunch of data from one particular column.

Cheers,
Daniel Abadi
Assistant Professor, Yale University

Reply
- 11 kevinclosson September 3, 2009 at 4:13 pm
  
  Hi Daniel,
  
  The hybrid columnar compression feature is documented in the Exadata Storage Server product documentation. However, I’m not permitted to expose the contents of that documentation here. My main objective in following on to my friend Tanel Poder’s post was to ensure folks don’t run off with the idea that Oracle Database 11g Release 2 for Exadata is a columnar store database.
  
  I can answer some of your questions safely though. A compression unit is a collection of data blocks. Multiple rows are stored in a compression unit. Columns are stored separately within the compression unit. Likeness amongst the column values within a compression unit yields the space savings. There are still rowids (that change when a row is updated by the way) and row locks. It is a hybrid.
  
  So, in short, no a compression unit does not have values from only one column.
  
  I apologize for the confusing information. There are times I find myself trying to make bricks without straw as it were.
  
  Reply
12 Daniel Abadi September 3, 2009 at 4:24 pm

Thanks Kevin, that’s what I wanted to know.

I’m also curious if Exadata performs selection predicates directly on the HCC data, or if it is decompressed into regular tuples before the predicate is evaluated. However, if this is restricted information, I understand.

Cheers,
Daniel Abadi

Reply
- 13 kevinclosson September 3, 2009 at 4:39 pm
  
  Exadata performs filtration and projection (Smart Scan) on compressed data in Oracle Database 11g Release 2 just as it did in 11gR1. With the columnar compression, Smart Scans send uncompressed data to the database grid though.
  
  Reply
14 Ganesh September 3, 2009 at 6:31 pm

Hello:

Just one clarification..

So with HCC, not only will we have compression (and grouping) within the blocks, also there will be compression units that will have another grouping of
data based on the *likeness* of the column values among blocks?

Can you please explain?

Thanks
Ganesh

Reply
- 15 kevinclosson September 3, 2009 at 6:39 pm
  
  Well, “we” of course means Exadata customers. I do understand the curiosity. Think of the blocks in a compression unit as buckets for column values. Highly compressible columns wont take up much space in the “buckets” but non-compressible columns will.
  
  Reply
16 Daniel Abadi September 3, 2009 at 11:02 pm

Kevin,

Thanks for your responses. The information you shared was enough for me to take a stab at placing the Oracle scheme in my taxonomy of hybrid row/column storage schemes. Please see: http://dbmsmusings.blogspot.com/2009/09/tour-through-hybrid-columnrow-oriented.html

If you disagree with anything I said, please let me know.

Cheers,
Daniel Abadi

Reply
17 Sebastián September 4, 2009 at 4:23 pm

Why have you compared with Vertica instead of Sybase IQ? As far as I know Sybase IQ is much mature and also there are many great references!

Reply
- 18 kevinclosson September 4, 2009 at 4:37 pm
  
  Sebastian,
  
  It was a contrast, not a comparison and I’m not here to prop one or the other of those two companies. I’m not on either of their teams so I don’t care which of those win their piece of the pie.
  
  Reply
19 Rajan January 21, 2010 at 12:08 am

Kevin,

What is the size of the compression unit ? Is it the size of a physical I/O typically 1 MB ?

Thanks
Rajan

Reply
- 20 kevinclosson January 21, 2010 at 4:03 pm
  
  Rajan,
  
  CU size varies. It is either 32KB or 64KB depending on the type of compression and other factors.
  
  Reply
21 Gaurav Mittal September 3, 2012 at 10:14 am

Column Store is getting very critical in DWH environments and its all about comressing data. Please have a look on the below article

[ link removed ]

Reply
- 22 kevinclosson September 3, 2012 at 11:27 am
  
  @Guarav : Columnar orientation is not “getting critical” it is critical. I’m in R&D with EMC’s Greenplum I do indeed know the value of columnar vis a vis compression efficacy. Note, I removed the link to your post. If you’d like feel free to send a trackback.
  
  Reply

	David Zheng on Announcing pgio (The SLOB Meth…
	Oracle redo log perf… on File Systems For A Database? C…
	Oracle redo log perf… on Yes, File Systems Still Need T…
	kevinclosson on Announcing SLOB 2.5.4
	pgio nutzen? - I/O W… on So pgio Does Not Accurately Re…

Kevin Closson's Blog: Platforms, Databases and Storage