Database Systems Pioneer Starts Database Company.

Required reading for anyone who is really serious about database internals is what I have always called the “big Red Book”. Readings in Database Systems by Michael Stonebraker sits prominently in my personal library. I recommend it to anyone interested in the fundamentals. Hold it, I just realized that this Amazon link for the book is Amazon.ca, where the price is $256.40 in Loonies, while it is only US $57.00 on this Amazon.com site. Wow, I bet Stonebraker would rather sell his stuff in Canada! Then again, he might be too busy.

According to this Network World article, Michael Stonebraker has been leading a startup called Vertica Systems. Vertica is going to bring a new database system to market. Vertica has Jerry Held and Ray Lane in the head-shed. I just blogged about some interesting perspectives Ray Lane has on the “traditional software” model in my blog entry entitled Scalable NFS Powered by Open Source Cluster Filesystems. I recommend you check it out to see if those views seem congruous given Ray’s involvement in Vertica. Or is this stealth-news that Vertica is some amalgam of Open Source and “Ad-Revenue funded Software”? Either way, I think bringing yet another database management system to market would be quite the challenge. CA recently spun off Ingres to go do battle with Oracle as well. Doing battle with Oracle is a good way to wind up floating face down in murky water somewhere.

I also noted that the latest round of funding raised by Vertica included money from New Enterprise Associates. Hmm, my company, PolyServe, was also seeded by NEA. Maybe we are all one big happy family. After all, the Network World article states:

It is designed to handle data warehousing, business intelligence, fraud detection and other applications, even in environments with hundreds of terabytes of data.

PolyServe customers build tremendously large clusters for unstructured and structured data alike. One of our Oil and Gas customers has more than one cluster handling over 250TB, but I digress…

We’ll have to keep an eye on Vertica; it is Michael Stonebraker after all!

2 Responses to “Database Systems Pioneer Starts Database Company.”


1 Henry February 28, 2007 at 5:31 am

Tried to post this yesterday but for some reason it didn’t work…

Ah, Vertica. We actually had Michael Stonebraker in about a year and a half ago to talk about (convince us to use) Vertica. I was in on that meeting and was less than impressed with the presentation. The product seems sort of interesting; the presentation just struck me as less than honest. That meeting was a while ago so I don’t remember a lot of the details, just what I put down in some sketchy notes. Also, I have not been following Vertica closely so I don’t know what changes have been made in the last year and a half.

There are a number of people here who want to try it out. (I would really like to do some testing against Oracle, time permitting. Hah. If anyone has some interesting tests in mind let me know, and I’ll see what I have time for.) We supposedly even have a copy to play with, though I don’t think much is happening in that direction as of yet. Of course, I bet a lot of the attraction for all the MIT AI lab hackers here is having a product in beta that isn’t big bad Oracle.

Vertica came out of C-Store (http://db.csail.mit.edu/projects/cstore/vldb.pdf).
This product seems to be designed for DW and query-intensive databases. Storage is by column, not by tuple. A query can therefore read only the attributes it needs, whereas tuple-based storage also pulls in every other attribute stored in the same block. The column values are also sorted and compressed. Multiple sort orders can be stored to speed up querying.
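To make the storage idea above concrete, here is a minimal Python sketch of column-wise storage with a sort order and run-length compression. It is not Vertica or C-Store code: the table, the column names, and the helpers `to_columns`, `rle_encode`, `rle_decode` and `sum_amount_for` are all hypothetical, and run-length encoding is just one simple compression scheme chosen for illustration.

```python
from itertools import groupby

# Hypothetical toy table of (region, product, amount) rows; not from Vertica or C-Store.
NAMES = ["region", "product", "amount"]
ROWS = [
    ("east", "widget", 10),
    ("west", "gadget", 25),
    ("east", "gadget", 5),
    ("west", "widget", 40),
    ("east", "widget", 15),
]


def to_columns(rows, names, sort_key):
    """Sort the rows on one attribute, then split them into one list per column."""
    idx = names.index(sort_key)
    ordered = sorted(rows, key=lambda r: r[idx])  # stable sort keeps arrival order within a key
    return {name: [r[i] for r in ordered] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full column."""
    return [v for v, n in runs for _ in range(n)]


# One stored sort order; a second call with another sort_key would give a second one.
by_region = to_columns(ROWS, NAMES, "region")

# A sorted column has long runs of equal values, so it compresses well.
region_runs = rle_encode(by_region["region"])


def sum_amount_for(region):
    """Answer the query from two columns only; 'product' is never touched."""
    pairs = zip(by_region["region"], by_region["amount"])
    return sum(amount for r, amount in pairs if r == region)
```

In this toy, the sorted `region` column collapses to two runs, and a sum over `amount` for one region reads two of the three columns. Keeping a second copy sorted on a different attribute is what "multiple sorts" would look like here, at the cost of storing the data twice.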

There is a lot of redundancy in this design. It is a shared nothing architecture, with redundancy (at least as far as the data, not the sorting) across multiple nodes. It seems availability and recoverability are combined.

There is also a weird way of mixing in OLTP transactions. They are done in a separate work area and a tuple mover then puts them into the read store. There appears to be a time delay between committing data and having it visible from a query. Strange.
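The commit-then-move behaviour described above can be sketched in a few lines of Python. This models only what the comment describes (a separate write area, a tuple mover, and a visibility delay); the class `ToyStore` and its methods are hypothetical, and the assumption that queries see only the read store is mine, not something confirmed for Vertica or C-Store.

```python
import bisect


class ToyStore:
    """Hypothetical two-area store: commits land in a write area, queries read a sorted area."""

    def __init__(self):
        self.write_store = []  # committed rows in arrival order
        self.read_store = []   # rows kept sorted; the only area queries see in this sketch

    def commit(self, key, value):
        """Record a committed row; it is not yet visible to query()."""
        self.write_store.append((key, value))

    def move_tuples(self):
        """Tuple mover: merge all pending rows into the sorted read store in one batch."""
        for row in self.write_store:
            bisect.insort(self.read_store, row)
        moved = len(self.write_store)
        self.write_store.clear()
        return moved

    def query(self, key):
        """Return the values for a key, looking only at the read store (an assumption)."""
        return [v for k, v in self.read_store if k == key]


store = ToyStore()
store.commit(2, "b")
store.commit(1, "a")
before = store.query(1)      # committed, but the tuple mover has not run yet
moved = store.move_tuples()
after = store.query(1)       # visible once the rows are in the read store
```

The gap between `before` and `after` is the delay the comment finds strange: a row is committed but invisible until the mover runs. The design trade is that the read store stays sorted and query-friendly while writes stay cheap appends.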

A few additional comments from my notes:
–It seems as if the High Availability (HA) and recoverability mechanisms are intertwined. Recoverability happens by accessing a replicated site. This implies a bunch of synchronous data transfer.
–If the above is true, I would really like to see query vs. write performance. At times it sounded as if the product was designed for improving DW queries, but M. Stonebraker also claimed OLTP writes were fine. He really pushed both. Even with multiple synchronous copies? And redundant data tables?
–We are reentering the shared nothing vs shared disk database cluster debates.
–A bit disingenuous about some of the references to “the elephants”. A lot of references were either 5 years old or referring to default behaviour. Granted, given the default setup, a standard table, and simple SQL, Oracle will usually store rows in the order entered (DELETEs notwithstanding), but there is no requirement in the model to always do so. Of course this doesn’t mean Vertica can’t do some things better, just that we need to be educated consumers. (The disingenuousness, especially from an academic, made me slightly squirmy.)
–I would still like to know more about backups.

Hey, I just checked my email, and M. Stonebraker will be giving a talk at my work Tuesday afternoon. Anybody have any questions for him?

Well that talk was today. If there is any interest I’ll post some notes/comments later. Too tired right now.

2 Phil Bowermaster May 2, 2007 at 3:45 am

There’s no question that any new venture with Dr. Stonebraker behind it is worth keeping an eye on. And, yes, bringing a new database to market will clearly represent significant challenges.

It is well established that the technology Vertica is proposing (a grid-enabled, column-oriented relational database) can provide a huge performance boost for data analytics. My company, Sybase, makes one such product, the Sybase IQ analytics server. It’s already available, with nearly 1,000 customers experiencing tremendous performance acceleration and significant ROI.

As an example: Nielsen Media Research implemented Sybase IQ for their audience data warehouse. Sybase IQ has provided a 10-100x improvement in response time for even the most complex of queries, along with a 70% compression ratio (which is allowing them to save quite a bit on hard storage).

http://www.sybase.com/detail?id=1035802

As a whole, analytics servers, both emerging products like SAND and enterprise-class products like Sybase IQ (which includes advanced features like encryption), are experiencing very high growth. Dr. Stonebraker’s company could ride this wave of success, so it may not matter that their technology isn’t really new.


Kevin Closson's Blog: Platforms, Databases and Storage