Surge 2011 ~ Hybrid data storage: finding balance.
Over the past several years, Clearspring has developed custom distributed processing and storage systems to handle the billions of views our web products receive per day. A central part of these systems is a tree-based storage structure that fills a useful middle ground between the data-model-centric view of row-oriented databases and the query-centric view more common with column-oriented ones. This presentation will cover the key components of our architecture and some of the trade-offs we have faced: sharding and scatter-gather approaches versus more sophisticated distributed approaches, statistical approximation versus exact answers, custom solutions versus adapting off-the-shelf components, and latency of updates versus just about everything. In particular, we hope to share how these trade-offs have changed and how we have adapted over time.