Google Tech Talks: Supporting Scalable Online Statistical Processing
Query processing for analytic, statistical, and exploratory database queries has been an active area of database research and development for nearly two decades. Many experts now consider this problem to be "solved", especially with regard to performance. However, an argument can be made that users and databases have simply reached an uneasy truce with regard to analytic processing. If users avoid ad-hoc, exploratory queries that might take days to execute, then of course the database performs just fine.
In this talk, I will describe query processing in a database system called DBO that is designed from the ground up to support interactive analytic processing. DBO can run database queries from start to finish and produce exact answers in a scalable fashion. However, unlike any existing research or production system, DBO is able to produce statistically meaningful approximate answers at all times throughout query execution. These answers are continuously updated from start to finish, even for "huge" queries requiring arbitrary quantities of temporary secondary storage. Thus, a user can stop execution whenever satisfied with the query accuracy, which may translate to dramatic time savings during exploratory processing.
Google Tech Talks
April 28, 2008
Speaker: Christopher M. Jermaine - Research Scientist
Chris Jermaine is an assistant professor in the CISE Department at the University of Florida, where he studies databases and data management. He is the recipient of a 2008 Alfred P. Sloan Foundation Research Fellowship, a National Science Foundation CAREER award, and a 2007 ACM SIGMOD Best Paper Award. He received a BA from the Mathematics Department at UCSD, and a PhD from the College of Computing at Georgia Tech.