NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Real time data...
Big Learning Workshop: Algorithms, Systems, and Tools for Learning at Scale at NIPS 2011
Invited Talk: Real time data sketches by Alex Smola
Alex is a Principal Researcher at Yahoo. Alex's current research focus is on nonparametric methods for estimation, in particular kernel methods and exponential families. This includes support vector machines, Gaussian processes, and conditional random fields.
Abstract: I will describe a set of algorithms that extend streaming and sketching methods to real-time analytics. These algorithms capture frequency information for streams of arbitrary sequences of symbols. They use the Count-Min sketch as their basis and exploit the fact that the sketching operation is linear. The result is real-time statistics of arbitrary events, e.g., streams of queries as a function of time. In particular, we use a factorizing approximation to provide point estimates at arbitrary (time, item) combinations. The service runs in real time and, using distributed hashing, scales perfectly in terms of throughput and accuracy. Distributed hashing also provides performance guarantees in the case of machine failure. Queries can be answered in constant time regardless of the amount of data to be processed. The same distribution techniques can also be used for heavy hitter detection in a distributed, scalable fashion.
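The linearity property mentioned in the abstract can be illustrated with a minimal Count-Min sketch. This is a generic textbook-style sketch, not the implementation presented in the talk: the class name `CountMinSketch`, the `width` and `depth` defaults, and the use of salted BLAKE2b as the row hash are all assumptions made for illustration. It shows that adding two sketches cell by cell gives exactly the sketch of the combined stream, which is what makes it possible to aggregate counts across time intervals or across machines.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch (hypothetical helper, not from the talk)."""

    def __init__(self, width=1024, depth=4):
        self.width = width    # number of counters per row
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, obtained by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # is an upper bound on the true count.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )

    def merge(self, other):
        # Linearity: the sketch of two streams combined is the
        # cell-wise sum of the two individual sketches.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have identical dimensions")
        merged = CountMinSketch(self.width, self.depth)
        for row in range(self.depth):
            merged.table[row] = [
                a + b for a, b in zip(self.table[row], other.table[row])
            ]
        return merged


# Usage: one sketch per time interval, merged into a coarser interval.
hour_1 = CountMinSketch()
hour_2 = CountMinSketch()
for query in ["nips", "sketch", "nips"]:
    hour_1.add(query)
for query in ["nips", "hashing"]:
    hour_2.add(query)

both_hours = hour_1.merge(hour_2)
print(both_hours.estimate("nips"))  # never below the true count of 3
```

Each update and each point query touches `depth` counters, so the cost does not depend on how much data has been inserted. The estimate can exceed the true count when items collide, but it never falls below it.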