Customizable Scalable Compute Intensive Stream Queries
GSDM is a data stream management system running on cluster computers. The system is extensible through user-defined data representations and computations. The computations are specified as stream queries continuously computed over windows of data sliding over the streams.
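To make the notion of a stream query over sliding windows concrete, the following is a minimal Python sketch. It is not GSDM code or its query language: the names `sliding_windows` and `continuous_query`, and the `size` and `stride` parameters, are assumptions introduced here for illustration only.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Tuple


def sliding_windows(stream: Iterable[float], size: int, stride: int) -> Iterator[Tuple[float, ...]]:
    """Yield windows of `size` items, sliding forward `stride` items at a time."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    buffer = deque(maxlen=size)  # keeps only the most recent `size` items
    seen = 0                     # total number of items read so far
    for item in stream:
        buffer.append(item)
        seen += 1
        # The first window is complete after `size` items; each later
        # window is complete after `stride` further items.
        if seen >= size and (seen - size) % stride == 0:
            yield tuple(buffer)


def continuous_query(stream: Iterable[float],
                     fn: Callable[[Tuple[float, ...]], float],
                     size: int,
                     stride: int) -> Iterator[float]:
    """Apply a user-defined function to every window (hypothetical helper)."""
    for window in sliding_windows(stream, size, stride):
        yield fn(window)
```

For example, `list(continuous_query(range(1, 7), sum, 3, 1))` evaluates the user function `sum` over the windows (1, 2, 3), (2, 3, 4), (3, 4, 5) and (4, 5, 6).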
Our applications include a virtual radio telescope that requires advanced computations over huge streams of data. The system therefore needs to be highly scalable with respect to both data volume and computation cost. Rather than providing only built-in distribution strategies, the system allows the user to define distribution templates that specify customized distribution strategies for user functions in stream queries. The distribution templates are shown to provide scalability for stream computations that grow more expensive with larger windows.
Distribution templates can be defined in terms of other distribution templates, enabling the specification of large distribution patterns of communicating computing nodes. This also makes it possible to define optimizing templates, which generate new templates based on profiling of the computations.
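The idea of composing templates can be sketched in Python as follows. This is an illustration under stated assumptions, not GSDM's template language: the names `single_node`, `partition` and `optimizing`, the `profile` callback and the `budget` cost model are all hypothetical. The "nodes" are simulated in a single process, and the split-and-concatenate strategy is only valid for user functions that work element by element.

```python
import math
from typing import Callable, List, Sequence

Window = Sequence[float]
Stage = Callable[[Window], List[float]]  # computation run by one node or a group of nodes
Template = Callable[[Stage], Stage]      # wraps a user function in a distribution pattern


def single_node(fn: Stage) -> Stage:
    """Base template: run the user function unchanged on one node."""
    return fn


def partition(n: int, inner: Template = single_node) -> Template:
    """Template defined in terms of another template: split each window into
    at most n contiguous parts, apply `inner` to each part, concatenate results."""
    if n <= 0:
        raise ValueError("n must be positive")

    def template(fn: Stage) -> Stage:
        branch = inner(fn)

        def stage(window: Window) -> List[float]:
            chunk = math.ceil(len(window) / n) or 1  # avoid a zero step on empty windows
            out: List[float] = []
            for start in range(0, len(window), chunk):
                out.extend(branch(window[start:start + chunk]))
            return out

        return stage

    return template


def optimizing(profile: Callable[[Stage], float], budget: float) -> Template:
    """Optimizing template: generate a partition template whose fan-out is
    derived from a profiled cost estimate (assumed cost model)."""

    def template(fn: Stage) -> Stage:
        fanout = max(1, math.ceil(profile(fn) / budget))
        return partition(fanout)(fn)

    return template
```

Nesting `partition(2, partition(2))` describes a two-level tree with up to four leaf computations, while `optimizing` chooses the fan-out only after a profiler has estimated the cost of the user function.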
July 11, 2006