# Cost-effective Outbreak Detection in Networks

Which blogs should we read to avoid missing important information? Where should we place sensors in a water distribution network to quickly detect contaminants? These seemingly different problems share a common structure: outbreak detection can be modeled as a problem of selecting nodes (blogs, sensor locations, ...) in a network in order to detect the spread of a virus or of information as quickly as possible. We present a general methodology for near-optimal sensor placement in these and related problems. We demonstrate that many realistic outbreak detection objectives (e.g., detection likelihood, population affected) exhibit the property of "submodularity". We exploit submodularity to develop an efficient algorithm that scales to large problems and provably achieves near-optimal placements, while being 700 times faster than a simple greedy algorithm. We evaluate our approach on several large real-world problems, including a model of a water distribution network and real blog data. We also show how the approach leads to deeper insights in both applications, answering multicriteria trade-off, cost-sensitivity, and generalization questions.

Joint work with Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Recipient of the best student paper award at the ACM SIGKDD '07 conference.
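To make the role of submodularity concrete, here is a minimal sketch of lazy greedy selection on a toy coverage objective. It is an illustration under stated assumptions, not the algorithm presented in the talk: the names `lazy_greedy` and `detects` are hypothetical, every node is assumed to have unit cost, and the objective is simply the number of outbreak scenarios detected by at least one chosen node. Because this objective is submodular, a node's marginal gain can only shrink as more nodes are selected, so a previously computed gain is a valid upper bound and most re-evaluations can be skipped.

```python
import heapq


def lazy_greedy(detects, k):
    """Select up to k nodes to maximize the number of detected scenarios.

    detects: dict mapping node -> set of scenario ids that node would detect
             (hypothetical toy input, assumed for illustration).
    Returns (selected_nodes, number_of_scenarios_covered).
    """
    selected = []
    covered = set()
    # heapq is a min-heap, so gains are stored negated to pop the largest first.
    # The initial gain of a node is the size of its own detection set.
    heap = [(-len(scenarios), node) for node, scenarios in detects.items()]
    heapq.heapify(heap)

    while heap and len(selected) < k:
        _stale_gain, node = heapq.heappop(heap)
        # Recompute the marginal gain against the current selection.
        fresh_gain = len(detects[node] - covered)
        # Stale gains of the remaining nodes are upper bounds on their true
        # gains (submodularity), so if the fresh gain still beats the best
        # stale gain, this node is the true greedy choice.
        if not heap or fresh_gain >= -heap[0][0]:
            if fresh_gain == 0:
                break  # nothing left to gain from any node
            selected.append(node)
            covered |= detects[node]
        else:
            # Otherwise reinsert with the updated gain and try the next node.
            heapq.heappush(heap, (-fresh_gain, node))

    return selected, len(covered)
```

For example, `lazy_greedy({"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}, 2)` picks `c` and then `a`, covering all seven scenarios. The selection matches what a plain greedy loop would produce; the saving comes from not recomputing every node's gain in every round.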

*Author: Jure Leskovec, CMU*