Privacy Preserving Data Mining
July 28, 2006
Matthew Roughan joined the School of Applied Mathematics at the University of Adelaide in February 2004, where his interests include the design and installation of Internet measurement equipment and the analysis and modeling of Internet measurement data.
The rapid growth of the Internet over the last decade has been startling. However, efforts to track its growth have often fallen afoul of bad data. For instance, how much traffic does the Internet now carry? The problem is not that the data is technically hard to obtain, or that it does not exist, but rather that the data is not shared. Obtaining an overall picture requires data from multiple sources, few of whom are open to sharing it, either because sharing would violate privacy legislation or because it would expose business secrets. The approaches used so far for Internet data, e.g., trusted third parties or data anonymization, have been only partially successful and are not widely adopted.
The paper presents a method for performing computations on shared data without any participant revealing its own secret data. For example, one can compute the sum of traffic over a set of service providers without any service provider learning the traffic of another. The method is simple, scalable, and flexible enough to perform a wide range of valuable operations on Internet data.
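To make the traffic-sum example concrete, here is a minimal sketch of one well-known way to compute such a total: additive secret sharing. It is an illustration only and is not necessarily the method presented in the paper. The names make_shares and secure_sum, the modulus, and the traffic figures are all assumptions introduced for this example, and the sketch simulates every provider in a single process rather than over a network.

```python
import secrets

# Assumed modulus: it only needs to be larger than the true total.
MODULUS = 2**64


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the value.
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate the protocol: return the total without pooling raw values."""
    n = len(private_values)
    # Row i holds the shares provider i hands out, one per provider.
    shares = [make_shares(v, n, modulus) for v in private_values]
    # Provider j adds the shares it received and publishes only that sum.
    partial_sums = [
        sum(shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # The published partial sums add up to the true total.
    return sum(partial_sums) % modulus


# Hypothetical traffic volumes for three providers (arbitrary units).
traffic = [1200, 850, 4300]
print(secure_sum(traffic))  # prints 6350
```

Each provider sees only random-looking shares from the others, plus the published partial sums, so no individual traffic figure is disclosed as long as the providers do not pool what they received.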