Statistical Modeling of Relational Data
KDD has traditionally been concerned with mining data from a single relation. However, most applications involve multiple interacting relations, either explicitly (in relational databases) or implicitly (in semi-structured and multimodal data). Examples include link analysis, social networks, bioinformatics, information extraction, security, and ubiquitous computing. Mining such data has become a topic of keen interest in the KDD community in recent years. The key difficulty is that data in relational domains is no longer i.i.d. (independent and identically distributed), which greatly complicates statistical modeling. However, research has now advanced to the point where robust, easy-to-use, general-purpose techniques and languages for mining non-i.i.d. data are available. The goal of this tutorial is to add a sufficient subset of these concepts and techniques to the toolkits of both researchers and practitioners.
Author: Pedro Domingos, University of Washington