Mining Relational Model Trees
Multi-Relational Data Mining (MRDM) is the process of discovering implicit, previously unknown and potentially useful information from data scattered across multiple tables of a relational database. MRDM is needed to cope with the substantial complexity that data mining tasks acquire when the properties of the units of analysis under investigation may be affected by attributes of related units of analysis, which are possibly of different types and are naturally modeled as one table per object type. Regression is a fundamental task in MRDM: the goal is to examine samples of past experience with known continuous answers (the response) and to generalize to future cases through an inductive process. Following the mainstream of MRDM research, Mr-SMOTI adopts the structural approach: it recursively partitions data stored in a tightly-coupled database and builds a multi-relational model tree that captures the linear dependence between the response variable and one or more explanatory variables of both the reference objects and the task-relevant objects. The model tree is induced top-down by choosing, at each step, either to partition the training space (split nodes) or to introduce a regression variable into the linear models to be associated with the leaves (regression nodes). Internal regression nodes contribute to the definition of multiple models and capture global effects, while straight-line regressions at the leaves capture only local effects. The tight coupling with the database makes knowledge of the data structures (e.g., foreign keys) available at no cost to guide the search in the multi-relational pattern space.
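The interplay between regression nodes and split nodes can be illustrated with a minimal single-table sketch. It is not Mr-SMOTI: it ignores multiple tables, foreign keys and the search for the best split or regression variable, and every name in it (`fit_line`, `build_toy_tree`, `ToyModelTree`) as well as the fixed tree shape (one regression node on `x1` at the root, one split on `x2`, two leaves) is hypothetical. The root regression node removes the linear effect of `x1` from the response, so that effect is shared by both leaves (a global effect), while each leaf fits its own straight line on `x2` to the residuals (a local effect).

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[float, float, float]  # (x1, x2, y); hypothetical single-table data


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


@dataclass
class ToyModelTree:
    root: Tuple[float, float]    # regression node on x1 (global effect)
    threshold: float             # split node test: x2 <= threshold
    left: Tuple[float, float]    # leaf line on x2 when x2 <= threshold
    right: Tuple[float, float]   # leaf line on x2 when x2 > threshold

    def predict(self, x1: float, x2: float) -> float:
        a0, b0 = self.root
        a, b = self.left if x2 <= self.threshold else self.right
        # Leaf model = global term from the regression node + local leaf term.
        return a0 + b0 * x1 + a + b * x2


def build_toy_tree(rows: List[Row], threshold: float) -> ToyModelTree:
    """Fixed-shape toy tree; assumes both sides of the split are non-empty."""
    # Regression node: fit y on x1 over ALL rows, then pass residuals down.
    root = fit_line([r[0] for r in rows], [r[2] for r in rows])
    resid = [(x1, x2, y - (root[0] + root[1] * x1)) for x1, x2, y in rows]
    # Split node: partition the residual data on x2.
    parts = ([r for r in resid if r[1] <= threshold],
             [r for r in resid if r[1] > threshold])
    # Leaves: straight-line regression of the residual on x2 in each part.
    left, right = (fit_line([r[1] for r in p], [r[2] for r in p]) for p in parts)
    return ToyModelTree(root, threshold, left, right)


# Synthetic data: y = 1 + 2*x1 + g(x2), where g is piecewise linear in x2.
def g(x2: float) -> float:
    return 3 * x2 if x2 <= 4 else 20 - x2


rows = [(float(x1), float(x2), 1 + 2 * x1 + g(x2))
        for x1 in range(3) for x2 in range(10)]
tree = build_toy_tree(rows, threshold=4.0)
print(tree.root[1])  # → 2.0
```

Because `x1` and `x2` vary independently on this grid, the root slope recovers the global coefficient of `x1` exactly, and the two leaf lines recover the slopes of `g` on each side of the split.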
Author: Annalisa Appice, University of Bari