# SAXually Explicit Images: Data Mining Large Shape Databases

Google TechTalks
May 12, 2006
Eamonn Keogh
ABSTRACT
The problem of indexing large collections of time series and images has received much attention in the last decade; however, we argue that there is potentially great untapped utility in data mining such collections. Consider the following two concrete examples of problems in data mining.
Motif Discovery (duplication detection): Given a large repository of time series or images, find approximately repeated patterns/images.
Discord Discovery: Given a large repository of time series or images, find the most unusual time series/image.
As we will show, both of these problems have applications in fields as diverse as anthropology, crime prevention, zoology and entertainment. Both problems are trivial to solve given time quadratic in the number of objects, but only a linear-time solution is tractable for realistic problems. In this talk we will show how a symbolic representation of the data called SAX (Symbolic Aggregate ApproXimation) allows fast, scalable solutions to these problems.
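
For readers unfamiliar with SAX, the following is a minimal sketch of the commonly described recipe: z-normalize the series, average it over equal-width segments (piecewise aggregate approximation), then map each segment mean to a letter using breakpoints that split a standard normal distribution into equiprobable regions. The function name `sax_word` and the parameters `n_segments` and `alphabet_size` are illustrative assumptions, not names taken from the talk, and the sketch is not the speaker's implementation.

```python
from bisect import bisect_left
from statistics import NormalDist, fmean, pstdev


def sax_word(series, n_segments=8, alphabet_size=4):
    """Convert a numeric series to a SAX-style word (illustrative sketch).

    `sax_word`, `n_segments` and `alphabet_size` are hypothetical names
    chosen for this example.
    """
    values = [float(v) for v in series]
    if len(values) < n_segments:
        raise ValueError("series must have at least n_segments points")

    # 1. Z-normalize so that offset and amplitude do not affect the word.
    mean, std = fmean(values), pstdev(values)
    z = [(v - mean) / std if std > 0 else 0.0 for v in values]

    # 2. Piecewise aggregate approximation: mean of each (nearly) equal segment.
    n = len(z)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    paa = [fmean(z[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]

    # 3. Breakpoints cut the standard normal into equiprobable regions;
    #    each segment mean becomes the letter of the region it falls in.
    breakpoints = [
        NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)
    ]
    return "".join(chr(ord("a") + bisect_left(breakpoints, p)) for p in paa)


if __name__ == "__main__":
    print(sax_word(range(16), n_segments=4, alphabet_size=4))
```

Because each series is reduced to a short string in a single pass, the resulting words can be hashed or compared cheaply, which is one plausible way a symbolic representation helps avoid comparing every pair of objects.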