# Learning Low Dimensional Manifolds

Google Tech Talk

October 9, 2009

ABSTRACT

Presented by Yoav Freund, UCSD.

Many read-world datasets can be characterized as follows: the "extrinsic dimension" of the data is high, but the "intrinsic

dimension" is low. Consider for example the data generated by a motion capture device. Such a device typically tracks a few hundred dots located on a special suit worn by the tracked person. Each time point corresponds to a vector consisting of the (x,y,z) location of each dot. The extrinsic dimension of these vectors is thus around one thousand. However, the vectors are highly constrained because the dots are placed on a human body that has only a limited number of degrees of freedom. We say that the "intrinsic dimension" is the number of the degrees of freedom of the data.

We are interested in learning algorithms whose performance scales with the intrinsic dimension of the data. We present the random projection trees algorithm which has this type of performance. Moreover, the algorithm is very efficient computationally and can be performed in a streaming fashion where each data point is seen only once.

This is joint work with Sanjoy Dasgupta.