# Non-Myopic Active Learning: A Reinforcement Learning Approach

Active learning considers the problem of actively choosing the training data. This is particularly useful in settings where the training data is limited or comes at a price, so the learner needs to be "economical" in its data usage. Active learning can be especially challenging when the cost of the data varies, the learner has only partial control over the data it receives, and the value of each data point depends on the information captured by the training data already received. In such situations, non-myopic strategies that take into account the long-term effects of each data selection are desirable. In this talk, I will describe how non-myopic active learning can be naturally formulated as a reinforcement learning problem. This formulation is particularly useful for dealing with the exploration-exploitation dilemma that arises when the learner must choose between selecting data that minimizes the immediate cost (exploitation) and selecting data that maximizes the long-term information gain (exploration). I will describe a Bayesian approach to optimally trade off exploitation and exploration. I will also show how to derive an analytic solution for discrete problems, along with an algorithm called BEETLE.

Presented by Pascal Poupart, University of Waterloo

*Google Tech Talk, March 16, 2009*