Using upper confidence bounds to control exploration and exploitation

Posted in Science on September 13, 2008


Lecture slides:

  • Using upper confidence bounds to control exploration and exploitation
  • Contents
  • Exploration vs. Exploitation
  • Exploration vs. Exploitation: Some Applications
  • Bandit Problems – “Optimism in the Face of Uncertainty”
  • Parametric Bandits [Lai & Robbins]
  • Bounds
  • UCB1 Algorithm (Auer et al., 2002); a sketch of this rule follows the list
  • TITLE
  • Bandits in Continuous Time
  • Formal framework
  • Evaluating allocation rules (policies)
  • Gain, action values and regret
  • Model-based UCB
  • Algorithm
  • Regret bound
  • Key proposition
  • Open problems
  • Levente Kocsis, Rémi Munos
  • Bandits with large action-spaces
  • Structure helps!
  • UCT: Upper Confidence based Tree search
  • Example (t=1)
  • Example (t=2)
  • Example (t=3)
  • Example (t=4)
  • What is the next time a suboptimal action is sampled?
  • UCT variations
  • UCT variations
  • Theoretical results
  • Planning in MDPs: Sailing
  • Planning in MDPs: Sailing
  • Planning in MDPs: Sailing
  • Results in games
  • Thank you!
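
The list above names the UCB1 rule of Auer et al. (2002). As a reference for readers, here is a minimal Python sketch of that index policy (empirical mean plus a sqrt(2 ln t / n_i) exploration bonus); the `pull` callback and the Bernoulli example arms are illustrative assumptions, not taken from the lecture.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Minimal UCB1 (Auer et al., 2002): play each arm once, then pull the
    arm maximizing  empirical mean + sqrt(2 * ln(t) / n_i)."""
    counts = [0] * n_arms          # number of pulls of each arm
    sums = [0.0] * n_arms          # sum of rewards observed for each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # initialization: try every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)         # rewards are assumed to lie in [0, 1]
        counts[arm] += 1
        sums[arm] += reward
    return [s / c for s, c in zip(sums, counts)]


# Illustrative use: three Bernoulli arms with means 0.2, 0.5 and 0.7.
means = [0.2, 0.5, 0.7]
print(ucb1(lambda i: float(random.random() < means[i]), 3, 10_000))
```

The logarithmic bonus shrinks as an arm is sampled more often, which is the “optimism in the face of uncertainty” idea the slides refer to: rarely tried arms keep a large upper confidence bound and are eventually re-explored.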

Author: Csaba Szepesvari, Department of Computing Science, University of Alberta

Watch Video

Tags: Science, Lectures, Computer Science, Machine Learning, VideoLectures.Net, Active Learning