Playing Atari with Deep Reinforcement Learning

Introduction

Challenges

  1. Deep learning applications need large amounts of labeled training data; reinforcement learning must instead learn from scalar reward signals that are sparse, noisy, and delayed.
  2. DL assumes samples are independent and identically distributed, while RL typically encounters sequences of highly correlated states.
  3. In RL the data distribution shifts as the agent learns new behaviours, whereas DL assumes a fixed underlying distribution.

This paper demonstrates that a CNN can overcome these challenges and learn successful control policies from raw video input in complex RL environments.

Background

Notations:

  • $\mathcal{E}$: environment
  • $a_t \in \mathcal{A}$: action taken at timestep $t$, drawn from the set of legal actions $\mathcal{A}$
  • $x_t$: current screen (represented as a vector of raw pixel values)
  • $s_t = x_1, a_1, ..., x_{t-1}, a_{t-1}, x_t$: the current state, i.e. the full sequence of observations and actions up to timestep $t$ (a single screen $x_t$ alone does not fully describe the situation)
  • Use the discounted return $R_t = \sum_{t'=t}^T \gamma^{t'-t} r_{t'}$, where $T$ is the timestep at which the game terminates and $\gamma$ is the discount factor
  • $Q^\ast(s, a) = \max_\pi \mathbb{E}[R_t \mid s_t = s, a_t = a, \pi]$: the optimal action-value function, the maximum expected return achievable by any policy $\pi$ after seeing state $s$ and taking action $a$
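The discounted return above can be computed for every timestep of a finished episode with a single backward pass, since $R_t = r_t + \gamma R_{t+1}$. A minimal sketch (the function name and example rewards are illustrative, not from the paper):

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [R_0, ..., R_T] for a finished episode's list of rewards,
    using the recursion R_t = r_t + gamma * R_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the already-discounted tail.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example with gamma = 0.5 and rewards [1, 0, 1]:
# R_2 = 1, R_1 = 0 + 0.5*1 = 0.5, R_0 = 1 + 0.5*0.5 = 1.25
print(discounted_returns([1.0, 0.0, 1.0], gamma=0.5))  # [1.25, 0.5, 1.0]
```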