## Offline DRL

(This is not to be confused with off-policy DRL)

Given a fixed batch of data $\{(s,a,r,s')\}$, find a good policy without further interaction with the environment.

Issue: consider the bootstrapped target $r + \gamma \max_{a'} Q_{\text{targ}}(s',a')$ for a data point $(s,a,r,s')$. For some out-of-distribution $a'$, $Q_{\text{targ}}(s',a')$ may be wildly overestimated, and since the dataset contains no transitions with actions near $(s',a')$, the error is never corrected by training.
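A toy sketch of this issue (hypothetical setup: one next state, five discrete actions, of which only two appear in the dataset). Maximizing the estimated Q over *all* actions can bootstrap from an out-of-distribution action with large extrapolation error; restricting the max to in-dataset actions is one conservative fix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one next state s', 5 discrete actions.
# The dataset only contains actions 0 and 1; actions 2-4 are OOD.
dataset_actions = [0, 1]
true_q = np.array([1.0, 1.2, 0.5, 0.4, 0.3])  # true Q(s', a)

# A Q-network fit only on in-distribution actions can be arbitrarily
# wrong on OOD actions; simulate that with large noise on unseen actions.
q_est = true_q.copy()
for a in range(5):
    if a not in dataset_actions:
        q_est[a] += rng.normal(0.0, 5.0)  # extrapolation error

r, gamma = 0.0, 0.99
naive_target = r + gamma * q_est.max()                   # may pick an OOD action
safe_target = r + gamma * q_est[dataset_actions].max()   # in-dataset actions only

print(naive_target, safe_target)
```

Because the naive max runs over a superset of the dataset actions, `naive_target >= safe_target` always holds; when an OOD estimate is inflated, the gap is the overestimation bias.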

Popular algorithms: IQL, CQL

## Meta DRL

We have already learned policies $\pi_{\theta_1}, \pi_{\theta_2}, \dots, \pi_{\theta_N}$ for $N$ different tasks $\tau_1, \tau_2, \dots, \tau_N$. The goal is to find an initialization $\theta$ that can be quickly adapted to any new task.

Popular algorithms: MAML (not specific to RL)
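A minimal MAML-style sketch, under the toy assumption that each task $i$ is the 1-D loss $L_i(\theta) = (\theta - c_i)^2$ with a task-specific target $c_i$ (so all gradients can be written by hand). The inner loop takes one adaptation step per task; the outer loop updates the shared initialization through the adapted parameters.

```python
import numpy as np

# Toy tasks (assumption): L_i(theta) = (theta - c_i)^2.
task_targets = np.array([-1.0, 0.5, 2.0])  # c_i for N = 3 tasks
alpha, beta = 0.1, 0.05                    # inner / outer step sizes

def inner_adapt(theta, c):
    # One inner gradient step on task i: theta' = theta - alpha * dL_i/dtheta
    return theta - alpha * 2.0 * (theta - c)

theta = 0.0
for _ in range(200):
    # Outer gradient of sum_i L_i(theta_i'), differentiating
    # through the inner step by the chain rule.
    grad = 0.0
    for c in task_targets:
        theta_adapted = inner_adapt(theta, c)
        # dL_i(theta')/dtheta = 2*(theta' - c) * d theta'/d theta,
        # and d theta'/d theta = 1 - 2*alpha for this quadratic loss.
        grad += 2.0 * (theta_adapted - c) * (1.0 - 2.0 * alpha)
    theta -= beta * grad

print(theta)  # meta-trained initialization
```

For these quadratic losses the meta-objective is minimized at the mean of the $c_i$ (here $0.5$), so the learned initialization sits where one inner step moves it quickly toward any task's optimum.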

## Multi-Agent Reinforcement Learning

- Non-cooperative agents: analyzed with game theory; the standard solution concept is the Nash equilibrium
- Cooperative agents: there is a central controller; the joint action space is multi-dimensional
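For the non-cooperative case, a small illustration (payoffs chosen here as a prisoner's-dilemma example, not taken from the notes): a profile of actions is a pure Nash equilibrium if neither player can gain by unilaterally deviating.

```python
import numpy as np

# 2-player matrix game, actions 0 = cooperate, 1 = defect.
# payoff[i][a1, a2] = reward to player i.
payoff = np.array([
    [[-1, -3],    # player 1: (C,C)=-1, (C,D)=-3
     [ 0, -2]],   #           (D,C)= 0, (D,D)=-2
    [[-1,  0],    # player 2: (C,C)=-1, (C,D)= 0
     [-3, -2]],   #           (D,C)=-3, (D,D)=-2
])

def is_nash(a1, a2):
    # Neither player can improve by deviating alone.
    best1 = payoff[0][:, a2].max()   # player 1's best response to a2
    best2 = payoff[1][a1, :].max()   # player 2's best response to a1
    return payoff[0][a1, a2] == best1 and payoff[1][a1, a2] == best2

equilibria = [(a1, a2) for a1 in (0, 1) for a2 in (0, 1) if is_nash(a1, a2)]
print(equilibria)  # only mutual defection survives the deviation check
```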

## Training Robots

Sim-to-Real: train in a simulator of the real environment, which makes generating episodes cheap and safe; the learned policy is then transferred to the real robot.