Offline DRL
(Not to be confused with off-policy DRL, where the agent still interacts with the environment while learning from a replay buffer.)
Given a batch of data $\{(s, a, r, s')\}$, find a good policy without interacting with the environment.
Issue (extrapolation error): the TD target for a data point $(s, a, r, s')$ is $y = r + \gamma \max_{a'} Q_{targ}(s', a')$. For some $a'$, $Q_{targ}(s', a')$ can be very large even though the data set contains no transition with an action near $a'$ at $s'$, so the overestimate is never corrected and propagates through bootstrapping.
Popular algorithms: IQL (Implicit Q-Learning), CQL (Conservative Q-Learning)
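A minimal sketch of the conservative idea behind CQL, assuming a tabular Q-function over a discrete action space; `cql_loss`, the batch layout, and the toy data are hypothetical names for illustration, not a faithful CQL implementation:

```python
import numpy as np

def cql_loss(q, batch, gamma=0.99, alpha=1.0):
    """TD loss plus a CQL-style conservative penalty.

    q: tabular Q-values, shape [n_states, n_actions].
    batch: dict of index arrays s, a, r, s2 from the fixed dataset.
    """
    s, a, r, s2 = batch["s"], batch["a"], batch["r"], batch["s2"]
    # Naive TD target: the max over a' can pick an action the dataset
    # never tries at s', so an inflated Q(s', a') is never corrected --
    # the extrapolation error described above.
    td_target = r + gamma * q[s2].max(axis=1)
    td_error = (q[s, a] - td_target) ** 2
    # Conservative penalty: logsumexp of Q over all actions minus Q of
    # the dataset action; minimizing it pushes down the values of
    # out-of-distribution actions relative to in-distribution ones.
    penalty = np.log(np.exp(q[s]).sum(axis=1)) - q[s, a]
    return float((td_error + alpha * penalty).mean())

# Toy usage with random data (illustration only).
rng = np.random.default_rng(0)
q = rng.normal(size=(10, 4))
batch = {"s": rng.integers(0, 10, 32), "a": rng.integers(0, 4, 32),
         "r": rng.normal(size=32), "s2": rng.integers(0, 10, 32)}
print(cql_loss(q, batch))
```

Setting `alpha = 0` recovers plain offline Q-learning and reintroduces the extrapolation problem above.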
Meta DRL
We have already learned policies $\pi_{\theta_1}, \pi_{\theta_2}, \ldots, \pi_{\theta_N}$ for $N$ different tasks $\tau_1, \tau_2, \ldots, \tau_N$. The goal is to find an initialization $\theta$ that can be quickly adapted to any new task.
Popular algorithms: MAML (not only for RL)
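A minimal sketch of one meta-update, using the first-order MAML approximation (FOMAML) to avoid second-order derivatives; `grad`, `fomaml_step`, and the toy quadratic tasks are assumptions for illustration, not the full MAML algorithm:

```python
import numpy as np

def fomaml_step(theta, tasks, grad, alpha=0.01, beta=0.001):
    """One meta-update under the first-order MAML approximation.

    grad(theta, task) is a hypothetical callable returning the
    gradient of the task's loss at theta.
    """
    meta_grad = np.zeros_like(theta)
    for task in tasks:
        # Inner loop: one gradient step adapts theta to this task.
        theta_task = theta - alpha * grad(theta, task)
        # Outer loop: gradient at the adapted parameters, dropping
        # the second-order terms of full MAML (the FOMAML shortcut).
        meta_grad += grad(theta_task, task)
    # Move the shared initialization so that a single inner step
    # performs well on average across tasks.
    return theta - beta * meta_grad / len(tasks)

# Toy usage: quadratic task losses (theta - c)^2 with per-task optimum c.
tasks = [np.array([1.0]), np.array([-2.0]), np.array([0.5])]
grad = lambda th, c: 2.0 * (th - c)
theta = np.zeros(1)
for _ in range(100):
    theta = fomaml_step(theta, tasks, grad, alpha=0.1, beta=0.1)
print(theta)  # settles near an initialization that adapts well to all tasks
```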
Multi-Agent Reinforcement Learning
- Non-cooperative agents: analyzed with game theory; the standard solution concept is the Nash equilibrium.
- Cooperative agents with a central controller: the joint problem reduces to single-agent RL over a multi-dimensional (joint) action space (see the sketch after this list).
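A small sketch of why the centralized cooperative case is single-agent RL with a bigger action space; the agents and their action sets are hypothetical:

```python
from itertools import product

# Hypothetical per-agent action sets for three cooperative agents.
agent_actions = [["left", "right"], ["up", "down"], ["stop", "go"]]

# Under a central controller, the joint action space is the Cartesian
# product of the per-agent spaces, so the problem becomes single-agent
# RL with one multi-dimensional action. Note the exponential growth:
# 2 * 2 * 2 = 8 joint actions already.
joint_actions = list(product(*agent_actions))
print(len(joint_actions), joint_actions[0])  # 8 ('left', 'up', 'stop')
```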
Training Robots
Sim-to-Real: train in a simulation of the real environment, which makes generating episodes cheap and safe, then transfer the learned policy to the real robot.
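The note doesn't name a transfer technique, but one common choice is domain randomization; the sketch below assumes hypothetical physics parameters and ranges:

```python
import random

def sample_sim_params():
    """Domain randomization: resample the simulator's physics for each
    episode so the policy becomes robust to the sim-to-real gap.
    Parameter names and ranges here are hypothetical."""
    return {
        "mass_kg": random.uniform(0.8, 1.2),
        "friction": random.uniform(0.5, 1.5),
        "motor_delay_s": random.uniform(0.0, 0.02),
    }

# Each simulated episode uses freshly randomized dynamics, so episode
# generation stays cheap while the policy sees a broad range of
# environments before it ever touches the real robot.
for episode in range(3):
    print(sample_sim_params())
```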