While going through the slides of Sergey Levine’s RL course, I recorded several questions about RL in this post. It is not meant to be a thorough summary of RL, but rather a collection of Q&As that offer some insight into it.

What’s the difference between Q-learning and gradient descent?

Q-learning is a fixed-point iteration, not gradient descent. The update moves \(Q(s, a)\) toward the target \(r + \gamma \max_{a'} Q(s', a')\), but the target itself depends on \(Q\) and is held constant during the update, so no gradient flows through it. There is no fixed objective whose gradient the update follows, which is why such updates are sometimes called “semi-gradient.”
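To make the distinction concrete, here is a minimal sketch of tabular Q-learning on a hypothetical two-state, two-action MDP (all names and dynamics are made up for illustration). Note that the target is computed first and then treated as a constant; the update is a damped fixed-point step, not a gradient step on any loss.

```python
import numpy as np

# Toy deterministic MDP (hypothetical): action a moves to state a,
# and reward 1 is given for entering state 1.
n_states, n_actions, gamma, alpha = 2, 2, 0.9, 0.5
Q = np.zeros((n_states, n_actions))

def step(s, a):
    return a, float(a == 1)

rng = np.random.default_rng(0)
s = 0
for _ in range(2000):
    a = rng.integers(n_actions)                # uniform exploration
    s_next, r = step(s, a)
    target = r + gamma * Q[s_next].max()       # held fixed: no gradient flows through it
    Q[s, a] += alpha * (target - Q[s, a])      # fixed-point (Bellman) update
    s = s_next

print(Q)  # approaches the fixed point of the Bellman optimality operator
```

For this toy MDP the fixed point can be checked by hand: \(Q^*(\cdot, 1) = 1/(1-\gamma) = 10\) and \(Q^*(\cdot, 0) = \gamma \cdot 10 = 9\), and the iteration converges there even though no objective was ever differentiated.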

Why does model-based RL have no guarantee that a better model yields a better policy?

TBD

How to choose \(\lambda\) in GAE?

TBD
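While the answer above is still TBD, some context on what \(\lambda\) controls may be useful: GAE computes \(A_t = \sum_{l \ge 0} (\gamma\lambda)^l \delta_{t+l}\) with \(\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)\), so \(\lambda = 0\) recovers the one-step TD advantage (low variance, biased by the value estimate) and \(\lambda = 1\) recovers the Monte Carlo return minus the baseline (unbiased, high variance). A minimal sketch with hypothetical reward and value numbers:

```python
import numpy as np

def gae(rewards, values, gamma, lam):
    # values has length len(rewards) + 1 (bootstrap value for the last state)
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(deltas)
    acc = 0.0
    for t in reversed(range(len(deltas))):
        acc = deltas[t] + gamma * lam * acc   # (gamma*lam)-discounted sum of deltas
        adv[t] = acc
    return adv

rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.2, 0.1, 0.0])       # hypothetical value estimates
a0 = gae(rewards, values, gamma=0.99, lam=0.0)  # equals the TD errors
a1 = gae(rewards, values, gamma=0.99, lam=1.0)  # equals discounted return minus V
print(a0, a1)
```

Intermediate \(\lambda\) (0.9–0.99 is common in practice) trades off between these two extremes.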

Is n-step DQN theoretically sound? Can we fix the issue with importance sampling?

TBD
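Again pending an answer, some context on the question: the n-step target \(G_t = \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n \max_a Q(s_{t+n}, a)\) used in practice sums rewards collected under an old behavior policy \(\mu\), which is what breaks the off-policy justification; importance-sampling ratios \(\pi(a_i \mid s_i)/\mu(a_i \mid s_i)\) over the intermediate actions are the standard candidate fix. A sketch with hypothetical numbers (the exact placement and variance of the ratios is precisely the subtlety the question raises):

```python
import numpy as np

def n_step_target(rewards, q_last_max, gamma):
    # uncorrected n-step return, as typically used in n-step DQN
    n = len(rewards)
    return sum(gamma**k * rewards[k] for k in range(n)) + gamma**n * q_last_max

def is_corrected_target(rewards, q_last_max, gamma, pi_probs, mu_probs):
    # Per-decision importance weighting (one common scheme): the first action
    # needs no ratio for an action-value target; each later reward and the
    # bootstrap term get the cumulative product of ratios up to that step.
    ratios = np.asarray(pi_probs) / np.asarray(mu_probs)  # for a_{t+1}..a_{t+n-1}
    rho = np.concatenate([[1.0], np.cumprod(ratios)])
    n = len(rewards)
    g = sum(gamma**k * rho[k] * rewards[k] for k in range(n))
    return g + gamma**n * rho[-1] * q_last_max

r = [1.0, 0.5, 0.25]
print(n_step_target(r, q_last_max=2.0, gamma=0.9))
```

A quick sanity check: when \(\pi = \mu\) (all ratios equal to 1), the corrected target reduces exactly to the uncorrected one.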