Revisit Reinforcement Learning
While going through the slides of Sergey Levine’s RL course, I recorded several questions about RL in this post. It is not meant to be a thorough summary of RL, but rather a collection of Q&As that offer some insight into it.
What’s the difference between Q-learning and gradient descent?
Q-learning is a fixed-point iteration, not gradient descent. Although the update \(Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)\) looks like a gradient step on the squared TD error, the target \(r + \gamma \max_{a'} Q(s', a')\) is treated as a constant: no gradient flows through it, so the update is only a semi-gradient. It is better understood as repeatedly applying the Bellman optimality backup and iterating toward its fixed point, which is why its convergence behavior differs from that of gradient descent on a fixed objective.
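To make this concrete, here is a minimal sketch of tabular Q-learning on a hypothetical two-state MDP (the MDP and all constants are made up for illustration). Note that the target is computed as a plain number, so no derivative is ever taken through it:

```python
import numpy as np

# Hypothetical toy MDP: state 0 --a0--> state 1 (reward 0),
# state 1 --a0--> state 1 (reward 1). Deterministic transitions.
n_states, n_actions = 2, 1
P = {(0, 0): 1, (1, 0): 1}       # (state, action) -> next state
R = {(0, 0): 0.0, (1, 0): 1.0}   # (state, action) -> reward
gamma, alpha = 0.9, 0.5

Q = np.zeros((n_states, n_actions))
for _ in range(200):
    for (s, a), s_next in P.items():
        # The target is a constant here -- this is the semi-gradient part.
        target = R[(s, a)] + gamma * Q[s_next].max()
        # One step of fixed-point iteration toward the Bellman backup.
        Q[s, a] += alpha * (target - Q[s, a])

# Fixed point: Q(1,0) = 1 / (1 - gamma) = 10, Q(0,0) = gamma * 10 = 9.
print(Q)
```

The iteration converges to the fixed point of the Bellman optimality equation rather than to a minimum of any loss surface.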
Why does model-based RL have no guarantee that a better model yields a better policy?
TBD
How to choose \(\lambda\) in GAE?
TBD
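For context while this answer is TBD, here is a minimal sketch of the GAE computation itself, which shows what \(\lambda\) controls: \(\hat{A}_t = \sum_{l \ge 0} (\gamma \lambda)^l \delta_{t+l}\) with \(\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)\). At \(\lambda = 0\) it reduces to the one-step TD advantage (low variance, biased by errors in \(V\)); at \(\lambda = 1\) it becomes the Monte Carlo return minus \(V(s_t)\) (high variance, less bias). The function name and toy inputs below are my own:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation.
    values has length len(rewards) + 1 (bootstrap value at the end)."""
    advantages = np.zeros(len(rewards))
    running = 0.0
    # Accumulate discounted TD errors backward through the trajectory;
    # lam geometrically down-weights errors from later steps.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

rewards = np.array([1.0, 1.0, 1.0])
values = np.array([2.0, 2.0, 2.0, 0.0])   # last entry bootstraps the tail
print(gae(rewards, values, lam=0.0))      # one-step TD errors
print(gae(rewards, values, lam=1.0))      # MC return minus V(s_t)
```

Seen this way, choosing \(\lambda\) is choosing a point on the bias-variance curve between these two extremes.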
Is n-step DQN theoretically sound? Can we fix the issue with importance sampling?
TBD
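To make the question precise while the answer is TBD: the n-step target \(y_t = \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n \max_a Q(s_{t+n}, a)\) assumes the intermediate actions \(a_{t+1}, \dots, a_{t+n-1}\) came from the current policy, but replay-buffer data came from an older behavior policy, so plain n-step DQN is biased. One importance-sampling correction weights each tail of the return by the ratio \(\rho_k = \pi(a_k \mid s_k) / \mu(a_k \mid s_k)\). The sketch below is my own illustration of that recursive form, not a standard implementation; in practice, uncorrected ratios have high variance, and methods such as Retrace truncate them:

```python
def is_corrected_n_step(rewards, rhos, bootstrap, gamma=0.99):
    """Importance-sampled n-step target (illustrative sketch).

    rewards:   r_t, ..., r_{t+n-1} along the stored trajectory
    rhos:      pi(a_k|s_k) / mu(a_k|s_k) for a_{t+1}, ..., a_{t+n-1}
    bootstrap: max_a Q(s_{t+n}, a)

    Computes y = r_t + gamma * rho_{t+1} * (r_{t+1} + gamma * rho_{t+2} * (...)),
    so each later reward is weighted by the product of ratios up to it.
    """
    y = bootstrap
    # Fold from the end: wrap each tail in its importance ratio.
    for k in reversed(range(1, len(rewards))):
        y = rewards[k] + gamma * y
        y *= rhos[k - 1]
    return rewards[0] + gamma * y
```

A sanity check: with all ratios equal to 1 (on-policy data), this reduces to the plain n-step return.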