SARSA and Q-learning (Practical Reinforcement Learning)
Question 1
What is true about Bellman equations?
1 point
SARSA is based on Bellman optimality equation.
Q-learning is based on Bellman expectation equation.
SARSA is based on Bellman expectation equation.
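
For reference, the two equations named in the options can be written as follows for the action-value function. This uses standard textbook notation, which is an assumption here since the quiz itself does not spell the equations out: \pi is a policy, q_\pi its action-value function, q_* the optimal action-value function.

```latex
% Bellman expectation equation (for a fixed policy \pi)
q_\pi(s, a) = \mathbb{E}\left[ R + \gamma \sum_{a'} \pi(a' \mid S')\, q_\pi(S', a') \;\middle|\; S = s, A = a \right]

% Bellman optimality equation
q_*(s, a) = \mathbb{E}\left[ R + \gamma \max_{a'} q_*(S', a') \;\middle|\; S = s, A = a \right]
```

The first averages over the actions the policy itself would take in S’, while the second takes the maximum over actions regardless of the policy.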
———————————————————————————————–
Question 2
What is true about the targets (aka the goals) for the approximate q-function in the different algorithms: SARSA, Expected SARSA, and Q-learning?
As usual, we write S for the state, A for the action, R for the reward, S’ for the next state and A’ for the action chosen from that next state.
1 point
Computing the Q-learning target requires the probability that the current policy selects A’ in S’, where A’ is the action that was actually taken in the environment.
SARSA and Q-learning targets differ only in how A’ in S’ is selected.
The Expected SARSA target has higher variance than the SARSA target.
For the Q-learning target (unlike for the SARSA one) we need an explicit policy to sample A’ from.
All methods (SARSA, Expected SARSA, Q-learning) require R and S’ to perform updates (but some of these methods may also require other inputs).
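
To make the comparison concrete, here is a minimal sketch of the three one-step targets in their usual textbook form. The function names, the `q_next` array (the approximate q-values of every action in S’), the `policy_probs` array and the numbers are all hypothetical, and terminal states are ignored for brevity.

```python
import numpy as np


def sarsa_target(r, q_next, a_next, gamma):
    """R + gamma * q(S', A'), where A' is the action actually taken in S'."""
    return r + gamma * q_next[a_next]


def expected_sarsa_target(r, q_next, policy_probs, gamma):
    """R + gamma * sum over a' of pi(a' | S') * q(S', a')."""
    return r + gamma * float(np.dot(policy_probs, q_next))


def q_learning_target(r, q_next, gamma):
    """R + gamma * max over a' of q(S', a')."""
    return r + gamma * float(np.max(q_next))


# Hypothetical transition: reward 1, three actions available in S'.
q_next = np.array([1.0, 3.0, 2.0])        # q(S', a') for each action a'
policy_probs = np.array([0.1, 0.8, 0.1])  # pi(a' | S') under the current policy

t_sarsa = sarsa_target(1.0, q_next, a_next=2, gamma=0.5)
t_expected = expected_sarsa_target(1.0, q_next, policy_probs, gamma=0.5)
t_q = q_learning_target(1.0, q_next, gamma=0.5)
```

In this sketch every target takes R and the q-values at S’; only the SARSA target also takes the sampled A’, and only the Expected SARSA target takes the policy probabilities.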
———————————————————————————————–
Question 3
When is SARSA better than Expected SARSA?
1 point
In the cases when γ is too large.
In the cases when we have a lot of parameters W.
In the cases when the state space is too large, so that we cannot integrate approximations over a huge state space.
In the cases when we have only a few parameters W.
In the cases when it is impossible to compute an explicit expectation over policy stochasticity.
In the cases when the action space is too large, so that we cannot integrate approximations over a huge action space.
———————————————————————————————–
Question 4
Select the correct statements about approximate (based on function approximation) SARSA and Q-learning.
1 point
Both algorithms can use the same neural network architectures for approximating the Q function.
Both algorithms use a regression loss (e.g. MSE, MAE, etc.)
Both algorithms use a classification loss (e.g. accuracy, log loss, etc.)
The algorithms differ only in the form of the update (more precisely, only in the target expression).
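
As an illustration of what an approximate update can look like, the sketch below uses a linear approximator with one weight row per action and a squared-error loss against a target that is treated as a constant (a semi-gradient step). The approximator, the helper names, the learning rate and the numbers are all assumptions made for this example, not something the quiz prescribes.

```python
import numpy as np


def q_values(w, s):
    """Hypothetical linear approximator: q(s, a) = w[a] . s for every action a."""
    return w @ s


def squared_error(w, s, a, target):
    """0.5 * (q(s, a) - target)^2, with the target held fixed."""
    return 0.5 * (q_values(w, s)[a] - target) ** 2


def semi_gradient_step(w, s, a, target, lr):
    """One gradient step on the squared error; no gradient flows through the target."""
    td_error = target - q_values(w, s)[a]
    w_new = w.copy()
    w_new[a] += lr * td_error * s
    return w_new


# Hypothetical transition (S, A, R, S', A') with two actions and three features.
w = np.array([[0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0]])
s, a, r = np.array([1.0, 0.0, 1.0]), 0, 1.0
s_next, a_next, gamma = np.array([0.0, 1.0, 0.0]), 0, 0.9

q_next = q_values(w, s_next)
sarsa_tgt = r + gamma * q_next[a_next]       # uses the sampled A'
q_learning_tgt = r + gamma * q_next.max()    # uses the greedy action in S'

w_sarsa = semi_gradient_step(w, s, a, sarsa_tgt, lr=0.1)
w_q = semi_gradient_step(w, s, a, q_learning_tgt, lr=0.1)
```

The same `q_values`, `squared_error` and `semi_gradient_step` serve both targets here; the two resulting weight matrices differ only because the target values passed in differ.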