Practical Reinforcement Learning — Exploration Quiz
1.
Question 1
Which of the following is true about regret?
As a reminder, regret is what you could have obtained but didn't: more formally, the difference between the expected cumulative reward of an optimal policy and the sum of rewards your policy actually collected.
1 point
Larger regret means that the policy is better at exploration.
Smaller regret means that the policy is better at exploration.
At any given moment in time, a better exploration strategy will have lower regret.
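To make the definition above concrete, here is a minimal numeric sketch. The two-armed Bernoulli bandit, the arm means, and the function name `cumulative_regret` are all illustrative assumptions, not part of the quiz; it computes pseudo-regret (using expected per-arm rewards rather than sampled ones):

```python
def cumulative_regret(true_means, actions):
    """Pseudo-regret after T steps: T * best_mean minus the
    sum of the expected rewards of the arms actually chosen."""
    best = max(true_means)
    return sum(best - true_means[a] for a in actions)

# Illustrative two-armed bandit: always pulling the worse arm
# accrues regret that grows linearly with the number of pulls.
means = [0.9, 0.5]
always_bad = [1] * 100          # pull arm 1 (mean 0.5) every time
print(cumulative_regret(means, always_bad))  # ≈ 40.0, i.e. 100 * (0.9 - 0.5)
```

Note that regret compares against the optimal policy in hindsight, so a policy that explores well early may still carry nonzero regret at any finite time.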
——————————————————————————————–
2.
Question 2
Which of the following is true about the ε-greedy strategy?
1 point
With constant ε, ε-greedy exploration has linearly growing regret.
With constant ε, ε-greedy exploration has logarithmic regret.
If t is the total number of actions taken and you set ε = 1/t, an ε-greedy strategy will reach the optimal policy in the limit.
If t is the total number of actions taken and you set ε = max(0, 1 − t/1000), an ε-greedy strategy will reach the optimal policy.
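The ε schedules in the options above can be sketched as follows. The bandit setup, seed, and helper name `eps_greedy_run` are assumptions for illustration only; the point is that the schedule is a function of the step count t, so ε = 1/t decays forever while ε = max(0, 1 − t/1000) hits zero and stops exploring:

```python
import random

def eps_greedy_run(true_means, steps, eps_schedule, seed=0):
    """Run ε-greedy on a Bernoulli bandit; returns per-arm pull counts."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    values = [0.0] * n  # running empirical mean reward per arm
    for t in range(1, steps + 1):
        if rng.random() < eps_schedule(t):
            a = rng.randrange(n)                         # explore
        else:
            a = max(range(n), key=lambda i: values[i])   # exploit
        r = 1.0 if rng.random() < true_means[a] else 0.0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]         # incremental mean
    return counts

# Decaying schedule ε = 1/t from the third option:
counts = eps_greedy_run([0.3, 0.7], steps=2000, eps_schedule=lambda t: 1 / t)
```

Swapping in `lambda t: 0.1` gives the constant-ε case: a fixed fraction of steps is spent on uniformly random arms forever, which is what makes the regret grow linearly.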
——————————————————————————————–
3.
Question 3
Which of the following is true about uncertainty-based exploration?
1 point
In the case of a simple multi-armed bandit, Thompson Sampling has asymptotically smaller regret than an ε-greedy strategy with ε = 0.5.
UCB has linear regret if the percentile is constant over time.
UCB works better than an ε-greedy strategy in any decision process.
In some cases, an ε-greedy strategy with ε = 0.2 can have smaller regret than Thompson Sampling by the 100-th action.
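For reference, Thompson Sampling for a Bernoulli bandit can be sketched in a few lines: keep a Beta posterior per arm, sample from each posterior, and pull the arm with the largest sample. The setup below (arm means, seed, function name) is illustrative, not from the course:

```python
import random

def thompson_run(true_means, steps, seed=0):
    """Thompson Sampling on a Bernoulli bandit via Beta(α, β) posteriors."""
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1] * n   # 1 + successes per arm (uniform Beta(1,1) prior)
    beta = [1] * n    # 1 + failures per arm
    counts = [0] * n
    for _ in range(steps):
        # Sample a plausible mean for each arm from its posterior,
        # then act greedily with respect to the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        a = max(range(n), key=lambda i: samples[i])
        r = 1 if rng.random() < true_means[a] else 0
        alpha[a] += r
        beta[a] += 1 - r
        counts[a] += 1
    return counts

ts_counts = thompson_run([0.3, 0.7], steps=1000)
```

Because the posteriors concentrate on the true means, the sampling step explores less and less over time, which is the mechanism behind its sublinear regret; a constant-ε strategy never stops paying the exploration cost.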