Practical Reinforcement Learning: Exploration Quiz

Question 1

Which of the following is true about regret?

As a reminder, regret is what you could have obtained but didn't: formally, the difference between the expected cumulative return of an optimal policy and the actual sum of rewards you received, $R(T) = T\mu^* - \sum_{t=1}^{T} r_t$ for a bandit whose best arm has mean reward $\mu^*$.

1 point

Larger regret means that the policy is better at exploration.

Smaller regret means that the policy is better at exploration.

At any given moment in time, a better exploration strategy will have lower regret.
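To make the definition concrete, here is a minimal sketch that measures the regret of a fixed policy on a two-armed Bernoulli bandit. The helper name `simulate_regret` and the arm means are illustrative, not from the course:

```python
import random

def simulate_regret(means, policy, n_steps, seed=0):
    """Regret of `policy` on a Bernoulli bandit: the expected return of
    always playing the best arm minus the rewards actually collected."""
    rng = random.Random(seed)
    total_reward = 0.0
    for t in range(n_steps):
        arm = policy(t)  # the policy picks an arm index at step t
        total_reward += 1.0 if rng.random() < means[arm] else 0.0
    return n_steps * max(means) - total_reward

# A policy that alternates between the arms earns ~0.5 reward per step
# on average, while the best arm earns 0.7, so its regret grows linearly.
regret = simulate_regret([0.3, 0.7], policy=lambda t: t % 2, n_steps=1000)
```

Note that regret is measured against the optimal expected return, so even a policy that always plays the best arm has regret that fluctuates around zero rather than being exactly zero on any single run.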

——————————————————————————————

Question 2

Which of the following is true about the $ε$-greedy strategy?

1 point

With constant $ε$, $ε$-greedy exploration has linearly growing regret.

With constant $ε$, $ε$-greedy exploration has logarithmic regret.

If $t$ is the total number of actions taken and you set $ε = \frac{1}{t}$, an $ε$-greedy strategy will reach the optimal policy in the limit.

If $t$ is the total number of actions taken and you set $ε = \max\left(0,\, 1 − \frac{t}{1000}\right)$, an $ε$-greedy strategy will reach the optimal policy.
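One way to compare the schedules in the options above is to plug them into the same learner and measure the resulting regret. A minimal sketch on a two-armed Bernoulli bandit; the function name `eps_greedy_regret` and the arm means are assumptions for illustration:

```python
import random

def eps_greedy_regret(means, eps_schedule, n_steps, seed=0):
    """Regret of an epsilon-greedy learner on a Bernoulli bandit.

    `eps_schedule(t)` gives the exploration probability at step t,
    e.g. lambda t: 0.5 for a constant epsilon, or lambda t: 1 / (t + 1)
    for the decaying schedule from the question."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    values = [0.0] * n  # running mean reward estimate per arm
    total_reward = 0.0
    for t in range(n_steps):
        if rng.random() < eps_schedule(t):
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda a: values[a])  # exploit
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
        total_reward += r
    return n_steps * max(means) - total_reward

# Constant epsilon pays a fixed exploration cost on every step.
r_const = eps_greedy_regret([0.3, 0.7], lambda t: 0.5, n_steps=2000)
```

With a constant $ε$ the agent keeps sampling suboptimal arms at a fixed rate forever, so regret accumulates linearly in $t$; a schedule that decays toward zero can shrink that per-step cost over time.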

——————————————————————————————

Question 3

Which of the following is true about uncertainty-based exploration?

1 point

In the case of a simple multi-armed bandit, Thompson Sampling has asymptotically smaller regret than an $ε$-greedy strategy with $ε=0.5$.

UCB has linear regret if the percentile is constant over time.

UCB works better than an $ε$-greedy strategy in any decision process.

An $ε$-greedy strategy with $ε=0.2$ can sometimes have smaller regret than Thompson Sampling by the 100-th action.
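For a Bernoulli bandit, Thompson Sampling maintains a Beta posterior over each arm's mean, samples a mean from every posterior at each step, and plays the argmax, so exploration shrinks naturally as the posteriors concentrate. A minimal sketch assuming Beta(1, 1) priors; the function name is illustrative:

```python
import random

def thompson_regret(means, n_steps, seed=0):
    """Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    rng = random.Random(seed)
    n = len(means)
    alphas = [1.0] * n  # posterior successes + 1 per arm
    betas = [1.0] * n   # posterior failures + 1 per arm
    total_reward = 0.0
    for _ in range(n_steps):
        # Sample a plausible mean for each arm and play the best sample.
        samples = [rng.betavariate(alphas[a], betas[a]) for a in range(n)]
        arm = max(range(n), key=lambda a: samples[a])
        r = 1.0 if rng.random() < means[arm] else 0.0
        alphas[arm] += r
        betas[arm] += 1.0 - r
        total_reward += r
    return n_steps * max(means) - total_reward

regret_ts = thompson_regret([0.3, 0.7], n_steps=2000)
```

Over a long horizon this posterior-driven exploration typically accumulates far less regret than a constant-$ε$ strategy, but the advantage is asymptotic: over a short horizon such as 100 actions, a lucky $ε$-greedy run can still come out ahead.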