Online Model Selection for Reinforcement Learning with Function
Approximation
- URL: http://arxiv.org/abs/2011.09750v1
- Date: Thu, 19 Nov 2020 10:00:54 GMT
- Title: Online Model Selection for Reinforcement Learning with Function
Approximation
- Authors: Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma
Brunskill
- Abstract summary: We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
- Score: 50.008542459050155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning has achieved impressive successes yet often
requires a very large amount of interaction data. This result is perhaps
unsurprising, as using complicated function approximation often requires more
data to fit, and early theoretical results on linear Markov decision processes
provide regret bounds that scale with the dimension of the linear
approximation. Ideally, we would like to automatically identify the minimal
dimension of the approximation that is sufficient to encode an optimal policy.
Towards this end, we consider the problem of model selection in RL with
function approximation, given a set of candidate RL algorithms with known
regret guarantees. The learner's goal is to adapt to the complexity of the
optimal algorithm without knowing it \textit{a priori}. We present a
meta-algorithm that successively rejects increasingly complex models using a
simple statistical test. Given at least one candidate that satisfies
realizability, we prove the meta-algorithm adapts to the optimal complexity
with $\tilde{O}(L^{5/6} T^{2/3})$ regret compared to the optimal candidate's
$\tilde{O}(\sqrt T)$ regret, where $T$ is the number of episodes and $L$ is the
number of algorithms. The dimension and horizon dependencies remain optimal
with respect to the best candidate, and our meta-algorithmic approach is
flexible to incorporate multiple candidate algorithms and models. Finally, we
show that the meta-algorithm automatically admits significantly improved
instance-dependent regret bounds that depend on the gaps between the maximal
values attainable by the candidates.
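As a rough illustration of the successive-rejection scheme described in the abstract, here is a minimal Python sketch. The candidate interface (`run_episode`, `claimed_value`, `regret_bound`) and the Hoeffding-style test are assumptions made for illustration; this is not the paper's actual algorithm or API.
```python
import numpy as np

def successive_rejection(candidates, T, delta=0.05):
    """Hedged sketch of the start-simple, reject-and-escalate idea.
    Each candidate in `candidates` (ordered by increasing complexity) is
    assumed, hypothetically, to expose:
      run_episode()   -> return of one episode, assumed to lie in [0, 1]
      claimed_value   -> putative optimal average return if the candidate
                         satisfies realizability
      regret_bound(t) -> the candidate's claimed cumulative regret after
                         t of its own episodes
    None of these names come from the paper."""
    i, returns = 0, []
    for _ in range(T):
        returns.append(candidates[i].run_episode())
        t_i = len(returns)              # episodes run by the current candidate
        avg = float(np.mean(returns))
        # Hoeffding-style confidence width for the running average.
        width = np.sqrt(np.log(len(candidates) * T / delta) / (2 * t_i))
        # Statistical test: if realizable, the candidate's average return should
        # be at least its claimed value minus its average claimed regret, up to noise.
        threshold = candidates[i].claimed_value - candidates[i].regret_bound(t_i) / t_i - width
        if avg < threshold and i + 1 < len(candidates):
            i += 1          # reject: escalate to the next, more complex candidate
            returns = []    # restart statistics for the new candidate
    return candidates[i]
```
The design choice mirrored here is that the meta-algorithm escalates to a more complex candidate only once the statistical evidence contradicts the current candidate's claimed guarantee.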
Related papers
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress, having no limit cycles, and to terminate within an appropriate number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Submodular Maximization subject to a Knapsack Constraint: Combinatorial Algorithms with Near-optimal Adaptive Complexity [13.416309759182635]
We obtain the first constant-factor approximation for non-monotone submodular maximization with near-optimal $O(\log n)$ adaptive complexity.
Our algorithm asks $\tilde{O}(n^2)$ value queries, but can be modified to run with only $\tilde{O}(n)$ instead.
This is also the first approach with sublinear adaptive complexity for the problem, and it yields guarantees comparable to the state of the art even for the special cases of cardinality constraints or monotone objectives.
arXiv Detail & Related papers (2021-02-16T18:15:51Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret [44.374427255708135]
We propose Exploration Enhanced Q-learning (EE-QL), a model-free algorithm for infinite-horizon average-reward Markov Decision Processes (MDPs).
EE-QL assumes that an online concentrating approximation of the optimal average reward is available.
This is the first model-free learning algorithm that achieves $O(\sqrt{T})$ regret without the ergodicity assumption, and it matches the lower bound in terms of $T$ up to logarithmic factors.
arXiv Detail & Related papers (2020-06-08T05:09:32Z)
- Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity bounds for constant and variable stepsize iteration policies, showing that, for minibatch size $N$, $\epsilon$-suboptimality in expected quadratic distance to the optimal solution is attained after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
- Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
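For contrast with the entry above, here is a generic sketch of selecting among base bandit algorithms with a UCB-style index. This is not the paper's method, and `step()` is a hypothetical interface returning the reward of one round.
```python
import numpy as np

def select_among_base_algorithms(base_algs, T):
    """Generic illustration (not the paper's algorithm): treat each base
    algorithm as an arm and play the one with the highest upper confidence
    bound on its observed average reward."""
    n = np.zeros(len(base_algs))   # rounds allotted to each base algorithm
    s = np.zeros(len(base_algs))   # cumulative reward of each base algorithm
    for t in range(1, T + 1):
        # Optimistic index: empirical mean plus an exploration bonus;
        # unplayed algorithms get +inf so each is tried at least once.
        with np.errstate(divide="ignore", invalid="ignore"):
            ucb = np.where(n > 0, s / n + np.sqrt(2 * np.log(t) / n), np.inf)
        i = int(np.argmax(ucb))
        r = base_algs[i].step()    # the chosen base algorithm plays one round
        n[i] += 1
        s[i] += r
    return s / np.maximum(n, 1)    # empirical mean reward per base algorithm
```
A naive index rule like this does not in general inherit the best base algorithm's $O(\sqrt{T})$ guarantee; the $\Omega(\sqrt{T})$ lower bound cited above is one reason dedicated model-selection meta-algorithms are studied.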