Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve
Optimism, Embrace Virtual Curvature
- URL: http://arxiv.org/abs/2102.04168v1
- Date: Mon, 8 Feb 2021 12:41:56 GMT
- Title: Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve
Optimism, Embrace Virtual Curvature
- Authors: Kefan Dong, Jiaqi Yang, Tengyu Ma
- Abstract summary: We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL)
- Score: 61.22680308681648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies model-based bandit and reinforcement learning (RL) with
nonlinear function approximations. We propose to study convergence to
approximate local maxima because we show that global convergence is
statistically intractable even for one-layer neural net bandit with a
deterministic reward. For both nonlinear bandit and RL, the paper presents a
model-based algorithm, Virtual Ascent with Online Model Learner (ViOL), which
provably converges to a local maximum with sample complexity that only depends
on the sequential Rademacher complexity of the model class. Our results imply
novel global or local regret bounds on several concrete settings such as linear
bandit with finite or sparse model class, and two-layer neural net bandit. A
key algorithmic insight is that optimism may lead to over-exploration even for
two-layer neural net model class. On the other hand, for convergence to local
maxima, it suffices to maximize the virtual return if the model can also
reasonably predict the size of the gradient and Hessian of the real return.
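The closing claim above, that ascending a learned model's predicted ("virtual") return suffices for convergence to a local maximum, can be illustrated with a minimal sketch. This is not the paper's ViOL algorithm: the `virtual_ascent` routine, the finite-difference gradient, and the concave quadratic stand-in for a learned reward model are all assumptions for illustration only.

```python
# Illustrative sketch (not the paper's ViOL algorithm): plain gradient ascent
# on a learned model's predicted reward, i.e. the "virtual return". If the
# model also tracks the scale of the real return's gradient and Hessian, a
# local maximum of the virtual return approximates one of the real return.

def virtual_ascent(model, x, lr=0.1, steps=200, eps=1e-5):
    """Ascend the model's predicted reward via finite-difference gradients."""
    for _ in range(steps):
        base = model(x)
        # One-sided finite-difference estimate of each partial derivative.
        grad = [
            (model([xj + (eps if i == j else 0.0) for j, xj in enumerate(x)]) - base) / eps
            for i in range(len(x))
        ]
        x = [xi + lr * g for xi, g in zip(x, grad)]
    return x

# Hypothetical learned reward model: concave quadratic peaked at (1, -2).
model = lambda x: -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)

x_star = virtual_ascent(model, [0.0, 0.0])
print(x_star)  # converges near [1.0, -2.0]
```

The point of the sketch is only that every ascent step queries the learned model, never the real reward, which is exactly what makes the virtual-return viewpoint attractive when real-environment samples are expensive.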
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used in machine learning.
In this paper we examine convex Lasso-based reformulations of neural network training.
We show that all stationary points of the non-convex training objective can be characterized via the global optimum of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Oracle Inequalities for Model Selection in Offline Reinforcement
Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z) - Git Re-Basin: Merging Models modulo Permutation Symmetries [3.5450828190071655]
We show how simple algorithms can be used to fit large networks in practice.
We present the first (to our knowledge) demonstration of zero-barrier mode connectivity between independently trained models.
We also discuss shortcomings in the linear mode connectivity hypothesis.
arXiv Detail & Related papers (2022-09-11T10:44:27Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Asymptotic Risk of Overparameterized Likelihood Models: Double Descent
Theory for Deep Neural Networks [12.132641563193582]
We investigate the risk of a general class of overparameterized likelihood models, including deep models.
We demonstrate that several explicit models, such as parallel deep neural networks and ensemble learning, are in agreement with our theory.
arXiv Detail & Related papers (2021-02-28T13:02:08Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Learning with tree tensor networks: complexity estimates and model
selection [33.10033799230797]
We analyze a complexity-based model selection method for tree tensor networks in an empirical risk minimization framework.
We show that our strategy is (near to) minimax adaptive to a wide range of smoothness classes including Sobolev or Besov spaces.
arXiv Detail & Related papers (2020-07-02T14:52:08Z) - Learning of Discrete Graphical Models with Neural Networks [15.171938155576566]
We introduce NeurISE, a neural net based algorithm for graphical model learning.
NeurISE is seen to be a better alternative to GRISE when the energy function of the true model has a high order.
We also show a variant of NeurISE that can be used to learn a neural net representation for the full energy function of the true model.
arXiv Detail & Related papers (2020-06-21T23:34:01Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
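The last entry's finding, that $L_1$ and $L_2$ regularization suppress the growth of model complexity, can be sketched in miniature. This is not the LANN method of that paper: the 1-D regression, the gradient-descent fitter, and the penalty strength are illustrative assumptions, with weight magnitude serving as a crude stand-in for model complexity.

```python
# Illustrative sketch (not the LANN method): an L2 penalty shrinks the learned
# weight of a tiny 1-D linear model, a crude proxy for the complexity
# suppression that the paper measures on deep networks.

def fit(xs, ys, lam, lr=0.05, steps=2000):
    """Fit y ~ w*x by gradient descent on mean squared error + lam * w**2."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n + 2.0 * lam * w
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly linear with slope 2

w_free = fit(xs, ys, lam=0.0)  # unregularized: recovers slope ~2
w_reg = fit(xs, ys, lam=5.0)   # regularized: strictly smaller magnitude
print(w_free, w_reg)
```

With the penalty active, the closed-form minimizer is `sum(x*y) / (sum(x*x) + n*lam)`, so the regularized weight is pulled toward zero; the same shrinkage intuition is what the complexity measurements above quantify for curve-activation networks.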
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.