Oracle Inequalities for Model Selection in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2211.02016v1
- Date: Thu, 3 Nov 2022 17:32:34 GMT
- Title: Oracle Inequalities for Model Selection in Offline Reinforcement Learning
- Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
- Abstract summary: We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
- Score: 105.74139523696284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In offline reinforcement learning (RL), a learner leverages prior logged data
to learn a good policy without interacting with the environment. A major
challenge in applying such methods in practice is the lack of both
theoretically principled and practical tools for model selection and
evaluation. To address this, we study the problem of model selection in offline
RL with value function approximation. The learner is given a nested sequence of
model classes to minimize squared Bellman error and must select among these to
achieve a balance between approximation and estimation error of the classes. We
propose the first model selection algorithm for offline RL that achieves
minimax rate-optimal oracle inequalities up to logarithmic factors. The
algorithm, ModBE, takes as input a collection of candidate model classes and a
generic base offline RL algorithm. By successively eliminating model classes
using a novel one-sided generalization test, ModBE returns a policy with regret
scaling with the complexity of the minimally complete model class. In addition
to its theoretical guarantees, it is conceptually simple and computationally
efficient, amounting to solving a series of square loss regression problems and
then comparing relative square loss between classes. We conclude with several
numerical simulations showing it is capable of reliably selecting a good model
class.
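To make the selection procedure concrete, below is a minimal, illustrative sketch of a ModBE-style loop. It is not the authors' implementation: it assumes the nested model classes are polynomial feature maps of increasing degree, replaces the generic base offline RL algorithm with a single square-loss regression onto fixed Bellman targets, and uses a hypothetical constant threshold in place of the paper's one-sided generalization test statistic.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

def fit_square_loss(degree, X, y):
    """Fit one candidate class by ridge regression on polynomial features
    and return the fitted model with its empirical square loss."""
    feats = PolynomialFeatures(degree).fit_transform(X)
    model = Ridge(alpha=1e-3).fit(feats, y)
    loss = float(np.mean((y - model.predict(feats)) ** 2))
    return model, loss

def modbe_like_select(X, targets, degrees=(1, 2, 3, 4), threshold=0.05):
    """Successively eliminate a class when the next (richer) class lowers the
    empirical square loss by more than `threshold`; stop at the first class
    that passes this one-sided comparison."""
    fits = [fit_square_loss(d, X, targets) for d in degrees]
    selected = 0
    for k in range(len(degrees) - 1):
        if fits[k][1] - fits[k + 1][1] > threshold:
            selected = k + 1  # class k fails the test; move to the richer class
        else:
            break             # class k is accepted; no further elimination
    return degrees[selected], fits[selected][0]

# Toy usage: synthetic "offline" covariates X and regression targets y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=500)  # quadratic ground truth
degree, model = modbe_like_select(X, y)
print("selected polynomial degree:", degree)

In this toy run the ground truth is quadratic, so the loop should eliminate the degree-1 class and stop at degree 2; the actual algorithm derives the test margin from concentration arguments rather than a fixed constant.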
Related papers
- LoRA Unlearns More and Retains More (Student Abstract) [0.0]
PruneLoRA reduces the need for large-scale parameter updates by applying low-rank updates to the model.
We leverage LoRA to selectively modify a subset of the pruned model's parameters, thereby reducing the computational cost, memory requirements and improving the model's ability to retain performance on the remaining classes.
arXiv Detail & Related papers (2024-11-16T16:47:57Z)
- Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [59.875550175217874]
We show that a simple Model-based Reinforcement Learning scheme achieves strong regret and sample bounds in online and offline RL settings.
We highlight that our algorithms are simple, fairly standard, and indeed have been extensively studied in the RL literature.
arXiv Detail & Related papers (2024-08-16T19:52:53Z) - In-Context Unlearning: Language Models as Few Shot Unlearners [27.962361828354716]
We propose a new class of unlearning methods for Large Language Models (LLMs).
This method unlearns instances from the model by simply providing specific kinds of inputs in context, without the need to update model parameters.
Our experimental results demonstrate that in-context unlearning performs on par with, or in some cases outperforms, other state-of-the-art methods that require access to model parameters.
arXiv Detail & Related papers (2023-10-11T15:19:31Z)
- An improved column-generation-based matheuristic for learning classification trees [9.07661731728456]
Decision trees are highly interpretable models for solving classification problems in machine learning (ML).
Standard ML algorithms for training decision trees are fast but generate suboptimal trees in terms of accuracy.
Firat et al. (2020) proposed a column-generation-based approach for learning decision trees.
arXiv Detail & Related papers (2023-08-22T14:43:36Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs).
For both models, our algorithm guarantees polynomial sample complexity and only uses access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z)
- Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOlin).
arXiv Detail & Related papers (2021-02-08T12:41:56Z)
- On Statistical Efficiency in Learning [37.08000833961712]
We address the challenge of model selection to strike a balance between model fitting and model complexity.
We propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce cost.
Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
arXiv Detail & Related papers (2020-12-24T16:08:29Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics (a minimal sketch of this penalty follows at the end of this list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
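The MOPO entry above describes penalizing rewards by the uncertainty of the learned dynamics. The following is a minimal, hedged sketch of that idea, under the assumption that uncertainty is measured by the disagreement of an ensemble of learned dynamics models; the penalty weight lam and the helper ensemble_predict are hypothetical stand-ins, not the paper's implementation.

import numpy as np

def penalized_reward(reward, state, action, ensemble_predict, lam=1.0):
    """Return r(s, a) - lam * u(s, a), where u(s, a) is taken to be the
    largest deviation of any ensemble member from the ensemble mean."""
    # ensemble_predict(state, action) -> next-state predictions,
    # one row per ensemble member (shape: [n_models, state_dim]).
    preds = ensemble_predict(state, action)
    uncertainty = float(np.max(np.linalg.norm(preds - preds.mean(axis=0), axis=1)))
    return reward - lam * uncertainty

# Toy usage with a fake two-member "ensemble" of linear dynamics models.
fake_ensemble = lambda s, a: np.stack([s + a, s + 1.1 * a])
print(penalized_reward(1.0, np.zeros(3), np.ones(3), fake_ensemble, lam=0.5))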