A Theoretical Framework of Almost Hyperparameter-free Hyperparameter
Selection Methods for Offline Policy Evaluation
- URL: http://arxiv.org/abs/2201.02300v1
- Date: Fri, 7 Jan 2022 02:23:09 GMT
- Title: A Theoretical Framework of Almost Hyperparameter-free Hyperparameter
Selection Methods for Offline Policy Evaluation
- Authors: Kohei Miyaguchi
- Abstract summary: Offline policy evaluation (OPE) is a key component of offline reinforcement learning, a core technology for data-driven decision optimization without environment simulators.
We introduce a new approximate hyperparameter selection (AHS) framework for OPE, which defines a notion of optimality (called selection criteria) in a quantitative and interpretable manner without hyperparameters.
We derive four AHS methods, each of which has different characteristics such as convergence rate and time complexity.
- Score: 2.741266294612776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We are concerned with the problem of hyperparameter selection of offline
policy evaluation (OPE). OPE is a key component of offline reinforcement
learning, which is a core technology for data-driven decision optimization
without environment simulators. However, the current state-of-the-art OPE
methods are not hyperparameter-free, which undermines their utility in
real-life applications. We address this issue by introducing a new approximate
hyperparameter selection (AHS) framework for OPE, which defines a notion of
optimality (called selection criteria) in a quantitative and interpretable
manner without hyperparameters. We then derive four AHS methods, each of which
has different characteristics such as convergence rate and time complexity.
Finally, we verify the effectiveness and limitations of these methods with a
preliminary experiment.
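To make the idea concrete, here is a minimal Python sketch of approximate hyperparameter selection for OPE: each candidate hyperparameter is scored by a quantitative criterion computed from the logged data alone, and the best-scoring candidate is used to produce the value estimate. The toy estimator, the bias/variance-style criterion, and all function names below are illustrative assumptions, not the selection criteria or the four AHS methods derived in the paper.

```python
import numpy as np

# Illustrative sketch only: a toy OPE estimator plus a data-driven score used
# to pick its hyperparameter.  Neither is the criterion proposed in the paper.

def ope_estimate(logged_data, hp):
    """Toy OPE estimator: shrinkage-weighted mean of logged returns,
    where `hp` acts as a regularization hyperparameter."""
    returns = logged_data["returns"]
    return returns.mean() / (1.0 + hp)

def selection_criterion(logged_data, hp):
    """Toy stand-in for a quantitative selection criterion: an estimated
    bias^2 + variance trade-off computed from the logged data alone."""
    returns = logged_data["returns"]
    n = len(returns)
    variance_term = returns.var() / (n * (1.0 + hp) ** 2)
    bias_term = (hp * returns.mean() / (1.0 + hp)) ** 2
    return bias_term + variance_term

rng = np.random.default_rng(0)
data = {"returns": rng.normal(loc=1.0, scale=2.0, size=500)}
candidates = [0.0, 0.01, 0.1, 1.0, 10.0]

best_hp = min(candidates, key=lambda hp: selection_criterion(data, hp))
print(best_hp, ope_estimate(data, best_hp))
```

In the paper's framework the criterion itself is hyperparameter-free and interpretable; the candidate grid above merely stands in for whatever hyperparameter set a practitioner supplies.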
Related papers
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
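As a rough, hedged illustration of that idea (not the loss actually proposed in the paper), the sketch below reweights a standard Bradley-Terry preference loss per pair with a DRO-style exponential tilt, so that harder pairs receive larger weight; the temperature and the reweighting scheme are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def adaptive_preference_loss(reward_chosen, reward_rejected, temperature=1.0):
    """Hypothetical adaptive scaling of the Bradley-Terry preference loss."""
    margins = reward_chosen - reward_rejected
    per_pair = F.softplus(-margins)  # -log(sigmoid(margin)) for each preference pair
    # DRO-style reweighting: up-weight pairs with larger loss (weights average to 1).
    weights = torch.softmax(per_pair.detach() / temperature, dim=0) * per_pair.numel()
    return (weights * per_pair).mean()

# Example usage with toy reward-model scores for three preference pairs.
loss = adaptive_preference_loss(torch.tensor([1.2, 0.3, 2.0]),
                                torch.tensor([0.9, 0.7, 0.1]))
```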
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
- No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL [28.31529154045046]
We propose a new approach to tune hyperparameters from offline logs of data.
We first learn a model of the environment from the offline data, which we call a calibration model, and then simulate learning in the calibration model.
We empirically investigate the method in a variety of settings to identify when it is effective and when it fails.
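A hedged sketch of that recipe in a tabular setting follows: fit a maximum-likelihood model of the environment from the logged transitions (the calibration model), then rank hyperparameter settings by how well an agent trained inside that model performs. The tabular representation and the `run_agent` interface are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fit_calibration_model(transitions, n_states, n_actions):
    """Maximum-likelihood tabular dynamics/reward model from logged (s, a, r, s') tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    rewards = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        rewards[s, a] += r
    visits = counts.sum(axis=2, keepdims=True)
    probs = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)
    mean_rewards = rewards / np.maximum(visits[..., 0], 1)
    return probs, mean_rewards

def score_hyperparameter(hp, calibration_model, run_agent):
    """Simulate learning with hyperparameter `hp` inside the calibration model
    and return the resulting performance (`run_agent` is a hypothetical black box)."""
    return run_agent(calibration_model, hp)

# Usage: pick the hyperparameter with the best simulated performance.
# best_hp = max(candidate_hps, key=lambda hp: score_hyperparameter(hp, model, run_agent))
```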
arXiv Detail & Related papers (2022-05-18T04:26:23Z)
- The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection [12.38071940409141]
We show that standard composition tools outperform more advanced techniques in many settings.
We draw upon the limiting behaviour of Adam in the DP setting to design a new and more efficient tool.
arXiv Detail & Related papers (2021-11-09T01:56:56Z)
- Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning [10.457660611114457]
We show how to select between policies and value functions produced by different training algorithms in offline reinforcement learning.
We use BVFT [XJ21], a recent theoretical advance in value-function selection, and demonstrate its effectiveness in discrete-action benchmarks such as Atari.
arXiv Detail & Related papers (2021-10-26T20:12:11Z)
- Online Hyperparameter Meta-Learning with Hypergradient Distillation [59.973770725729636]
Gradient-based meta-learning methods assume a set of parameters that do not participate in the inner optimization.
We propose a novel HO method that can overcome these limitations, by approximating the second-order term with knowledge distillation.
arXiv Detail & Related papers (2021-10-06T05:14:53Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters.
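The averaged zeroth-order hyper-gradient can be sketched with a standard two-point finite-difference estimator, as below; `validation_loss` is a hypothetical black box (train with the given hyperparameters, return the validation loss), and the exact averaging scheme used by HOZOG may differ.

```python
import numpy as np

def zeroth_order_hypergradient(validation_loss, hp, mu=1e-2, n_samples=8, seed=0):
    """Average of two-point finite-difference gradient estimates of the
    validation loss with respect to the hyperparameter vector `hp`."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_samples):
        u = rng.standard_normal(hp.shape)                    # random perturbation direction
        delta = validation_loss(hp + mu * u) - validation_loss(hp - mu * u)
        estimates.append(delta / (2.0 * mu) * u)             # directional finite difference
    return np.mean(estimates, axis=0)

# Hyperparameter update with the estimated gradient (step size is illustrative):
# hp = hp - 0.1 * zeroth_order_hypergradient(validation_loss, hp)
```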
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Hyperparameter Selection for Offline Reinforcement Learning [61.92834684647419]
Offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios.
Existing hyperparameter selection methods for offline RL break the offline assumption.
arXiv Detail & Related papers (2020-07-17T15:30:38Z)
- Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies [41.13416324282365]
We propose a framework which entails the application of Evolutionary Strategies to online hyperparameter tuning in off-policy learning.
Our formulation draws close connections to meta-gradients and leverages the strengths of black-box optimization with relatively low-dimensional search spaces.
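A minimal sketch of that black-box device, assuming a simple (mu, lambda)-style evolutionary strategy over a low-dimensional hyperparameter vector: the `off_policy_return` black box (train or update the off-policy learner with the given hyperparameters and return its performance) and the population sizes are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def es_tune(off_policy_return, hp_init, sigma=0.1, population=16, elite=4, iterations=20, seed=0):
    """Evolve a hyperparameter vector by repeatedly sampling perturbations and
    moving the search mean toward the best-performing candidates."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(hp_init, dtype=float)
    for _ in range(iterations):
        candidates = mean + sigma * rng.standard_normal((population, mean.size))
        scores = np.array([off_policy_return(c) for c in candidates])
        mean = candidates[np.argsort(scores)[-elite:]].mean(axis=0)
    return mean

# Example usage with a hypothetical evaluation function:
# best_hp = es_tune(off_policy_return=lambda hp: evaluate(hp), hp_init=[3e-4, 0.99])
```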
arXiv Detail & Related papers (2020-06-13T03:54:26Z)