Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits
- URL: http://arxiv.org/abs/2302.09440v3
- Date: Mon, 8 Apr 2024 04:32:25 GMT
- Title: Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits
- Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee
- Abstract summary: In contextual bandits, an agent sequentially chooses actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and consistently outperforms all existing methods on both synthetic and real datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In stochastic contextual bandits, an agent sequentially chooses actions from a time-dependent action set based on past experience to minimize the cumulative regret. Like many other machine learning algorithms, the performance of bandits heavily depends on the values of hyperparameters, and theoretically derived parameter values may lead to unsatisfactory results in practice. Moreover, it is infeasible to use offline tuning methods like cross-validation to choose hyperparameters under the bandit environment, as the decisions must be made in real time. To address this challenge, we propose the first online continuous hyperparameter tuning framework for contextual bandits, which learns the optimal parameter configuration within a search space on the fly. Specifically, we use a double-layer bandit framework named CDT (Continuous Dynamic Tuning) and formulate the hyperparameter optimization as a non-stationary continuum-armed bandit, where each arm represents a combination of hyperparameters, and the corresponding reward is the algorithmic result. For the top layer, we propose the Zooming TS algorithm, which uses Thompson Sampling (TS) for exploration and a restart technique to handle the \textit{switching} environment. The proposed CDT framework can easily be used to tune contextual bandit algorithms without any pre-specified candidate set for multiple hyperparameters. We further show that it achieves sublinear regret in theory and consistently outperforms all existing methods on both synthetic and real datasets.
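As a rough illustration of the double-layer idea described in the abstract, the following minimal Python sketch tunes the exploration parameter of a base contextual bandit using Gaussian Thompson Sampling over a fixed grid of candidate values, with periodic restarts. This is a simplified stand-in, not the paper's algorithm: the actual Zooming TS adaptively refines the continuous search space, and the names `SimpleTSTuner`, `run_base_bandit_step`, `restart_every`, and the toy reward function are all illustrative assumptions.

```python
import numpy as np

class SimpleTSTuner:
    """Top-layer tuner: Gaussian Thompson Sampling over candidate hyperparameter
    values, with periodic restarts as a stand-in for the restart technique used
    against the switching environment.  The paper's Zooming TS instead keeps an
    adaptive discretization of the continuous search space."""

    def __init__(self, grid, restart_every=500, rng=None):
        self.grid = np.asarray(grid, dtype=float)  # candidate hyperparameter values
        self.restart_every = restart_every         # forget statistics periodically
        self.rng = rng or np.random.default_rng()
        self.t = 0
        self._reset()

    def _reset(self):
        k = len(self.grid)
        self.counts = np.zeros(k)
        self.means = np.zeros(k)

    def select(self):
        # Thompson Sampling: draw a plausible mean reward for each candidate value
        sigma = 1.0 / np.sqrt(self.counts + 1.0)
        sampled = self.rng.normal(self.means, sigma)
        self.last_arm = int(np.argmax(sampled))
        return float(self.grid[self.last_arm])

    def update(self, reward):
        a = self.last_arm
        self.counts[a] += 1
        self.means[a] += (reward - self.means[a]) / self.counts[a]
        self.t += 1
        if self.t % self.restart_every == 0:  # restart to track a drifting optimum
            self._reset()


def run_base_bandit_step(alpha, rng):
    """Placeholder for one round of the bottom-layer contextual bandit (e.g. a
    LinUCB / UCB-GLM update on live contexts) run with exploration rate `alpha`;
    returns the observed reward.  Toy stand-in: reward peaks near alpha = 0.5."""
    return -(alpha - 0.5) ** 2 + 0.1 * rng.standard_normal()


rng = np.random.default_rng(0)
tuner = SimpleTSTuner(grid=np.linspace(0.0, 2.0, 21), restart_every=500, rng=rng)
for _ in range(2000):
    alpha = tuner.select()                     # top layer proposes a hyperparameter
    reward = run_base_bandit_step(alpha, rng)  # bottom layer plays and reports reward
    tuner.update(reward)
print("estimated best alpha:", tuner.grid[int(np.argmax(tuner.means))])
```

In the actual CDT framework the tuned value would parameterize the running bandit algorithm each round (e.g. the exploration rate of LinUCB or UCB-GLM), and that algorithm's realised reward is what the top layer observes.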
Related papers
- Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
- A Framework for History-Aware Hyperparameter Optimisation in Reinforcement Learning [8.659973888018781]
A Reinforcement Learning (RL) system depends on a set of initial conditions that affect the system's performance.
We propose a framework based on integrating complex event processing and temporal models, to alleviate these trade-offs.
We tested the proposed approach in a 5G mobile communications case study that uses DQN, a well-known RL algorithm, for its decision-making.
arXiv Detail & Related papers (2023-03-09T11:30:40Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper-parameter tuning.
We show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times and speedups of 3$\times$-30$\times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
- Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms [74.55200180156906]
The contextual bandit problem models the trade-off between exploration and exploitation.
We show our Syndicated Bandits framework can achieve the optimal regret upper bounds.
arXiv Detail & Related papers (2021-06-05T22:30:21Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters; a toy sketch of such an update appears after this list.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of the wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning [7.559006677497745]
Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds.
We benchmark whether it is better to explore a large quantity of hyperparameter settings via pruning of bad performers, or if it is better to aim for quality of collected results by using repetitions.
arXiv Detail & Related papers (2020-07-29T05:12:34Z)
- Automatic Setting of DNN Hyper-Parameters by Mixing Bayesian Optimization and Tuning Rules [0.6875312133832078]
We build a new algorithm for evaluating and analyzing the results of the network on the training and validation sets.
We use a set of tuning rules to add new hyper-parameters and/or to reduce the hyper-parameter search space to select a better combination.
arXiv Detail & Related papers (2020-06-03T08:53:48Z)
- Weighted Random Search for CNN Hyperparameter Optimization [0.0]
We introduce the Weighted Random Search (WRS) method, a combination of Random Search (RS) and a probabilistic greedy heuristic.
The comparison criterion is the classification accuracy achieved within the same number of tested combinations of hyperparameter values.
According to our experiments, the WRS algorithm outperforms the other methods.
arXiv Detail & Related papers (2020-03-30T09:40:14Z)
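As referenced in the HOZOG entry above, here is a toy sketch of an average zeroth-order hyper-gradient update. It treats the train-then-validate pipeline as a black box and averages finite-difference gradient estimates over random directions; the function names, the surrogate objective, and all constants are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def average_zeroth_order_grad(f, lam, mu=1e-2, num_dirs=8, rng=None):
    """Average zeroth-order hyper-gradient estimate of a black-box objective f
    (e.g. validation loss after training with hyperparameters `lam`).
    Averages finite-difference estimates over random Gaussian directions."""
    rng = rng or np.random.default_rng()
    lam = np.asarray(lam, dtype=float)
    f0 = f(lam)
    grad = np.zeros_like(lam)
    for _ in range(num_dirs):
        u = rng.standard_normal(lam.shape)
        grad += (f(lam + mu * u) - f0) / mu * u
    return grad / num_dirs

# Toy usage: minimise a quadratic surrogate of the validation loss.
f = lambda lam: float(np.sum((lam - 0.3) ** 2))
lam, eta = np.array([1.0, -0.5]), 0.1
for _ in range(100):
    lam -= eta * average_zeroth_order_grad(f, lam)
print("tuned hyperparameters:", lam)  # should approach [0.3, 0.3]
```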