Offline-to-online hyperparameter transfer for stochastic bandits
- URL: http://arxiv.org/abs/2501.02926v1
- Date: Mon, 06 Jan 2025 11:05:48 GMT
- Title: Offline-to-online hyperparameter transfer for stochastic bandits
- Authors: Dravyansh Sharma, Arun Sai Suggala
- Abstract summary: We consider a practically relevant transfer learning setting where one has access to offline data collected from several bandit problems (tasks) drawn from an unknown distribution.
We provide bounds on the inter-task (number of tasks) and intra-task (number of arm pulls per task) sample complexity for learning near-optimal hyperparameters on unseen tasks.
Our results apply to several classic algorithms, including tuning the exploration parameters in UCB and LinUCB and the noise parameter in GP-UCB.
- Score: 14.019191147891974
- Abstract: Classic algorithms for stochastic bandits typically use hyperparameters that govern their critical properties such as the trade-off between exploration and exploitation. Tuning these hyperparameters is a problem of great practical significance. However, this is a challenging problem and in certain cases is information theoretically impossible. To address this challenge, we consider a practically relevant transfer learning setting where one has access to offline data collected from several bandit problems (tasks) coming from an unknown distribution over the tasks. Our aim is to use this offline data to set the hyperparameters for a new task drawn from the unknown distribution. We provide bounds on the inter-task (number of tasks) and intra-task (number of arm pulls for each task) sample complexity for learning near-optimal hyperparameters on unseen tasks drawn from the distribution. Our results apply to several classic algorithms, including tuning the exploration parameters in UCB and LinUCB and the noise parameter in GP-UCB. Our experiments indicate the significance and effectiveness of the transfer of hyperparameters from offline problems in online learning with stochastic bandit feedback.
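To make concrete which hyperparameter the abstract refers to, below is a minimal sketch of the standard UCB index rule, where the exploration parameter `alpha` scales the confidence width. The bandit instance, values, and function name are illustrative assumptions, not taken from the paper; a transferred hyperparameter would simply replace the hand-set `alpha`.

```python
import math
import random

def ucb_run(arm_means, horizon, alpha, seed=0):
    """Run UCB with exploration hyperparameter alpha on a Bernoulli bandit.

    Index of arm i at round t: mean_i + alpha * sqrt(log(t) / pulls_i).
    alpha is the kind of hyperparameter the paper proposes to transfer
    from offline tasks to a new task. Returns cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    best = max(arm_means)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + alpha * math.sqrt(math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return regret

# A transferred hyperparameter would replace the hand-set alpha=1.0 here:
print(ucb_run([0.2, 0.5, 0.8], horizon=2000, alpha=1.0))
```

The design point the paper targets: the same routine is run on many offline tasks to pick an `alpha` that works well on average, which is then deployed on a fresh task from the same distribution.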
Related papers
- Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function [24.457000214575245]
We introduce a new technique to characterize the discontinuities and oscillations of the utility function on any fixed problem instance.
This can be used to show that the learning theoretic complexity of the corresponding family of utility functions is bounded.
arXiv Detail & Related papers (2025-01-23T15:10:51Z) - Hyper: Hyperparameter Robust Efficient Exploration in Reinforcement Learning [48.81121647322492]
Hyper is provably efficient in the function approximation setting and empirically demonstrates appealing performance and robustness in various environments.
Hyper mitigates the problem by regularizing the visitation of exploration and decoupling exploitation to ensure stable training.
arXiv Detail & Related papers (2024-12-04T23:12:41Z) - Revised Regularization for Efficient Continual Learning through Correlation-Based Parameter Update in Bayesian Neural Networks [20.00857639162206]
In continual learning scenarios, storing network parameters at each step to retain knowledge poses challenges.
Current methods using Variational Inference with KL divergence risk catastrophic forgetting during uncertain node updates.
We propose a parameter distribution learning method that significantly reduces the storage requirements.
arXiv Detail & Related papers (2024-11-21T15:11:02Z) - Efficient Hyperparameter Importance Assessment for CNNs [1.7778609937758323]
This paper aims to quantify the importance weights of some hyperparameters in Convolutional Neural Networks (CNNs) with an algorithm called N-RReliefF.
We conduct an extensive study by training over ten thousand CNN models across ten popular image classification datasets.
arXiv Detail & Related papers (2024-10-11T15:47:46Z) - Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z) - Improve Noise Tolerance of Robust Loss via Noise-Awareness [60.34670515595074]
We propose a meta-learning method capable of adaptively learning a hyperparameter prediction function, called the Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster for brevity).
We integrate four SOTA robust loss functions with our algorithm, and comprehensive experiments substantiate the general applicability and effectiveness of the proposed method in terms of both noise tolerance and performance.
arXiv Detail & Related papers (2023-01-18T04:54:58Z) - Amortized Auto-Tuning: Cost-Efficient Transfer Optimization for
Hyperparameter Recommendation [83.85021205445662]
We conduct a thorough analysis of the multi-task multi-fidelity Bayesian optimization framework and propose its best instantiation, amortized auto-tuning (AT2), to speed up the tuning of machine learning models.
arXiv Detail & Related papers (2021-06-17T00:01:18Z) - Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in
Contextual Bandit Algorithms [74.55200180156906]
The contextual bandit problem models the trade-off between exploration and exploitation.
We show our Syndicated Bandits framework can achieve the optimal regret upper bounds.
arXiv Detail & Related papers (2021-06-05T22:30:21Z) - Using a thousand optimization tasks to learn hyperparameter search
strategies [53.318615663332274]
We present TaskSet, a dataset of neural network tasks for use in training and evaluating optimizers.
TaskSet is unique in its size and diversity, containing over a thousand tasks ranging from image classification with fully connected or convolutional networks, to variational autoencoders, to non-volume preserving flows on a variety of datasets.
arXiv Detail & Related papers (2020-02-27T02:49:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.