Metalearning Linear Bandits by Prior Update
- URL: http://arxiv.org/abs/2107.05320v1
- Date: Mon, 12 Jul 2021 11:17:01 GMT
- Title: Metalearning Linear Bandits by Prior Update
- Authors: Amit Peleg, Naama Pearl and Ron Meir
- Abstract summary: Fully Bayesian approaches assume that problem parameters are generated from a known prior, while in practice, such information is often lacking.
This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance.
In this work we prove, in the context of linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of an algorithm that uses the true prior.
- Score: 7.519872646378836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully Bayesian approaches to sequential decision-making assume that problem
parameters are generated from a known prior, while in practice, such
information is often lacking, and needs to be estimated through learning. This
problem is exacerbated in decision-making setups with partial information,
where using a misspecified prior may lead to poor exploration and inferior
performance. In this work we prove, in the context of stochastic linear bandits
and Gaussian priors, that as long as the prior estimate is sufficiently close
to the true prior, the performance of an algorithm that uses the misspecified
prior is close to that of the algorithm that uses the true prior. Next, we
address the task of learning the prior through metalearning, where a learner
updates its estimate of the prior across multiple task instances in order to
improve performance on future tasks. The estimated prior is then updated within
each task based on incoming observations, while actions are selected in order
to maximize expected reward. We apply this scheme within a linear bandit
setting, and provide algorithms and regret bounds demonstrating its
effectiveness compared to an algorithm that knows the correct prior. Our
results hold for a broad class of algorithms, including, for example, Thompson
Sampling and Information Directed Sampling.
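Below is a minimal sketch, in code, of the metalearning scheme the abstract describes: Thompson Sampling in a Gaussian linear bandit, with the Gaussian prior re-estimated across task instances. This is not the authors' exact algorithm, and all constants and names (d, n_tasks, noise_std, and so on) are illustrative assumptions.
```python
# A minimal sketch of the scheme described above, NOT the paper's exact
# algorithm. All constants and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, n_tasks, horizon, noise_std = 5, 20, 50, 100, 0.1

# Unknown true prior from which each task's parameter vector is drawn.
true_prior_mean = rng.normal(size=d)
true_prior_cov = 0.5 * np.eye(d)

# The learner's (initially misspecified) prior estimate.
prior_mean, prior_cov = np.zeros(d), np.eye(d)
theta_estimates = []

for task in range(n_tasks):
    theta = rng.multivariate_normal(true_prior_mean, true_prior_cov)
    arms = rng.normal(size=(n_arms, d))  # fixed action set for this task

    # Within-task posterior starts from the current prior estimate
    # (natural parameters: precision matrix and precision-weighted mean).
    precision = np.linalg.inv(prior_cov)
    b = precision @ prior_mean

    for t in range(horizon):
        post_cov = np.linalg.inv(precision)
        post_mean = post_cov @ b
        # Thompson Sampling: sample a parameter, act greedily on the sample.
        theta_sample = rng.multivariate_normal(post_mean, post_cov)
        a = arms[np.argmax(arms @ theta_sample)]
        reward = a @ theta + noise_std * rng.normal()
        # Conjugate Gaussian update with the incoming observation.
        precision += np.outer(a, a) / noise_std**2
        b += a * reward / noise_std**2

    theta_estimates.append(np.linalg.inv(precision) @ b)

    # Meta-update: re-estimate the prior from per-task parameter estimates.
    if len(theta_estimates) >= 2:
        est = np.stack(theta_estimates)
        prior_mean = est.mean(axis=0)
        prior_cov = np.cov(est, rowvar=False) + 1e-3 * np.eye(d)
```
Note that the empirical covariance of the per-task estimates also absorbs within-task estimation noise, so a practical version would shrink or debias it; the sketch is only meant to show the outer prior-update loop and the within-task posterior updates the abstract describes.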
Related papers
- Learning-Augmented Algorithms with Explicit Predictors [67.02156211760415]
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data.
Prior research in this context was focused on a paradigm where the predictor is pre-trained on past data and then used as a black box.
In this work, we unpack the predictor and integrate the learning problem it gives rise to within the algorithmic challenge.
arXiv Detail & Related papers (2024-03-12T08:40:21Z) - Time-Varying Gaussian Process Bandits with Unknown Prior [18.93478528448966]
PE-GP-UCB is capable of solving time-varying Bayesian optimisation problems.
It relies on the observed function values being consistent with at least one of the candidate priors.
arXiv Detail & Related papers (2024-02-02T18:52:16Z) - Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm that finds a good initialization for given input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
arXiv Detail & Related papers (2023-02-08T23:23:28Z) - Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints [117.8317521974783]
We consider non-clairvoyant scheduling with online precedence constraints.
An algorithm is oblivious to any job dependencies and learns about a job only if all of its predecessors have been completed.
arXiv Detail & Related papers (2023-01-30T13:17:15Z) - Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have the desired properties and admit a natural error measure, as well as algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z) - The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z) - Early Classification of Time Series. Cost-based Optimization Criterion and Algorithms [0.0]
In this paper, we put forward a new optimization criterion which takes into account both the cost of misclassification and the cost of delaying the decision.
We derive a family of non-myopic algorithms that anticipate the expected future gain in information, balancing it against the cost of waiting.
arXiv Detail & Related papers (2020-05-20T10:08:30Z) - Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.