BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits
- URL: http://arxiv.org/abs/2307.03587v2
- Date: Wed, 19 Jul 2023 13:23:29 GMT
- Title: BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits
- Authors: Nicklas Werge, Abdullah Akgül, Melih Kandemir
- Abstract summary: We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for contextual linear bandits in non-stationary environments.
This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings.
- Score: 16.59103967569845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound
(BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary
environments. This unique combination of Bayesian and frequentist principles
enhances adaptability and performance in dynamic settings. The BOF-UCB
algorithm utilizes sequential Bayesian updates to infer the posterior
distribution of the unknown regression parameter, and subsequently employs a
frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing
the expected reward over the posterior distribution. We provide theoretical
guarantees of BOF-UCB's performance and demonstrate its effectiveness in
balancing exploration and exploitation on synthetic datasets and classical
control tasks in a reinforcement learning setting. Our results show that
BOF-UCB outperforms existing methods, making it a promising solution for
sequential decision-making in non-stationary environments.
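The abstract's recipe (a conjugate Bayesian posterior over the regression parameter, followed by a frequentist optimistic index computed from that posterior) can be sketched as follows. This is a minimal illustration under assumed choices, not the authors' reference implementation: the Gaussian prior and noise model, the fixed confidence radius `beta`, and the exponential forgetting used to handle non-stationarity are all simplifying assumptions.

```python
import numpy as np

class BOFUCBSketch:
    """Illustrative Bayesian-linear bandit with a frequentist UCB index.

    A minimal sketch of the idea in the abstract, not the paper's
    reference implementation: the conjugate Gaussian model, the constant
    confidence radius `beta`, and the forgetting factor are assumptions.
    """

    def __init__(self, dim, noise_var=1.0, prior_var=1.0, beta=2.0, discount=0.99):
        self.noise_var = noise_var
        self.beta = beta          # confidence radius (assumed constant here)
        self.discount = discount  # forgetting factor for non-stationarity
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)                    # accumulated x_t * r_t / noise_var

    def posterior(self):
        # Conjugate Gaussian posterior over the regression parameter.
        cov = np.linalg.inv(self.precision)
        return cov @ self.b, cov  # (mean, covariance)

    def select(self, contexts):
        # contexts: (n_arms, dim). Optimistic index = posterior-mean reward
        # plus an exploration bonus from the posterior covariance.
        mean, cov = self.posterior()
        exploit = contexts @ mean
        explore = self.beta * np.sqrt(np.einsum("ad,dk,ak->a", contexts, cov, contexts))
        return int(np.argmax(exploit + explore))

    def update(self, x, reward):
        # Discounted sequential Bayesian update of the Gaussian posterior.
        self.precision = self.discount * self.precision + np.outer(x, x) / self.noise_var
        self.b = self.discount * self.b + x * reward / self.noise_var
```

Calling `select` and then `update` with the observed reward reproduces the explore/exploit loop described above; the actual algorithm derives its confidence bound from a frequentist analysis rather than fixing `beta` by hand.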
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for statistical inference.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
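For orientation, the estimator analyzed there is standard TD(0) with linear function approximation, reporting the Polyak-Ruppert average of the iterates. A minimal sketch, with an assumed `env_step` interface for sampling transitions under the evaluated policy:

```python
import numpy as np

def td0_polyak(env_step, phi, theta0, gamma=0.99, alpha=0.05, n_steps=10_000, s0=0):
    """TD(0) with linear function approximation, V(s) = phi(s) @ theta,
    returning the Polyak-Ruppert average of the iterates.

    `env_step(s) -> (reward, next_state)` and the initial state `s0` are
    assumed interfaces for this sketch.
    """
    theta = np.array(theta0, dtype=float)
    theta_bar = theta.copy()  # running average of the iterates
    s = s0
    for t in range(1, n_steps + 1):
        r, s_next = env_step(s)
        # TD error: r + gamma * V(s') - V(s)
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        theta += alpha * delta * phi(s)       # semi-gradient TD(0) step
        theta_bar += (theta - theta_bar) / t  # Polyak-Ruppert averaging
        s = s_next
    return theta_bar
```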
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization [35.74766507227412]
We propose FT-PFN, a novel surrogate for Freeze-thaw style BO.
FT-PFN is a prior-data fitted network (PFN) that leverages the transformers' in-context learning ability.
We show that, when combined with our novel acquisition mechanism (I-random), the resulting in-context freeze-thaw BO method (ifBO) yields new state-of-the-art performance.
arXiv Detail & Related papers (2024-04-25T17:40:52Z)
- LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions [0.0]
We propose the LABCAT algorithm, which extends trust-region-based BO.
We show that the algorithm outperforms several state-of-the-art BO and other black-box optimization algorithms.
arXiv Detail & Related papers (2023-11-19T13:56:24Z)
- Model-based Causal Bayesian Optimization [74.78486244786083]
We introduce the first algorithm for Causal Bayesian Optimization with Multiplicative Weights (CBO-MW).
We derive regret bounds for CBO-MW that naturally depend on graph-related quantities.
Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system.
arXiv Detail & Related papers (2023-07-31T13:02:36Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Model-based Causal Bayesian Optimization [78.120734120667]
We propose model-based causal Bayesian optimization (MCBO).
MCBO learns a full system model instead of only modeling intervention-reward pairs.
Unlike in standard Bayesian optimization, our acquisition function cannot be evaluated in closed form.
arXiv Detail & Related papers (2022-11-18T14:28:21Z)
- Neighbor Regularized Bayesian Optimization for Hyperparameter Optimization [12.544312247050236]
We propose a novel BO algorithm called Neighbor Regularized Bayesian Optimization (NRBO) to address noisy observations.
We first propose a neighbor-based regularization to smooth each sample observation, which reduces observation noise efficiently without any extra training cost (see the sketch below).
We conduct experiments on the Bayesmark benchmark and important computer vision benchmarks such as ImageNet and COCO.
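The neighbor-based regularization can be pictured as replacing each raw observation with a distance-weighted average over nearby points before fitting the surrogate. The Gaussian kernel and bandwidth below are illustrative assumptions, not NRBO's exact scheme:

```python
import numpy as np

def neighbor_smooth(X, y, bandwidth=0.5):
    """Distance-weighted smoothing of noisy observations (illustrative,
    not NRBO's exact formulation). Each y[i] is replaced by a
    kernel-weighted average over its neighbors, reducing observation
    noise with no extra model training.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise sq. distances
    w = np.exp(-d2 / (2.0 * bandwidth**2))                      # Gaussian kernel weights
    return (w @ y) / w.sum(axis=1)                              # smoothed observations
```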
arXiv Detail & Related papers (2022-10-07T12:08:01Z)
- Batch Bayesian optimisation via density-ratio estimation with guarantees [26.052368583196426]
We present a theoretical analysis of BORE's regret and an extension of the algorithm with improved uncertainty estimates.
We also show that BORE can be naturally extended to a batch optimisation setting by recasting the problem as approximate Bayesian inference.
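For context, the core of BORE (prior to this paper's batch extension) reduces acquisition maximization to density-ratio estimation: label the best gamma-fraction of observed points as positives, fit a probabilistic classifier, and use its predicted probability as the acquisition value. A sketch, with the classifier choice as an assumption:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bore_acquisition(X, y, candidates, gamma=0.25):
    """BORE-style acquisition via density-ratio estimation (sketch).

    Points in the best gamma-quantile of observed losses are labeled 1,
    the rest 0; the classifier's predicted class-1 probability then serves
    as the acquisition value. The random forest is an assumed choice, and
    the sketch assumes both classes are present after thresholding.
    """
    tau = np.quantile(y, gamma)     # threshold: best gamma fraction (minimization)
    z = (y <= tau).astype(int)      # binary labels for density-ratio estimation
    clf = RandomForestClassifier(n_estimators=100).fit(X, z)
    scores = clf.predict_proba(candidates)[:, 1]
    return candidates[int(np.argmax(scores))]  # next point to evaluate
```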
arXiv Detail & Related papers (2022-09-22T00:42:18Z)
- Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z)
- Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB).
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z)
- Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning [6.16852156844376]
We describe a novel framework, Inferential Induction, for correctly inferring value function distributions from data.
We experimentally demonstrate that the proposed algorithm is competitive with respect to the state of the art.
arXiv Detail & Related papers (2020-02-08T06:19:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.