Hypothesis Transfer in Bandits by Weighted Models
- URL: http://arxiv.org/abs/2211.07387v1
- Date: Mon, 14 Nov 2022 14:13:02 GMT
- Title: Hypothesis Transfer in Bandits by Weighted Models
- Authors: Steven Bilaj, Sofien Dhouib, Setareh Maghsudi
- Abstract summary: We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning.
Our transfer strategy is based on a re-weighting scheme for which we show a reduction in regret over the classic Linear UCB when transfer is desired.
We further extend this method to an arbitrary number of source models, where the algorithm decides which model is preferred at each time step.
- Score: 8.759884299087835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of contextual multi-armed bandits in the setting of
hypothesis transfer learning. That is, we assume access to a previously
learned model on an unobserved set of contexts, and we leverage it to
accelerate exploration on a new bandit problem. Our transfer strategy is based
on a re-weighting scheme for which we show a reduction in the regret over the
classic Linear UCB when transfer is desired, while recovering the classic
regret rate when the two tasks are unrelated. We further extend this method to
an arbitrary number of source models, where the algorithm decides which model
is preferred at each time step. Additionally, we discuss an approach where a
dynamic convex combination of source models is given in terms of a biased
regularization term in the classic LinUCB algorithm. The algorithms and the
theoretical analysis of our proposed methods are substantiated by empirical
evaluations on simulated and real-world data.
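As one way to make the abstract's scheme concrete, here is a minimal sketch of a LinUCB variant whose ridge solution is biased toward a dynamically re-weighted convex combination of source models. The exponential re-weighting rule, the parameter names (`alpha`, `lam`, `eta`), and the bias construction are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

class BiasedLinUCB:
    """Sketch: LinUCB with its ridge estimate biased toward a weighted
    mixture of previously learned source models (hypothesis transfer)."""

    def __init__(self, d, source_thetas, lam=1.0, alpha=1.0, eta=0.5):
        self.A = lam * np.eye(d)            # regularized Gram matrix
        self.b = np.zeros(d)                # reward-weighted context sum
        self.lam, self.alpha, self.eta = lam, alpha, eta
        self.sources = np.asarray(source_thetas)   # shape (m, d)
        self.w = np.ones(len(source_thetas)) / len(source_thetas)

    def _theta(self):
        # Bias the ridge solution toward the current source mixture.
        prior = self.w @ self.sources
        return np.linalg.solve(self.A, self.b + self.lam * prior)

    def select(self, contexts):
        # contexts: array of shape (K, d), one row per arm.
        theta, Ainv = self._theta(), np.linalg.inv(self.A)
        ucb = contexts @ theta + self.alpha * np.sqrt(
            np.einsum('kd,dc,kc->k', contexts, Ainv, contexts))
        return int(np.argmax(ucb))

    def update(self, x, r):
        self.A += np.outer(x, x)
        self.b += r * x
        # Down-weight sources with large squared prediction error
        # (one plausible re-weighting scheme; the paper's rule may differ).
        err = (self.sources @ x - r) ** 2
        self.w *= np.exp(-self.eta * err)
        self.w /= self.w.sum()
```

If all source models are unrelated to the new task, the data term in `A` and `b` eventually dominates the fixed-strength bias, which is consistent with recovering standard LinUCB behavior.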
Related papers
- Sublinear Regret for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z) - The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [37.387280102209274]
Offline reinforcement learning aims to enable agents to be trained from pre-collected datasets; however, this comes with the added challenge of estimating the value of behavior not covered in the dataset.
Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model.
Surprisingly, if the learned dynamics model is replaced by the true error-free dynamics, existing model-based methods still completely fail.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z) - Coverage-Validity-Aware Algorithmic Recourse [23.643366441803796]
We propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts.
Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model.
We show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses.
arXiv Detail & Related papers (2023-11-19T15:21:49Z) - Aggregation Weighting of Federated Learning via Generalization Bound
Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
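The weighting idea in this entry can be sketched generically: replace sample-proportion weights with weights derived from any per-client score, such as a generalization-bound estimate. The normalization below is a stand-in assumption; the paper's actual bound estimation is not reproduced here.

```python
import numpy as np

def aggregate(client_params, scores):
    """Weighted average of client parameter arrays.

    client_params: list of np.ndarray, one per client (same shape).
    scores: nonnegative per-client weights; in the bound-based variant these
            would come from estimated generalization bounds rather than
            sample counts.
    """
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()                          # normalize to a convex combination
    stacked = np.stack(client_params)        # shape (n_clients, ...)
    return np.tensordot(w, stacked, axes=1)  # weighted sum over clients
```

With `scores` equal to client sample counts this reduces to standard FedAvg-style aggregation, so the bound-based strategy is a drop-in replacement for the weighting step.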
arXiv Detail & Related papers (2023-11-10T08:50:28Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - A Provably Efficient Model-Free Posterior Sampling Method for Episodic
Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are limited by being model-based or by lacking worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z) - Online Contextual Decision-Making with a Smart Predict-then-Optimize
Method [4.061135251278187]
We study an online contextual decision-making problem with resource constraints.
We propose an algorithm that mixes a prediction step based on the "Smart Predict-then-Optimize (SPO)" method with a dual update step based on mirror descent.
We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online mirror descent.
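The two-step loop this entry describes can be sketched as follows: a prediction step produces estimated costs, the decision also prices resource usage with dual variables, and the duals are updated by online mirror descent. The entropic mirror map and the simplex-normalized duals are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def entropic_md_step(dual, grad, eta=0.1):
    """One online mirror descent step with the entropy mirror map
    (multiplicative weights), renormalized onto the simplex."""
    new = dual * np.exp(-eta * grad)
    return new / new.sum()

def decide(pred_cost, usage, dual):
    """Pick the action minimizing predicted cost plus dual-priced resource use.

    pred_cost: (K,) predicted per-action costs; usage: (K, m) resource usage
    per action; dual: (m,) current dual prices on the resource constraints.
    """
    return int(np.argmin(pred_cost + usage @ dual))
```

Each round would call `decide` with the current prediction, observe resource consumption, form a constraint-violation gradient, and apply `entropic_md_step` to the duals.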
arXiv Detail & Related papers (2022-06-15T06:16:13Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Composing Normalizing Flows for Inverse Problems [89.06155049265641]
We propose a framework for approximate inference that estimates the target conditional as a composition of two flow models.
Our method is evaluated on a variety of inverse problems and is shown to produce high-quality samples with uncertainty.
arXiv Detail & Related papers (2020-02-26T19:01:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.