Hypothesis Transfer in Bandits by Weighted Models
- URL: http://arxiv.org/abs/2211.07387v1
- Date: Mon, 14 Nov 2022 14:13:02 GMT
- Title: Hypothesis Transfer in Bandits by Weighted Models
- Authors: Steven Bilaj, Sofien Dhouib, Setareh Maghsudi
- Abstract summary: We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning.
We propose a re-weighting scheme that reduces the regret relative to the classic Linear UCB when transfer is desired.
We further extend this method to an arbitrary number of source models, where the algorithm decides which model is preferred at each time step.
- Score: 8.759884299087835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of contextual multi-armed bandits in the setting of
hypothesis transfer learning. That is, we assume having access to a previously
learned model on an unobserved set of contexts, and we leverage it in order to
accelerate exploration on a new bandit problem. Our transfer strategy is based
on a re-weighting scheme for which we show a reduction in the regret over the
classic Linear UCB when transfer is desired, while recovering the classic
regret rate when the two tasks are unrelated. We further extend this method to
an arbitrary number of source models, where the algorithm decides which model
is preferred at each time step. Additionally, we discuss an approach where a
dynamic convex combination of source models is given in terms of a biased
regularization term in the classic LinUCB algorithm. The algorithms and the
theoretical analysis of our proposed methods are substantiated by empirical
evaluations on simulated and real-world data.
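The biased regularization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single, fixed source parameter vector rather than a dynamic convex combination of source models, and the class name `BiasedLinUCB` and the parameters `theta_src`, `lam`, and `alpha` are hypothetical. The ridge penalty is centered on the source model, so the estimate is (X^T X + lam I)^{-1} (X^T y + lam theta_src); with no data it equals the source model, and it drifts toward the target task as observations accumulate.

```python
import numpy as np


class BiasedLinUCB:
    """LinUCB whose ridge estimate is shrunk toward a source model.

    Illustrative sketch only: one fixed source vector, shared parameters
    across arms, and a constant exploration weight `alpha`.
    """

    def __init__(self, theta_src, lam=1.0, alpha=1.0):
        self.theta_src = np.asarray(theta_src, dtype=float)
        d = self.theta_src.shape[0]
        self.alpha = alpha
        # A = lam * I + sum of x x^T over observed contexts
        self.A = lam * np.eye(d)
        # b = lam * theta_src + sum of reward * x; the first term is the bias
        self.b = lam * self.theta_src.copy()

    def theta_hat(self):
        # Minimizer of ||X theta - y||^2 + lam * ||theta - theta_src||^2
        return np.linalg.solve(self.A, self.b)

    def select(self, contexts):
        # contexts: array of shape (n_arms, d), one feature vector per arm
        contexts = np.asarray(contexts, dtype=float)
        means = contexts @ self.theta_hat()
        A_inv = np.linalg.inv(self.A)
        # Confidence width sqrt(x^T A^{-1} x) for every arm
        widths = np.sqrt(np.einsum("id,de,ie->i", contexts, A_inv, contexts))
        return int(np.argmax(means + self.alpha * widths))

    def update(self, x, reward):
        x = np.asarray(x, dtype=float)
        self.A += np.outer(x, x)
        self.b += reward * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_true = np.array([0.5, -0.2, 0.8])            # target task (simulated)
    agent = BiasedLinUCB(theta_src=[0.4, -0.1, 0.7])   # related source model
    for _ in range(200):
        contexts = rng.normal(size=(5, 3))
        arm = agent.select(contexts)
        reward = contexts[arm] @ theta_true + 0.1 * rng.normal()
        agent.update(contexts[arm], reward)
    print(agent.theta_hat())
```

Setting `theta_src` to the zero vector recovers the standard ridge estimate used by classic LinUCB, which makes the two variants easy to compare in a simulation.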
Related papers
- Amortized In-Context Bayesian Posterior Estimation [15.714462115687096]
Amortization, through conditional estimation, is a viable strategy to alleviate such difficulties.
We conduct a thorough comparative analysis of amortized in-context Bayesian posterior estimation methods.
We highlight the superiority of the reverse KL estimator for predictive problems, especially when combined with the transformer architecture and normalizing flows.
arXiv Detail & Related papers (2025-02-10T16:00:48Z) - Model-Robust and Adaptive-Optimal Transfer Learning for Tackling Concept Shifts in Nonparametric Regression [7.243632426715939]
We present a transfer learning procedure that is robust against model misspecification while adaptively attaining optimality.
We derive the adaptive convergence rates of the excess risk for specifying Gaussian kernels in a prevalent class of hypothesis transfer learning algorithms.
arXiv Detail & Related papers (2025-01-18T20:33:37Z) - Coverage-Validity-Aware Algorithmic Recourse [21.642948522310782]
We propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts.
Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model.
We show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses.
arXiv Detail & Related papers (2023-11-19T15:21:49Z) - Aggregation Weighting of Federated Learning via Generalization Bound
Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are either model-based or lack worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z) - Online Contextual Decision-Making with a Smart Predict-then-Optimize Method [4.061135251278187]
We study an online contextual decision-making problem with resource constraints.
We propose an algorithm that mixes a prediction step based on the "Smart Predict-then-Optimize (SPO)" method with a dual update step based on mirror descent.
We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online mirror descent.
arXiv Detail & Related papers (2022-06-15T06:16:13Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Composing Normalizing Flows for Inverse Problems [89.06155049265641]
We propose a framework for approximate inference that estimates the target conditional as a composition of two flow models.
Our method is evaluated on a variety of inverse problems and is shown to produce high-quality samples with uncertainty.
arXiv Detail & Related papers (2020-02-26T19:01:11Z)