Hypothesis Transfer in Bandits by Weighted Models
- URL: http://arxiv.org/abs/2211.07387v1
- Date: Mon, 14 Nov 2022 14:13:02 GMT
- Title: Hypothesis Transfer in Bandits by Weighted Models
- Authors: Steven Bilaj, Sofien Dhouib, Setareh Maghsudi
- Abstract summary: We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning.
Our transfer strategy is based on a re-weighting scheme for which we show a reduction in regret over the classic Linear UCB when transfer is desired.
We further extend this method to an arbitrary number of source models, where the algorithm decides which model is preferred at each time step.
- Score: 8.759884299087835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of contextual multi-armed bandits in the setting of
hypothesis transfer learning. That is, we assume access to a previously
learned model on an unobserved set of contexts, and we leverage it to
accelerate exploration on a new bandit problem. Our transfer strategy is based
on a re-weighting scheme for which we show a reduction in the regret over the
classic Linear UCB when transfer is desired, while recovering the classic
regret rate when the two tasks are unrelated. We further extend this method to
an arbitrary number of source models, where the algorithm decides which model
is preferred at each time step. Additionally, we discuss an approach where a
dynamic convex combination of source models is given in terms of a biased
regularization term in the classic LinUCB algorithm. The algorithms and the
theoretical analysis of our proposed methods are substantiated by empirical
evaluations on simulated and real-world data.
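As one way to make the abstract's scheme concrete, here is a minimal sketch of a LinUCB variant whose ridge solution is biased toward a dynamically re-weighted convex combination of source models. The exponential re-weighting rule, the parameter names (`alpha`, `lam`, `eta`), and the bias construction are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

class BiasedLinUCB:
    """Sketch: LinUCB with its ridge estimate biased toward a weighted
    mixture of previously learned source models (hypothesis transfer)."""

    def __init__(self, d, source_thetas, lam=1.0, alpha=1.0, eta=0.5):
        self.A = lam * np.eye(d)            # regularized Gram matrix
        self.b = np.zeros(d)                # reward-weighted context sum
        self.lam, self.alpha, self.eta = lam, alpha, eta
        self.sources = np.asarray(source_thetas)   # shape (m, d)
        self.w = np.ones(len(source_thetas)) / len(source_thetas)

    def _theta(self):
        # Bias the ridge solution toward the current source mixture.
        prior = self.w @ self.sources
        return np.linalg.solve(self.A, self.b + self.lam * prior)

    def select(self, contexts):
        # contexts: array of shape (K, d), one row per arm.
        theta, Ainv = self._theta(), np.linalg.inv(self.A)
        ucb = contexts @ theta + self.alpha * np.sqrt(
            np.einsum('kd,dc,kc->k', contexts, Ainv, contexts))
        return int(np.argmax(ucb))

    def update(self, x, r):
        self.A += np.outer(x, x)
        self.b += r * x
        # Down-weight sources with large squared prediction error
        # (one plausible re-weighting scheme; the paper's rule may differ).
        err = (self.sources @ x - r) ** 2
        self.w *= np.exp(-self.eta * err)
        self.w /= self.w.sum()
```

If all source models are unrelated to the new task, the data term in `A` and `b` eventually dominates the fixed-strength bias, which is consistent with recovering standard LinUCB behavior.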
Related papers
- Sublinear Regret for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z) - The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [37.387280102209274]
Offline reinforcement learning aims to enable agents to be trained from pre-collected datasets; however, this comes with the added challenge of estimating the value of behavior not covered in the dataset.
Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model.
Surprisingly, if the learned dynamics model is replaced by the true error-free dynamics, existing model-based methods still completely fail.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z) - Coverage-Validity-Aware Algorithmic Recourse [23.643366441803796]
We propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts.
Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model.
We show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses.
arXiv Detail & Related papers (2023-11-19T15:21:49Z) - Aggregation Weighting of Federated Learning via Generalization Bound
Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
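The weighting idea in this entry can be sketched generically: replace sample-proportion weights with weights derived from any per-client score, such as a generalization-bound estimate. The normalization below is a stand-in assumption; the paper's actual bound estimation is not reproduced here.

```python
import numpy as np

def aggregate(client_params, scores):
    """Weighted average of client parameter arrays.

    client_params: list of np.ndarray, one per client (same shape).
    scores: nonnegative per-client weights; in the bound-based variant these
            would come from estimated generalization bounds rather than
            sample counts.
    """
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()                          # normalize to a convex combination
    stacked = np.stack(client_params)        # shape (n_clients, ...)
    return np.tensordot(w, stacked, axes=1)  # weighted sum over clients
```

With `scores` equal to client sample counts this reduces to standard FedAvg-style aggregation, so the bound-based strategy is a drop-in replacement for the weighting step.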
arXiv Detail & Related papers (2023-11-10T08:50:28Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - A Provably Efficient Model-Free Posterior Sampling Method for Episodic
Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are limited by being model-based or by lacking worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z) - Online Contextual Decision-Making with a Smart Predict-then-Optimize
Method [4.061135251278187]
We study an online contextual decision-making problem with resource constraints.
We propose an algorithm that mixes a prediction step based on the "Smart Predict-then-Optimize (SPO)" method with a dual update step based on mirror descent.
We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online mirror descent.
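The two-step loop this entry describes can be sketched as follows: a prediction step produces estimated costs, the decision also prices resource usage with dual variables, and the duals are updated by online mirror descent. The entropic mirror map and the simplex-normalized duals are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def entropic_md_step(dual, grad, eta=0.1):
    """One online mirror descent step with the entropy mirror map
    (multiplicative weights), renormalized onto the simplex."""
    new = dual * np.exp(-eta * grad)
    return new / new.sum()

def decide(pred_cost, usage, dual):
    """Pick the action minimizing predicted cost plus dual-priced resource use.

    pred_cost: (K,) predicted per-action costs; usage: (K, m) resource usage
    per action; dual: (m,) current dual prices on the resource constraints.
    """
    return int(np.argmin(pred_cost + usage @ dual))
```

Each round would call `decide` with the current prediction, observe resource consumption, form a constraint-violation gradient, and apply `entropic_md_step` to the duals.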
arXiv Detail & Related papers (2022-06-15T06:16:13Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Composing Normalizing Flows for Inverse Problems [89.06155049265641]
We propose a framework for approximate inference that estimates the target conditional as a composition of two flow models.
Our method is evaluated on a variety of inverse problems and is shown to produce high-quality samples with uncertainty.
arXiv Detail & Related papers (2020-02-26T19:01:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.