Federated Linear Dueling Bandits
- URL: http://arxiv.org/abs/2502.01085v1
- Date: Mon, 03 Feb 2025 05:56:22 GMT
- Title: Federated Linear Dueling Bandits
- Authors: Xuhan Huang, Yan Hu, Zhiyan Li, Zhiyong Wang, Benyou Wang, Zhongxiang Dai
- Abstract summary: Contextual linear dueling bandits have recently garnered significant attention due to their widespread applications in important domains such as recommender systems and large language models.
Previous works have developed federated linear bandit algorithms that rely on closed-form updates of the bandit parameters to achieve collaboration. However, in linear dueling bandits the linear function parameter lacks a closed-form expression and must instead be estimated by minimizing a loss function, which renders those methods inapplicable.
In this work, we overcome this challenge through an innovative and principled combination of online gradient descent (for minimizing the loss function to estimate the linear function parameters) and federated learning, hence introducing the first federated linear dueling bandit algorithms.
- Score: 28.74660004468591
- License:
- Abstract: Contextual linear dueling bandits have recently garnered significant attention due to their widespread applications in important domains such as recommender systems and large language models. Classical dueling bandit algorithms are typically only applicable to a single agent. However, many applications of dueling bandits involve multiple agents who wish to collaborate for improved performance yet are unwilling to share their data. This motivates us to draw inspiration from federated learning, which involves multiple agents aiming to collaboratively train their neural networks via gradient descent (GD) without sharing their raw data. Previous works have developed federated linear bandit algorithms which rely on closed-form updates of the bandit parameters (e.g., the linear function parameter) to achieve collaboration. However, in linear dueling bandits, the linear function parameter lacks a closed-form expression and its estimation requires minimizing a loss function. This renders these previous methods inapplicable. In this work, we overcome this challenge through an innovative and principled combination of online gradient descent (for minimizing the loss function to estimate the linear function parameters) and federated learning, hence introducing the first federated linear dueling bandit algorithms. Through rigorous theoretical analysis, we prove that our algorithms enjoy a sub-linear upper bound on their cumulative regret. We also use empirical experiments to demonstrate the effectiveness of our algorithms and the practical benefit of collaboration.
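To make the abstract's idea concrete, below is a minimal, illustrative sketch of combining online gradient descent on a logistic (Bradley-Terry-style) preference loss with federated parameter averaging. The preference model P(x beats y) = sigma(theta^T (x - y)), the FedAvg-style synchronization, the communication schedule, and all names in the code (LocalDuelingAgent, federated_round, theta_star) are assumptions for illustration rather than the paper's exact algorithm; arm selection (e.g., an optimistic rule) is omitted in favor of random contexts.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


class LocalDuelingAgent:
    """One agent's local state: a linear parameter estimate updated by online
    gradient descent (OGD) on a logistic (Bradley-Terry-style) preference loss.
    The loss form and the constant step size are illustrative assumptions."""

    def __init__(self, dim, lr=0.1):
        self.theta = np.zeros(dim)
        self.lr = lr

    def ogd_step(self, x, y, pref):
        """x, y: feature vectors of the two compared arms; pref = 1.0 if x won the duel."""
        diff = x - y
        p = sigmoid(self.theta @ diff)   # modeled P(x beats y) = sigma(theta^T (x - y))
        grad = (p - pref) * diff         # gradient of the logistic loss at (x, y, pref)
        self.theta -= self.lr * grad     # online gradient descent update


def federated_round(agents):
    """FedAvg-style synchronization (an assumption, not necessarily the paper's
    protocol): average the agents' parameters and broadcast the result, so that
    raw preference data never leaves an agent."""
    avg_theta = np.mean([a.theta for a in agents], axis=0)
    for a in agents:
        a.theta = avg_theta.copy()


# Toy usage: 3 agents, 5-dimensional contexts, duels simulated from a
# hypothetical ground-truth parameter theta_star.
rng = np.random.default_rng(0)
dim, n_agents = 5, 3
theta_star = rng.normal(size=dim)
agents = [LocalDuelingAgent(dim) for _ in range(n_agents)]

for t in range(200):
    for agent in agents:
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        pref = float(rng.random() < sigmoid(theta_star @ (x - y)))  # simulated preference feedback
        agent.ogd_step(x, y, pref)
    if (t + 1) % 20 == 0:  # periodic communication round
        federated_round(agents)

print("estimation error:", np.linalg.norm(agents[0].theta - theta_star))
```

The point the sketch captures is that agents exchange only model parameters (or gradients) during synchronization rounds, never their raw preference data, while the lack of a closed-form parameter estimate is handled by running gradient descent on the preference loss locally.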
Related papers
- Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Last Switch Dependent Bandits with Monotone Payoff Functions [8.860629791560198]
We take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy.
In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on payoffs, its approximation guarantee almost matches the state-of-the-art.
We develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states.
arXiv Detail & Related papers (2023-06-01T04:38:32Z) - Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is the first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
arXiv Detail & Related papers (2023-05-30T16:05:44Z) - PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits [29.097522376094624]
We propose a simple and computationally efficient sparse linear estimation method called PopArt.
We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
arXiv Detail & Related papers (2022-10-25T19:13:20Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - Scaling the Convex Barrier with Sparse Dual Algorithms [141.4085318878354]
We present two novel dual algorithms for tight and efficient neural network bounding.
Both methods recover the strengths of the new relaxation: tightness and a linear separation oracle.
We can obtain better bounds than off-the-shelf solvers in only a fraction of their running time.
arXiv Detail & Related papers (2021-01-14T19:45:17Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.