Federated Linear Bandits with Finite Adversarial Actions
- URL: http://arxiv.org/abs/2311.00973v1
- Date: Thu, 2 Nov 2023 03:41:58 GMT
- Title: Federated Linear Bandits with Finite Adversarial Actions
- Authors: Li Fan, Ruida Zhou, Chao Tian, Cong Shen
- Abstract summary: We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandits problem.
To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm.
We prove that FedSupLinUCB achieves a total regret of $\tilde{O}(\sqrt{dT})$, where $T$ is the total number of arm pulls from all clients, and $d$ is the ambient dimension of the linear model.
- Score: 20.1041278044797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a federated linear bandits model, where $M$ clients communicate with
a central server to solve a linear contextual bandits problem with finite
adversarial action sets that may be different across clients. To address the
unique challenges of adversarial finite action sets, we propose the
FedSupLinUCB algorithm, which extends the principles of SupLinUCB and OFUL
algorithms in linear contextual bandits. We prove that FedSupLinUCB achieves a
total regret of $\tilde{O}(\sqrt{d T})$, where $T$ is the total number of arm
pulls from all clients, and $d$ is the ambient dimension of the linear model.
This matches the minimax lower bound and thus is order-optimal (up to polylog
terms). We study both asynchronous and synchronous cases and show that the
communication cost can be controlled as $O(d M^2 \log(d)\log(T))$ and
$O(\sqrt{d^3 M^3} \log(d))$, respectively. The FedSupLinUCB design is further
extended to two scenarios: (1) variance-adaptive, where a total regret of
$\tilde{O} (\sqrt{d \sum \nolimits_{t=1}^{T} \sigma_t^2})$ can be achieved with
$\sigma_t^2$ being the noise variance of round $t$; and (2) adversarial
corruption, where a total regret of $\tilde{O}(\sqrt{dT} + d C_p)$ can be
achieved with $C_p$ being the total corruption budget. Experiment results
corroborate the theoretical analysis and demonstrate the effectiveness of
FedSupLinUCB on both synthetic and real-world datasets.
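The algorithms named above (SupLinUCB, OFUL, and their federated extension FedSupLinUCB) all rest on one primitive: a ridge-regression estimate of the linear parameter plus an elliptical confidence bonus, evaluated over a finite action set each round. Below is a minimal single-client sketch of that primitive, not the federated protocol itself; the function name, the variable names, and the fixed `alpha` width are illustrative assumptions (the theory sets the width from $d$, $T$, and the confidence level).

```python
import numpy as np

def linucb_scores(V, b, contexts, alpha=1.0):
    """Score each candidate action with a LinUCB-style upper confidence bound.

    V        : (d, d) regularized Gram matrix  sum_s x_s x_s^T + lambda * I
    b        : (d,)   reward-weighted feature sum  sum_s r_s x_s
    contexts : (K, d) feature vectors of the K finite actions this round
    alpha    : confidence-width multiplier (illustrative; theory-driven in the paper)
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                      # ridge-regression estimate of theta
    means = contexts @ theta_hat               # estimated mean rewards
    # exploration bonus per action: alpha * sqrt(x^T V^{-1} x)
    widths = np.sqrt(np.einsum("kd,dc,kc->k", contexts, V_inv, contexts))
    return means + alpha * widths

# One round for a single client: pick the optimistic action, then update
# the local statistics (V, b) that a federated server would aggregate.
rng = np.random.default_rng(0)
d, K, lam = 5, 10, 1.0
V = lam * np.eye(d)                            # lambda * I regularizer
b = np.zeros(d)
theta_star = rng.normal(size=d)                # unknown true parameter

X = rng.normal(size=(K, d))                    # finite (possibly adversarial) action set
a = int(np.argmax(linucb_scores(V, b, X)))     # optimistic action choice
r = X[a] @ theta_star + 0.1 * rng.normal()     # noisy linear reward
V += np.outer(X[a], X[a])
b += r * X[a]
```

The per-round choice is "optimism in the face of uncertainty": the bonus term shrinks along directions the Gram matrix has already explored, which is what drives the $\tilde{O}(\sqrt{dT})$-type regret bounds quoted above.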
Related papers
- Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption.
We develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees.
We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both regimes.
arXiv Detail & Related papers (2023-12-27T09:32:18Z) - Corruption-Robust Offline Reinforcement Learning with General Function
Approximation [60.91257031278004]
We investigate the problem of corruption in offline reinforcement learning (RL) with general function approximation.
Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs).
arXiv Detail & Related papers (2023-10-23T04:07:26Z) - Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high-quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z) - Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
We study contextual bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions.
Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and derive a regret bound $\tilde{O}(d\sqrt{T})$.
arXiv Detail & Related papers (2023-03-30T02:51:00Z) - A Simple and Provably Efficient Algorithm for Asynchronous Federated
Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^{M} T_m})$ and the communication complexity is $\tilde{O}(dM\dots$
arXiv Detail & Related papers (2022-07-07T06:16:19Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed
Payoffs [35.988644745703645]
We analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+\epsilon$ moments.
We propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{1+\epsilon}})$.
arXiv Detail & Related papers (2020-04-28T13:01:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.