Federated Learning for Heterogeneous Bandits with Unobserved Contexts
- URL: http://arxiv.org/abs/2303.17043v2
- Date: Tue, 30 Jan 2024 00:43:40 GMT
- Title: Federated Learning for Heterogeneous Bandits with Unobserved Contexts
- Authors: Jiabin Lin and Shana Moothedath
- Abstract summary: We study the problem of federated multi-arm contextual bandits with unknown contexts.
We propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of federated stochastic multi-arm contextual bandits
with unknown contexts, in which M agents are faced with different bandits and
collaborate to learn. The communication model consists of a central server and
the agents share their estimates with the central server periodically to learn
to choose optimal actions in order to minimize the total regret. We assume that
the exact contexts are not observable and the agents observe only a
distribution of the contexts. Such a situation arises, for instance, when the
context itself is a noisy measurement or based on a prediction mechanism. Our
goal is to develop a distributed and federated algorithm that facilitates
collaborative learning among the agents to select a sequence of optimal actions
so as to maximize the cumulative reward. By performing a feature vector
transformation, we propose an elimination-based algorithm and prove the regret
bound for linearly parametrized reward functions. Finally, we validated the
performance of our algorithm and compared it with another baseline approach
using numerical simulations on synthetic data and on the real-world movielens
dataset.
Related papers
- Multi-Agent Best Arm Identification in Stochastic Linear Bandits [0.7673339435080443]
We study the problem of collaborative best-arm identification in linear bandits under a fixed-budget scenario.
In our learning model, we consider multiple agents connected through a star network or a generic network, interacting with a linear bandit instance in parallel.
We devise the algorithms MaLinBAI-Star and MaLinBAI-Gen for star networks and generic networks respectively.
arXiv Detail & Related papers (2024-11-20T20:09:44Z) - LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z) - Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints [0.0]
We propose a distributed upper confidence bound (UCB) algorithm, related-UCB.
Our algorithm constructs a pruned action set during each round to ensure the constraints are met.
We empirically validated the performance of our algorithm on synthetic data and real-world Movielens-100K data.
arXiv Detail & Related papers (2024-01-21T18:43:55Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improving classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z) - Online learning in bandits with predicted context [8.257280652461159]
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context.
This setting is motivated by a wide range of applications where the true context for decision-making is unobserved.
We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions.
arXiv Detail & Related papers (2023-07-26T02:33:54Z) - Distributed Stochastic Bandit Learning with Context Distributions [0.0]
We study the problem of distributed multi-arm contextual bandit with unknown contexts.
In our model, an adversary chooses a distribution on the set of possible contexts and the agents observe only the context distribution and the exact context is unknown to the agents.
Our goal is to develop a distributed algorithm that selects a sequence of optimal actions to maximize the cumulative reward.
arXiv Detail & Related papers (2022-07-28T22:00:11Z) - Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
$alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z) - Contextual Model Aggregation for Fast and Robust Federated Learning in
Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.