Client Selection for Federated Policy Optimization with Environment
Heterogeneity
- URL: http://arxiv.org/abs/2305.10978v5
- Date: Tue, 20 Feb 2024 10:47:47 GMT
- Title: Client Selection for Federated Policy Optimization with Environment
Heterogeneity
- Authors: Zhijie Xie, S.H. Song
- Abstract summary: Policy Iteration (PI) has inspired many recent algorithms for Reinforcement Learning (RL).
This paper investigates the federated version of Approximate PI (API) and derives its error bound.
We propose a client selection algorithm to alleviate the additional approximation error.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of Policy Iteration (PI) has inspired many recent
algorithms for Reinforcement Learning (RL), including several policy gradient
methods that have achieved both theoretical soundness and empirical success on
a variety of tasks. The theory of PI is rich in the context of centralized
learning, but its study under the federated setting is still in its infancy.
This paper investigates the federated version of Approximate PI (API) and
derives its error bound, taking into account the approximation error
introduced by environment heterogeneity. We theoretically prove that a proper
client selection scheme can reduce this error bound. Based on this theoretical
result, we propose a client selection algorithm to alleviate the additional
approximation error caused by environment heterogeneity. Experimental results
show that the proposed algorithm outperforms other biased and unbiased client
selection methods on the federated Mountain Car and MuJoCo Hopper problems by
effectively selecting clients with a lower level of heterogeneity from the
population distribution.
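As a rough, illustrative sketch of the selection idea (not the paper's derived criterion), the Python snippet below ranks clients by how far a per-client summary statistic falls from the population mean and keeps the closest ones; the statistic and the distance measure are assumptions made for illustration.

```python
import numpy as np

def select_clients(client_stats, num_selected):
    """Pick the clients whose summary statistics lie closest to the
    population mean, as a proxy for low environment heterogeneity."""
    client_stats = np.asarray(client_stats, dtype=float)
    population_mean = client_stats.mean(axis=0)
    # Heterogeneity proxy: Euclidean distance to the population mean
    # (an illustrative choice, not the criterion derived in the paper).
    distances = np.linalg.norm(client_stats - population_mean, axis=1)
    # Keep the num_selected least-heterogeneous clients.
    return np.argsort(distances)[:num_selected]

# Toy usage: 100 clients, each summarized by an 8-dimensional statistic
# (e.g., an estimate of local transition or return behavior).
rng = np.random.default_rng(0)
stats = rng.normal(size=(100, 8))
print(select_clients(stats, num_selected=10))
```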
Related papers
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
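Schematically, and with notation assumed rather than taken from the paper, such a combined objective can be written as a preference-optimization loss regularized by the SFT negative log-likelihood:

```latex
% beta > 0 trades off preference fitting against staying close to the
% supervised data; an illustrative form, not the paper's exact statement.
\mathcal{L}(\pi) = \mathcal{L}_{\mathrm{pref}}(\pi)
  + \beta \, \mathbb{E}_{(x,y)\sim\mathcal{D}_{\mathrm{SFT}}}
    \left[ -\log \pi(y \mid x) \right]
```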
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits [18.982448033389588]
Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment.
To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed.
We propose a novel DRO approach that employs the Wasserstein distance instead.
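As a schematic of the general Wasserstein DRO formulation (notation assumed; the paper's precise setup is not reproduced here), learning optimizes the worst-case value over an ambiguity set around the nominal data distribution:

```latex
% P_0: nominal (empirical) distribution, W: Wasserstein distance,
% rho: robustness radius; schematic form only.
\max_{\pi} \; \min_{Q \,:\, W(Q, P_0) \le \rho} \;
  \mathbb{E}_{Q}\left[ r\bigl(x, \pi(x)\bigr) \right]
```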
arXiv Detail & Related papers (2023-09-15T20:21:46Z)
- Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards [5.347237827669861]
We study the piecewise stationary semi-bandit problem with causally related rewards.
In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process.
The problem is exacerbated in the semi-bandit setting, where the decision-maker observes only the outcome of the selected bundle of arms.
arXiv Detail & Related papers (2023-07-26T12:06:13Z)
- Federated Compositional Deep AUC Maximization [58.25078060952361]
We develop a novel federated learning method for imbalanced data by directly optimizing the area under the curve (AUC) score.
To the best of our knowledge, this is the first work to achieve such favorable theoretical results.
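The paper's compositional min-max formulation is not reproduced here; as a minimal reminder of what optimizing the AUC directly involves, the sketch below evaluates a standard pairwise squared-hinge surrogate, which rewards ranking positive examples above negative ones.

```python
import numpy as np

def auc_surrogate_loss(scores_pos, scores_neg, margin=1.0):
    """Pairwise squared-hinge surrogate for the AUC; a common stand-in,
    not the compositional objective used in the paper."""
    # The AUC counts (positive, negative) pairs where the positive
    # example receives the higher score; penalize small margins.
    diffs = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(np.maximum(0.0, margin - diffs) ** 2)

# Toy usage with three positive and two negative scores.
print(auc_surrogate_loss(np.array([0.9, 0.7, 0.8]), np.array([0.2, 0.4])))
```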
arXiv Detail & Related papers (2023-04-20T05:49:41Z)
- Adaptive Federated Learning via New Entropy Approach [14.595709494370372]
Federated Learning (FL) has emerged as a prominent distributed machine learning framework.
In this paper, we propose an adaptive FEDerated learning algorithm based on ENTropy theory (FedEnt) to alleviate the parameter deviation among heterogeneous clients.
arXiv Detail & Related papers (2023-03-27T07:57:04Z)
- Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach [66.9033666087719]
This paper extends the inference view and describes a variational inference formulation of federated learning.
We apply FedEP on standard federated learning benchmarks and find that it outperforms strong baselines in terms of both convergence speed and accuracy.
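A minimal sketch of the expectation-propagation view, assuming diagonal-Gaussian factors in natural parameters (FedEP's actual parameterization, damping, and scheduling are not reproduced here): each client subtracts its factor from the global posterior, refines the resulting cavity locally, and sends the factor update back.

```python
import numpy as np

def fedep_round(global_nat, client_factors, local_inference, client_ids):
    """One EP-style round. global_nat: natural parameters of the global
    approximate posterior; client_factors: dict of per-client factors;
    local_inference: callable (cavity, cid) -> tilted approximation,
    a hypothetical stand-in for the client's local computation."""
    for cid in client_ids:
        cavity = global_nat - client_factors[cid]  # remove own factor
        tilted = local_inference(cavity, cid)      # local refinement
        new_factor = tilted - cavity               # updated factor
        global_nat = cavity + new_factor           # equals tilted
        client_factors[cid] = new_factor
    return global_nat

# Toy usage: each client pulls the cavity toward a fixed local estimate.
d = 4
factors = {0: np.zeros(d), 1: np.zeros(d)}
local = lambda cavity, cid: 0.5 * cavity + 0.5 * np.full(d, cid + 1.0)
print(fedep_round(np.zeros(d), factors, local, client_ids=[0, 1]))
```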
arXiv Detail & Related papers (2023-02-08T17:58:11Z)
- Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions.
We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two hurdles in this setting: straggling clients and data heterogeneity.
Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
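A schematic of that representation-plus-head split (architecture assumed for illustration; the paper's alternating updates and straggler handling are not shown):

```python
import torch
import torch.nn as nn

class PersonalizedModel(nn.Module):
    """Shared global representation with a per-client head. In FL
    training, the representation parameters would be averaged by the
    server while each client keeps its head local."""
    def __init__(self, in_dim, rep_dim, out_dim):
        super().__init__()
        # Learned collaboratively from all clients' data.
        self.representation = nn.Sequential(
            nn.Linear(in_dim, rep_dim), nn.ReLU()
        )
        # User-specific parameters, never aggregated.
        self.head = nn.Linear(rep_dim, out_dim)

    def forward(self, x):
        return self.head(self.representation(x))

# Toy usage.
print(PersonalizedModel(16, 32, 10)(torch.randn(4, 16)).shape)
```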
arXiv Detail & Related papers (2022-06-05T01:14:46Z)
- Variance-Reduced Heterogeneous Federated Learning via Stratified Client Selection [31.401919362978017]
We propose a novel stratified client selection scheme to reduce the variance for the pursuit of better convergence and higher accuracy.
We present an optimized sample size allocation scheme that accounts for the diversity of each stratum's variability.
Experimental results confirm that our approach not only allows for better performance relative to state-of-the-art methods but also is compatible with prevalent FL algorithms.
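As a generic illustration of stratified, variability-aware selection (the paper's stratification criterion and allocation rule may differ), the sketch below uses a Neyman-style allocation: each stratum's share of the sampling budget is proportional to its size times its estimated variability.

```python
import numpy as np

def stratified_selection(strata, variabilities, budget, rng=None):
    """strata: list of arrays of client ids; variabilities: per-stratum
    variability estimates (e.g., standard deviations)."""
    rng = rng or np.random.default_rng()
    sizes = np.array([len(s) for s in strata], dtype=float)
    # Neyman-style allocation: budget share ~ stratum size * variability.
    weights = sizes * np.asarray(variabilities, dtype=float)
    alloc = np.floor(budget * weights / weights.sum()).astype(int)
    alloc = np.minimum(alloc, sizes.astype(int))  # cannot oversample
    chosen = [rng.choice(s, size=k, replace=False)
              for s, k in zip(strata, alloc) if k > 0]
    return np.concatenate(chosen) if chosen else np.array([], dtype=int)

# Toy usage: three strata of clients with differing variability.
strata = [np.arange(0, 40), np.arange(40, 70), np.arange(70, 100)]
print(stratified_selection(strata, [1.0, 3.0, 0.5], budget=10))
```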
arXiv Detail & Related papers (2022-01-15T05:41:36Z)
- A Theorem of the Alternative for Personalized Federated Learning [19.499120576896228]
We show how excess risks of personalized federated learning depend on data heterogeneity from a minimax point of view.
Our results show that the presumably difficult (infinite-dimensional) problem of adapting to client-wise heterogeneity can be reduced to a simple binary decision problem.
arXiv Detail & Related papers (2021-03-02T17:58:20Z)
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there are many candidate features, our results show that sparsity-aware methods can make batch RL more sample-efficient.
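The paper is a statistical analysis rather than a new algorithm, but the construction it studies is in the spirit of sparsity-regularized fitted value iteration; the sketch below shows one L1-penalized fitted-Q step under that assumption (function and parameter names are illustrative).

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fqi_step(phi_sa, rewards, phi_next_best, q_weights,
                   gamma=0.99, alpha=0.1):
    """One fitted-Q iteration with L1-penalized regression.
    phi_sa: (n, d) features of observed (state, action) pairs;
    phi_next_best: (n, d) features of the greedy next action."""
    # Bellman targets under the current linear Q estimate.
    targets = rewards + gamma * phi_next_best @ q_weights
    # The L1 penalty zeroes out weights of irrelevant features.
    return Lasso(alpha=alpha).fit(phi_sa, targets).coef_

# Toy usage: only feature 0 carries reward signal.
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 50))
r = phi[:, 0] + 0.1 * rng.normal(size=200)
w = lasso_fqi_step(phi, r, rng.normal(size=(200, 50)), np.zeros(50))
print(np.flatnonzero(np.abs(w) > 1e-6))  # recovered support
```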
arXiv Detail & Related papers (2020-11-08T16:48:02Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)