On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment
- URL: http://arxiv.org/abs/2505.23459v1
- Date: Thu, 29 May 2025 14:08:35 GMT
- Title: On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment
- Authors: Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines
- Abstract summary: We introduce b-RS-FedPG, a novel policy gradient method that employs a carefully constructed softmax-inspired parameterization. We demonstrate explicit convergence rates for b-RS-FedPG toward near-optimal stationary policies.
- Score: 14.366821866598803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring convergence of policy gradient methods in federated reinforcement learning (FRL) under environment heterogeneity remains a major challenge. In this work, we first establish that heterogeneity, perhaps counter-intuitively, can necessitate optimal policies to be non-deterministic or even time-varying, even in tabular environments. Subsequently, we prove global convergence results for federated policy gradient (FedPG) algorithms employing local updates, under a Łojasiewicz condition that holds only for each individual agent, in both entropy-regularized and non-regularized scenarios. Crucially, our theoretical analysis shows that FedPG attains linear speed-up with respect to the number of agents, a property central to efficient federated learning. Leveraging insights from our theoretical findings, we introduce b-RS-FedPG, a novel policy gradient method that employs a carefully constructed softmax-inspired parameterization coupled with an appropriate regularization scheme. We further demonstrate explicit convergence rates for b-RS-FedPG toward near-optimal stationary policies. Finally, we demonstrate empirically that both FedPG and b-RS-FedPG consistently outperform federated Q-learning in heterogeneous settings.
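The abstract analyzes FedPG-style methods in which each agent runs several local policy gradient steps on its own environment before a server averages the parameters. Below is a minimal, illustrative sketch of that local-update template for tabular soft-max policies with REINFORCE-style gradient estimates; the environment interface and the names fedpg, local_policy_gradient, and agent_envs are hypothetical, and the sketch does not reproduce the paper's b-RS-FedPG parameterization, regularization, or step-size choices.

    import numpy as np

    def softmax_policy(theta):
        # theta: (num_states, num_actions) table of logits
        z = theta - theta.max(axis=1, keepdims=True)
        p = np.exp(z)
        return p / p.sum(axis=1, keepdims=True)

    def local_policy_gradient(theta, env, gamma=0.99, episodes=10):
        # REINFORCE-style estimate of the tabular soft-max policy gradient
        # on one agent's (possibly heterogeneous) environment
        grad = np.zeros_like(theta)
        for _ in range(episodes):
            s, traj, done = env.reset(), [], False
            while not done:
                probs = softmax_policy(theta)[s]
                a = np.random.choice(len(probs), p=probs)
                s_next, r, done = env.step(a)      # hypothetical env interface
                traj.append((s, a, r))
                s = s_next
            ret = 0.0
            for t, (s, a, r) in reversed(list(enumerate(traj))):
                ret = r + gamma * ret              # return-to-go from step t
                g_log = -softmax_policy(theta)[s]  # grad of log softmax: one_hot(a) - probs
                g_log[a] += 1.0
                grad[s] += (gamma ** t) * ret * g_log
        return grad / episodes

    def fedpg(agent_envs, num_states, num_actions, rounds=100, local_steps=5, lr=0.1):
        # Federated policy gradient template: each agent takes `local_steps` ascent
        # steps on its own environment, then the server averages the parameters.
        theta = np.zeros((num_states, num_actions))
        for _ in range(rounds):
            local_thetas = []
            for env in agent_envs:
                th = theta.copy()
                for _ in range(local_steps):
                    th = th + lr * local_policy_gradient(th, env)
                local_thetas.append(th)
            theta = np.mean(local_thetas, axis=0)  # server-side aggregation
        return softmax_policy(theta)

Averaging across agents reduces the variance of the stochastic gradient estimates, which is the usual source of the linear speed-up in the number of agents; under heterogeneity, the averaged policy need not coincide with any single agent's local optimum, which is why the abstract's observation about non-deterministic optimal policies matters.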
Related papers
- Federated Reinforcement Learning in Heterogeneous Environments [9.944647907864255]
We investigate a Federated Reinforcement Learning with Environment Heterogeneity (FRL-EH) framework, where local environments exhibit statistical heterogeneity. Within this framework, agents collaboratively learn a global policy by aggregating their collective experiences while preserving the privacy of their local trajectories. We present a novel global objective function that ensures robust performance across heterogeneous local environments and their plausible perturbations. We extend FedRQ to environments with continuous state space through the use of expectile loss, addressing the key challenge of minimizing a value function over a continuous subset of the state space.
arXiv Detail & Related papers (2025-07-19T05:06:38Z) - Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning [66.4260157478436]
We study reinforcement learning in the policy learning setting. The goal is to find a policy whose performance is competitive with the best policy in a given class of interest.
arXiv Detail & Related papers (2025-07-06T14:40:05Z) - Ordering-based Conditions for Global Convergence of Policy Gradient Methods [73.6366483406033]
We prove that, for finite-arm bandits with linear function approximation, the global convergence of policy gradient (PG) methods depends on inter-related properties between the policy update and the representation. Overall, these observations call into question approximation error as an appropriate quantity for characterizing the global convergence of PG methods under linear function approximation.
arXiv Detail & Related papers (2025-04-02T21:06:28Z) - Policy Gradient for LQR with Domain Randomization [25.387541996071093]
Domain randomization (DR) enables sim-to-real transfer by training controllers on a distribution of simulated environments. We provide the first convergence analysis of policy gradient (PG) methods for domain-randomized linear quadratic regulation (LQR). We quantify the sample complexity associated with achieving a small performance gap between the sample-average and population-level objectives.
arXiv Detail & Related papers (2025-03-31T17:51:00Z) - Convergence of Policy Mirror Descent Beyond Compatible Function Approximation [66.4260157478436]
We develop convergence theory for PMD with general policy classes, where we assume only a strictly weaker variational dominance condition and obtain convergence to the best-in-class policy. Our analysis leverages a novel notion induced by the local norm of the occupancy measure.
arXiv Detail & Related papers (2025-02-16T08:05:46Z) - Towards Fast Rates for Federated and Multi-Task Reinforcement Learning [34.34798425737858]
We propose Fast-FedPG, a novel federated policy gradient algorithm with a carefully designed bias-correction mechanism.
Under a gradient-domination condition, we prove that our algorithm guarantees (i) fast linear convergence with exact gradients, and (ii) sub-linear rates that enjoy a linear speedup w.r.t. the number of agents with noisy, truncated policy gradients.
arXiv Detail & Related papers (2024-09-09T02:59:17Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments [17.995517050546244]
We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data.
We propose two algorithms: FedSVRPG-M and FedHAPG-M, which converge to a stationary point of the average performance function.
Our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
arXiv Detail & Related papers (2024-05-29T20:24:42Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z) - The Role of Baselines in Policy Gradient Optimization [83.42050606055822]
We show that the state value baseline allows on-policy natural policy gradient (NPG) to converge to a globally optimal policy at an $O(1/t)$ rate.
We find that the primary effect of the value baseline is to reduce the aggressiveness of the updates rather than their variance (the standard baselined policy-gradient identity is recalled after this list).
arXiv Detail & Related papers (2023-01-16T06:28:00Z) - Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization [20.651913793555163]
We revisit the classical entropy-regularized policy gradient methods with the soft-max policy parametrization.
We establish a global optimality convergence result and a sample complexity of $\widetilde{\mathcal{O}}(\frac{1}{\epsilon^2})$ for the proposed algorithm.
arXiv Detail & Related papers (2021-10-19T17:21:09Z) - Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization [44.24881971917951]
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms.
We develop convergence guarantees for entropy-regularized NPG methods under softmax parameterization.
Our results accommodate a wide range of learning rates, and shed light upon the role of entropy regularization in enabling fast convergence (the entropy-regularized soft-max setup these works share is sketched just after this list).
arXiv Detail & Related papers (2020-07-13T17:58:41Z)
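The two entropy-regularization entries above (and the regularized setting analyzed in the main paper) build on the standard tabular soft-max setup. As a reminder, and not as a formula taken from any of the listed papers, the parameterization and the entropy-regularized value with temperature $\tau$ are

    \pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a'} \exp(\theta_{s,a'})},
    \qquad
    V_\tau^{\pi_\theta}(\rho) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t \big( r(s_t, a_t) - \tau \log \pi_\theta(a_t \mid s_t) \big) \,\middle|\, s_0 \sim \rho,\ a_t \sim \pi_\theta(\cdot \mid s_t) \right].

The $-\tau \log \pi_\theta$ term equals an entropy bonus in expectation, and it is the regularizer whose role in enabling fast convergence these entries discuss.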
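For the baselines entry above ("The Role of Baselines in Policy Gradient Optimization"), it may help to recall the textbook form of the baselined on-policy gradient; this is the generic identity, not the specific estimator analyzed in that paper:

    \nabla_\theta J(\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)} \big[ \nabla_\theta \log \pi_\theta(a \mid s) \,\big( Q^{\pi_\theta}(s, a) - b(s) \big) \big], \qquad b(s) = V^{\pi_\theta}(s).

Subtracting the state-value baseline $b(s)$ leaves the gradient unbiased, since $\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}[\nabla_\theta \log \pi_\theta(a \mid s)] = 0$; the entry above argues that the baseline's main benefit is tempering the update magnitude rather than reducing variance.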