On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments
- URL: http://arxiv.org/abs/2409.03897v2
- Date: Sat, 20 Sep 2025 01:01:49 GMT
- Title: On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments
- Authors: Muxing Wang, Pengkun Yang, Lili Su
- Abstract summary: We study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates per $E$ iterations. We observe an interesting phenomenon on the convergence speeds in terms of $K$ and $E$. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta(E/T)$.
- Score: 13.674509321097311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale multi-agent systems are often deployed across wide geographic areas, where agents interact with heterogeneous environments. There is an emerging interest in understanding the role of heterogeneity in the performance of the federated versions of classic reinforcement learning algorithms. In this paper, we study synchronous federated Q-learning, which aims to learn an optimal Q-function by having $K$ agents average their local Q-estimates per $E$ iterations. We observe an interesting phenomenon on the convergence speeds in terms of $K$ and $E$. Similar to the homogeneous environment settings, there is a linear speed-up concerning $K$ in reducing the errors that arise from sampling randomness. Yet, in sharp contrast to the homogeneous settings, $E>1$ leads to significant performance degradation. Specifically, we provide a fine-grained characterization of the error evolution in the presence of environmental heterogeneity, where the errors decay to zero as the number of iterations $T$ increases. The slow convergence of having $E>1$ turns out to be fundamental rather than an artifact of our analysis. We prove that, for a wide range of stepsizes, the $\ell_{\infty}$ norm of the error cannot decay faster than $\Theta(E/T)$. In addition, our experiments demonstrate that the convergence exhibits an interesting two-phase phenomenon. For any given stepsize, there is a sharp phase transition of the convergence: the error decays rapidly in the beginning yet later bounces up and stabilizes. Provided that the phase-transition time can be estimated, choosing different stepsizes for the two phases leads to faster overall convergence.
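The averaging scheme described in the abstract can be sketched in a few lines. The following is an illustrative toy implementation, not the paper's code: the heterogeneous environments are randomly generated tabular MDPs, the parameter values are hypothetical placeholders, and sampling uses a generative-model oracle (one fresh next-state sample per state-action pair, matching the synchronous setting).

```python
import numpy as np

# Toy sketch of synchronous federated Q-learning: K agents run tabular
# Q-learning on their own (heterogeneous) environments and a server
# averages the local Q-tables every E iterations. All dynamics and
# hyperparameters below are illustrative placeholders.
rng = np.random.default_rng(0)
S, A, K, E, T, gamma, eta = 5, 3, 4, 10, 500, 0.9, 0.1

# Per-agent transition kernels and rewards (heterogeneous across agents).
P = rng.dirichlet(np.ones(S), size=(K, S, A))  # P[k, s, a] is a distribution over next states
R = rng.uniform(size=(K, S, A))                # reward r_k(s, a) for agent k

Q = np.zeros((K, S, A))                        # local Q-estimates
for t in range(1, T + 1):
    for k in range(K):
        # Synchronous update: one fresh sample per (s, a) from agent k's MDP.
        for s in range(S):
            for a in range(A):
                s_next = rng.choice(S, p=P[k, s, a])
                target = R[k, s, a] + gamma * Q[k, s_next].max()
                Q[k, s, a] += eta * (target - Q[k, s, a])
    if t % E == 0:
        Q[:] = Q.mean(axis=0)                  # server averages every E iterations

Q_avg = Q.mean(axis=0)
print(Q_avg.shape)  # (5, 3)
```

With heterogeneous `P` and `R`, the averaged iterate tracks a mixture of the agents' optimal Q-functions, which is the mechanism behind the $\Theta(E/T)$ floor the paper proves for $E>1$.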
Related papers
- Convergence of Shallow ReLU Networks on Weakly Interacting Data [5.618969269882913]
We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points.
We show that a network with width of order $\log(n)$ neurons suffices for global convergence with high probability.
arXiv Detail & Related papers (2025-02-24T09:07:14Z) - Single-Loop Federated Actor-Critic across Heterogeneous Environments [9.276123988094698]
We study Single-loop Federated Actor-Critic (SFAC), where agents perform actor-critic learning in a two-level federated manner.
We show that the convergence error of SFAC converges to a near-stationary point, with the extent proportional to environment heterogeneity.
arXiv Detail & Related papers (2024-12-19T06:13:59Z) - Dissipative phase transition: from qubits to qudits [0.0]
We investigate the fate of dissipative phase transitions in quantum many-body systems when the individual constituents are qudits instead of qubits.
Considering qudits instead of qubits opens new perspectives on accessing rich phase diagrams in open many-body systems.
arXiv Detail & Related papers (2024-05-02T12:08:28Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Over-Parameterization Exponentially Slows Down Gradient Descent for
Learning a Single Neuron [49.45105570960104]
We prove the global convergence of randomly initialized gradient descent with an $O(T^{-3})$ rate.
These two bounds jointly give an exact characterization of the convergence rate.
We show this potential function converges slowly, which implies the slow convergence rate of the loss function.
arXiv Detail & Related papers (2023-02-20T15:33:26Z) - A globally convergent fast iterative shrinkage-thresholding algorithm
with a new momentum factor for single and multi-objective convex optimization [0.0]
We show that our proposed method also has a global convergence rate of $O(1/k^2)$ for any $(a,b)$.
We report numerical results with various $(a,b)$, showing that some of these choices give better results than the classical momentum factors.
arXiv Detail & Related papers (2022-05-11T04:26:00Z) - Faster Convergence of Local SGD for Over-Parameterized Models [1.5504102675587357]
Modern machine learning architectures are often highly expressive.
We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized functions in heterogeneous data setting.
For general convex loss functions, we establish an error bound of $O(K/T)$.
For non-convex loss functions, we prove an error bound of $O(K/T)$ in both cases.
We complete our results by providing problem instances in which our established convergence rates are tight to a constant factor with a reasonably small stepsize.
arXiv Detail & Related papers (2022-01-30T04:05:56Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP).
We show that given any estimate for the optimal quality function $Q^*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q^*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
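The double Q-learning scheme the summary refers to maintains two estimators, using one to select the greedy action and the other to evaluate it, which counteracts the overestimation bias of the single-estimator max. A minimal sketch on a toy MDP follows; the environment, stepsize, and step count are hypothetical placeholders, not from the paper.

```python
import numpy as np

# Toy asynchronous double Q-learning on a randomly generated MDP.
# Two tables Q_a, Q_b: one picks the argmax action, the other evaluates it.
rng = np.random.default_rng(1)
S, A, T, gamma, eta = 4, 2, 2000, 0.9, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.uniform(size=(S, A))                 # reward r(s, a)

Q_a, Q_b = np.zeros((S, A)), np.zeros((S, A))
s = 0
for _ in range(T):
    a = int(rng.integers(A))                 # behavior policy: uniform exploration
    s_next = rng.choice(S, p=P[s, a])
    if rng.random() < 0.5:                   # update Q_a, evaluating with Q_b
        a_star = Q_a[s_next].argmax()
        Q_a[s, a] += eta * (R[s, a] + gamma * Q_b[s_next, a_star] - Q_a[s, a])
    else:                                    # symmetric update of Q_b
        a_star = Q_b[s_next].argmax()
        Q_b[s, a] += eta * (R[s, a] + gamma * Q_a[s_next, a_star] - Q_b[s, a])
    s = s_next

print(np.abs(Q_a - Q_b).max())
```

Because each table's target bootstraps from the other, neither estimator maximizes over its own noise, which is the property the cited finite-time analysis quantifies.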
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - An Analysis of the Adaptation Speed of Causal Models [80.77896315374747]
Recently, Bengio et al. conjectured that among all candidate models, the causal model $G$ is the fastest to adapt from one dataset to another.
We investigate the adaptation speed of cause-effect SCMs using convergence rates from optimization.
Surprisingly, we find situations where the anticausal model is advantaged, falsifying the initial hypothesis.
arXiv Detail & Related papers (2020-05-18T23:48:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.