On Provable Benefits of Muon in Federated Learning
- URL: http://arxiv.org/abs/2510.03866v1
- Date: Sat, 04 Oct 2025 16:27:09 GMT
- Title: On Provable Benefits of Muon in Federated Learning
- Authors: Xinwen Zhang, Hongchang Gao
- Abstract summary: The recently introduced optimizer, Muon, has gained increasing attention due to its superior performance across a wide range of applications. This paper investigates the performance of Muon in the previously unexplored federated learning setting.
- Score: 23.850171320924574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recently introduced optimizer, Muon, has gained increasing attention due to its superior performance across a wide range of applications. However, its effectiveness in federated learning remains unexplored. To address this gap, this paper investigates the performance of Muon in the federated learning setting. Specifically, we propose a new algorithm, FedMuon, and establish its convergence rate for nonconvex problems. Our theoretical analysis reveals multiple favorable properties of FedMuon. In particular, due to its orthonormalized update direction, the learning rate of FedMuon is independent of problem-specific parameters, and, importantly, it can naturally accommodate heavy-tailed noise. The extensive experiments on a variety of neural network architectures validate the effectiveness of the proposed algorithm.
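The paper's abstract describes the two key ingredients but gives no pseudocode here, so the following is a minimal sketch of what a FedMuon-style communication round could look like, assuming each client runs local Muon steps (heavy-ball momentum followed by Newton-Schulz orthogonalization of the update matrix) and the server averages client weights FedAvg-style. The function names `newton_schulz` and `fedmuon_round`, the hyperparameter values, and the plain-averaging aggregation are illustrative assumptions, not details taken from the paper; only the orthonormalized update direction is stated in the abstract. The Newton-Schulz coefficients follow the original Muon optimizer.

```python
import torch

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D matrix G with the quintic
    Newton-Schulz iteration used by the original Muon optimizer."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)             # scale so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                       # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def fedmuon_round(server_W, client_grad_fns, local_steps=5, lr=0.02, beta=0.9):
    """One hypothetical FedMuon communication round: each client runs
    local Muon-style steps from the server weights, then the server
    averages the client weights (plain FedAvg aggregation, assumed)."""
    client_weights = []
    for grad_fn in client_grad_fns:        # grad_fn(W) -> stochastic gradient
        W, M = server_W.clone(), torch.zeros_like(server_W)
        for _ in range(local_steps):
            G = grad_fn(W)
            M = beta * M + (1 - beta) * G  # heavy-ball momentum buffer
            W = W - lr * newton_schulz(M)  # orthonormalized update direction
        client_weights.append(W)
    return torch.stack(client_weights).mean(dim=0)

# Toy usage: two clients whose losses ||W - T_i||^2 / 2 have gradients W - T_i.
T1, T2 = torch.randn(8, 4), torch.randn(8, 4)
W1 = fedmuon_round(torch.zeros(8, 4), [lambda W: W - T1, lambda W: W - T2])
```

The sketch also illustrates why the abstract's claims are plausible: an approximately orthogonal update matrix has singular values near one, so the magnitude of each step is controlled by the learning rate alone rather than by the raw gradient scale, which is consistent with a problem-independent learning rate and tolerance of heavy-tailed gradient noise.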
Related papers
- Federated Stochastic Minimax Optimization under Heavy-Tailed Noises [23.850171320924574]
We propose two algorithms: -NSGDA, which integrates bounded gradients, and Mu-DA, for local updates. Both algorithms are designed to effectively address heavy-tailed noise in federated minimax optimization under a milder condition. To the best of our knowledge, these are the first minimax optimization algorithms with rigorous theoretical guarantees under heavy-tailed noise.
arXiv Detail & Related papers (2025-11-06T15:27:29Z) - NorMuon: Making Muon more efficient and scalable [71.49702449498085]
We propose NorMuon (Neuron-wise Normalized Muon) as a successor to Adam. We show that NorMuon consistently outperforms both Adam and Muon, achieving 21.74% better training efficiency than Adam and an 11.31% improvement over Muon in the 1.1B pretraining setting.
arXiv Detail & Related papers (2025-10-07T01:13:41Z) - DeMuon: A Decentralized Muon for Matrix Optimization over Graphs [20.832302616074966]
DeMuon is a method for decentralized matrix optimization over a given communication topology. We conduct preliminary numerical experiments on decentralized transformer pretraining over graphs with varying degrees of connectivity.
arXiv Detail & Related papers (2025-10-01T19:06:11Z) - FedMuon: Federated Learning with Bias-corrected LMO-based Optimization [36.00641661700195]
We study how Muon can be utilized in federated learning. We demonstrate that FedMuon can outperform state-of-the-art federated learning methods.
arXiv Detail & Related papers (2025-09-30T14:45:12Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential [9.699640804685629]
We introduce the Fourier Multi-Component and Multi-Layer Neural Network (FMMNN), a novel model that creates a strong synergy between its multi-component and multi-layer structures. We demonstrate that FMMNNs are highly effective and flexible in modeling high-frequency components. We also analyze the optimization landscape of FMMNNs and find it to be much more favorable than that of standard fully connected neural networks.
arXiv Detail & Related papers (2025-02-26T09:12:52Z) - FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
arXiv Detail & Related papers (2023-10-04T21:11:40Z) - Federated Compositional Deep AUC Maximization [58.25078060952361]
We develop a novel federated learning method for imbalanced data by directly optimizing the area under the curve (AUC) score.
To the best of our knowledge, this is the first work to achieve such favorable theoretical results.
arXiv Detail & Related papers (2023-04-20T05:49:41Z) - Improving the Robustness of Neural Multiplication Units with Reversible Stochasticity [2.4278445972594525]
Multilayer Perceptrons struggle to learn certain simple arithmetic tasks.
The stochastic NMU (sNMU) is proposed to apply reversible stochasticity, encouraging avoidance of poor local optima.
arXiv Detail & Related papers (2022-11-10T14:56:37Z) - Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF on a plain baseline outperforms the state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z) - Influence Estimation and Maximization via Neural Mean-Field Dynamics [60.91291234832546]
We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems.
Our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities.
arXiv Detail & Related papers (2021-06-03T00:02:05Z) - FedU: A Unified Framework for Federated Multi-Task Learning with Laplacian Regularization [15.238123204624003]
Federated multi-task learning (FMTL) has emerged as a natural choice to capture the statistical diversity among the clients in federated learning.
To unleash the potential of FMTL beyond statistical diversity, we formulate a new FMTL problem, FedU, using Laplacian regularization (see the sketch after this list).
arXiv Detail & Related papers (2021-02-14T13:19:43Z)
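For intuition about the Laplacian-regularized FMTL objective mentioned in the FedU entry above, here is a minimal sketch. The similarity matrix `A`, the penalty weight `eta`, and the quadratic disagreement penalty are standard choices for Laplacian regularization and are assumptions here, not FedU's exact formulation.

```python
import torch

def laplacian_fmtl_objective(client_weights, client_losses, A, eta):
    # client_weights: list of per-client parameter tensors w_k
    # client_losses:  list of callables, client_losses[k](w_k) -> scalar loss
    # A:              symmetric matrix of client-similarity weights a_kl (assumed given)
    # eta:            strength of the Laplacian penalty
    total = sum(loss(w) for loss, w in zip(client_losses, client_weights))
    # Laplacian regularizer: penalizes disagreement between similar clients,
    # sum over k < l of a_kl * ||w_k - w_l||^2
    K = len(client_weights)
    reg = sum(A[k, l] * (client_weights[k] - client_weights[l]).pow(2).sum()
              for k in range(K) for l in range(k + 1, K))
    return total + eta * reg
```

The penalty pulls the personalized models of statistically similar clients toward each other while still letting dissimilar clients keep distinct parameters, which is how such formulations capture statistical diversity across clients.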