MuonRec: Shifting the Optimizer Paradigm Beyond Adam in Scalable Generative Recommendation
- URL: http://arxiv.org/abs/2603.00416v1
- Date: Sat, 28 Feb 2026 02:32:44 GMT
- Title: MuonRec: Shifting the Optimizer Paradigm Beyond Adam in Scalable Generative Recommendation
- Authors: Rong Shan, Aofan Yu, Bo Chen, Kuo Cai, Qiang Luo, Ruiming Tang, Han Li, Weiwen Liu, Weinan Zhang, Jianghao Lin,
- Abstract summary: MuonRec is the first framework that brings the recently proposed Muon optimizer to RecSys training. We develop an open-source training recipe for recommendation models and evaluate it across both traditional sequential recommenders and modern generative recommenders.
- Score: 60.1890607252082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recommender systems (RecSys) are increasingly emphasizing scaling, leveraging larger architectures and more interaction data to improve personalization. Yet, despite the optimizer's pivotal role in training, modern RecSys pipelines almost universally default to Adam/AdamW, with limited scrutiny of whether these choices are truly optimal for recommendation. In this work, we revisit optimizer design for scalable recommendation and introduce MuonRec, the first framework that brings the recently proposed Muon optimizer to RecSys training. Muon performs orthogonalized momentum updates for 2D weight matrices via Newton-Schulz iteration, promoting diverse update directions and improving optimization efficiency. We develop an open-source training recipe for recommendation models and evaluate it across both traditional sequential recommenders and modern generative recommenders. Extensive experiments demonstrate that MuonRec reduces converged training steps by an average of 32.4% while simultaneously improving final ranking quality. Specifically, MuonRec yields consistent relative gains in NDCG@10, averaging 12.6% across all settings, with particularly pronounced improvements in generative recommendation models. These results consistently outperform strong Adam/AdamW baselines, positioning Muon as a promising new optimizer standard for RecSys training. Our code is available.
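The abstract's core mechanism, orthogonalizing the momentum of each 2D weight matrix via Newton-Schulz iteration, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the quintic coefficients and the `muon_step` wrapper below follow values popularized by open-source Muon implementations and are assumptions for illustration only.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize a 2D update matrix G: push its singular
    values toward 1 while preserving its singular vectors, so the update
    moves in diverse directions rather than along a few dominant ones.
    Quintic coefficients are illustrative, taken from common open-source
    Muon implementations."""
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalize by the Frobenius norm so all singular values are <= 1.
    X = G / (np.linalg.norm(G) + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        # Iterate on the wide orientation so A = X X^T stays small.
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        # Each singular value x is mapped by a*x + b*x^3 + c*x^5.
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One hypothetical Muon-style update for a single 2D weight matrix."""
    momentum = beta * momentum + grad               # accumulate momentum
    update = newton_schulz_orthogonalize(momentum)  # orthogonalize it
    weight = weight - lr * update
    return weight, momentum
```

The key design point mirrored here is that Muon replaces Adam's per-element second-moment scaling with a matrix-level operation: after a few Newton-Schulz iterations the momentum's singular values cluster near 1, so no single direction dominates the step.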
Related papers
- NorMuon: Making Muon more efficient and scalable [71.49702449498085]
We propose NorMuon (Neuron-wise Normalized Muon) as a successor to Adam. We show that NorMuon consistently outperforms both Adam and Muon, achieving 21.74% better training efficiency than Adam and an 11.31% improvement over Muon in a 1.1B pretraining setting.
arXiv Detail & Related papers (2025-10-07T01:13:41Z) - REG: A Regularization Optimizer for Robust Training Dynamics [24.850151895583494]
The Row-and-Column-Scaling (RACS) operator regularizes the update steps in a less drastic manner, making it simpler to implement and more compatible with established training dynamics. We demonstrate that REG not only achieves superior performance and stability over AdamW, but also maintains consistency with the AdamW training paradigm.
arXiv Detail & Related papers (2025-10-04T06:05:57Z) - AdaMuon: Adaptive Muon Optimizer [11.281916426508216]
AdaMuon combines element-wise adaptivity with orthogonal updates for large-scale neural network training. AdaMuon maintains stability while surpassing Adam by more than 40% in training efficiency in large-scale scenarios.
arXiv Detail & Related papers (2025-07-15T05:49:37Z) - Muon is Scalable for LLM Training [50.68746986439438]
We introduce Moonlight, a Mixture-of-Experts (MoE) model trained with 5.7T tokens using Muon. Our model improves the current frontier, achieving better performance with far fewer training FLOPs than prior models. We open-source our distributed Muon implementation, which is memory-optimal and communication-efficient.
arXiv Detail & Related papers (2025-02-24T09:12:29Z) - Preference Diffusion for Recommendation [50.8692409346126]
We propose PreferDiff, a tailored optimization objective for DM-based recommenders. PreferDiff transforms BPR into a log-likelihood ranking objective to better capture user preferences. It is the first personalized ranking loss designed specifically for DM-based recommenders.
arXiv Detail & Related papers (2024-10-17T01:02:04Z) - Narrowing the Focus: Learned Optimizers for Pretrained Models [24.685918556547055]
We propose a novel technique that learns a layer-specific linear combination of update directions provided by a set of base optimizers.
When evaluated on image tasks, this specialized optimizer significantly outperforms both traditional off-the-shelf methods such as Adam and existing general-purpose learned optimizers.
arXiv Detail & Related papers (2024-08-17T23:55:19Z) - Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback [110.16220825629749]
Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models.
In this work, we identify four core aspects of preference-based learning: preference data, learning algorithm, reward model, and policy training prompts.
Our findings indicate that all aspects are important for performance, with better preference data leading to the largest improvements.
arXiv Detail & Related papers (2024-06-13T16:17:21Z) - Improving Recommendation Fairness via Data Augmentation [66.4071365614835]
Collaborative filtering based recommendation learns users' preferences from all users' historical behavior data, and has been popular to facilitate decision making.
A recommender system is considered unfair when it does not perform equally well for different user groups according to users' sensitive attributes.
In this paper, we study how to improve recommendation fairness from the data augmentation perspective.
arXiv Detail & Related papers (2023-02-13T13:11:46Z) - Adaptive Optimization with Examplewise Gradients [23.504973357538418]
We propose a new, more general approach to the design of gradient-based optimization methods for machine learning.
In this new framework, iterations assume access to a batch of estimates per parameter, rather than a single estimate.
This better reflects the information that is actually available in typical machine learning setups.
arXiv Detail & Related papers (2021-11-30T23:37:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.