Realizing Scaling Laws in Recommender Systems: A Foundation-Expert Paradigm for Hyperscale Model Deployment
- URL: http://arxiv.org/abs/2508.02929v2
- Date: Wed, 06 Aug 2025 18:44:24 GMT
- Title: Realizing Scaling Laws in Recommender Systems: A Foundation-Expert Paradigm for Hyperscale Model Deployment
- Authors: Dai Li, Kevin Course, Wei Li, Hongwei Li, Jie Hua, Yiqi Chen, Zhao Zhu, Rui Jian, Xuan Cao, Bi Xue, Yu Shi, Jing Qian, Kai Ren, Matt Ma, Qunshu Zhang, Rui Li,
- Abstract summary: We propose a framework designed for the development and deployment of hyperscale recommendation FMs. In our approach, a central FM is trained on lifelong, cross-surface, multi-modal user data to learn generalizable knowledge. This knowledge is then efficiently transferred to various lightweight, surface-specific "expert" models via target-aware embeddings.
- Score: 16.883389041355073
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While scaling laws promise significant performance gains for recommender systems, efficiently deploying hyperscale models remains a major unsolved challenge. In contrast to fields where FMs are already widely adopted such as natural language processing and computer vision, progress in recommender systems is hindered by unique challenges including the need to learn from online streaming data under shifting data distributions, the need to adapt to different recommendation surfaces with a wide diversity in their downstream tasks and their input distributions, and stringent latency and computational constraints. To bridge this gap, we propose to leverage the Foundation-Expert Paradigm: a framework designed for the development and deployment of hyperscale recommendation FMs. In our approach, a central FM is trained on lifelong, cross-surface, multi-modal user data to learn generalizable knowledge. This knowledge is then efficiently transferred to various lightweight, surface-specific "expert" models via target-aware embeddings, allowing them to adapt to local data distributions and optimization goals with minimal overhead. To meet our training, inference and development needs, we built HyperCast, a production-grade infrastructure system that re-engineers training, serving, logging and iteration to power this decoupled paradigm. Our approach is now deployed at Meta serving tens of billions of user requests daily, demonstrating online metric improvements over our previous one-stage production system while improving developer velocity and maintaining infrastructure efficiency. To the best of our knowledge, this work represents the first successful deployment of a Foundation-Expert paradigm at this scale, offering a proven, compute-efficient, and developer-friendly blueprint to realize the promise of scaling laws in recommender systems.
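The decoupled structure described above can be made concrete with a small sketch. The following PyTorch code is a minimal illustration, not the paper's actual implementation: the module names (`FoundationModel`, `SurfaceExpert`), the feature dimensions, and the use of a cross-attention block to stand in for target-aware embedding generation are all assumptions made here for exposition.

```python
import torch
import torch.nn as nn

class FoundationModel(nn.Module):
    """Large, centrally trained model that emits target-aware embeddings."""
    def __init__(self, user_dim: int, item_dim: int, embed_dim: int = 256):
        super().__init__()
        self.user_tower = nn.Sequential(
            nn.Linear(user_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))
        self.item_tower = nn.Sequential(
            nn.Linear(item_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))
        # Cross-attention makes the user representation target-aware:
        # the embedding depends on the candidate item being scored.
        self.cross_attn = nn.MultiheadAttention(
            embed_dim, num_heads=4, batch_first=True)

    def forward(self, user_feats, item_feats):
        u = self.user_tower(user_feats).unsqueeze(1)  # (B, 1, D)
        v = self.item_tower(item_feats).unsqueeze(1)  # (B, 1, D)
        target_aware, _ = self.cross_attn(query=v, key=u, value=u)
        return target_aware.squeeze(1)                # (B, D)

class SurfaceExpert(nn.Module):
    """Lightweight per-surface head consuming FM embeddings plus local features."""
    def __init__(self, embed_dim: int, surface_dim: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim + surface_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, fm_embedding, surface_feats):
        # The FM embedding is treated as a frozen input feature, so the
        # expert adapts to its surface's labels with minimal overhead.
        x = torch.cat([fm_embedding.detach(), surface_feats], dim=-1)
        return torch.sigmoid(self.head(x))
```

In the decoupled deployment the abstract describes, the foundation model's embeddings would be computed and cached or logged asynchronously, so each expert trains and serves against stored embeddings rather than invoking the FM in its own request path.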
Related papers
- FedPromo: Federated Lightweight Proxy Models at the Edge Bring New Domains to Foundation Models [16.83959862897466]
Federated Learning (FL) is an established paradigm for training deep learning models on decentralized data. We introduce FedPromo, a novel framework that enables efficient adaptation of large-scale foundation models stored on a central server to new domains encountered only by remote clients.
arXiv Detail & Related papers (2025-08-05T12:00:49Z)
- MTGR: Industrial-Scale Generative Recommendation Framework in Meituan [28.92150571719811]
We propose MTGR (Meituan Generative Recommendation) to address this issue. MTGR achieves training and inference acceleration through user-level compression to ensure efficient scaling. The framework was successfully deployed at Meituan, the world's largest food delivery platform.
arXiv Detail & Related papers (2025-05-24T11:47:28Z)
- External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation [58.49335224405165]
Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling up and advanced design of the recommendation model can bring significant performance improvements. However, as model scale grows, such prior studies exhibit a widening gap from industrial practice, since they often neglect two fundamental challenges in industrial-scale applications.
arXiv Detail & Related papers (2025-02-20T22:35:52Z)
- DRL-based Dolph-Tschebyscheff Beamforming in Downlink Transmission for Mobile Users [52.9870460238443]
We propose a deep reinforcement learning-based blind beamforming technique using a learnable Dolph-Tschebyscheff antenna array. Our simulation results show that the proposed method can support data rates very close to the best possible values.
arXiv Detail & Related papers (2025-02-03T11:50:43Z)
- Client-Centric Federated Adaptive Optimization [78.30827455292827]
Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private. We propose Client-Centric Federated Adaptive Optimization, a class of novel federated optimization approaches.
arXiv Detail & Related papers (2025-01-17T04:00:50Z)
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose Read-ME, a novel framework that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts (a rough illustration appears in the Python sketch following this list).
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
- Personalized Wireless Federated Learning for Large Language Models [75.22457544349668]
Large language models (LLMs) have driven profound transformations in wireless networks. Within wireless environments, the training of LLMs faces significant challenges related to security and privacy. This paper presents a systematic analysis of the training stages of LLMs in wireless networks, including pre-training, instruction tuning, and alignment tuning.
arXiv Detail & Related papers (2024-04-20T02:30:21Z)
- Training Heterogeneous Client Models using Knowledge Distillation in Serverless Federated Learning [0.5510212613486574]
Federated Learning (FL) is an emerging machine learning paradigm that enables the collaborative training of a shared global model across distributed clients.
Recent works on designing systems for efficient FL have shown that utilizing serverless computing technologies can enhance resource efficiency, reduce training costs, and alleviate the complex infrastructure management burden on data holders.
arXiv Detail & Related papers (2024-02-11T20:15:52Z)
- ON-DEMAND-FL: A Dynamic and Efficient Multi-Criteria Federated Learning Client Deployment Scheme [37.099990745974196]
We introduce On-Demand-FL, a client deployment approach for federated learning.
We make use of containerization technology such as Docker to build efficient environments.
A genetic algorithm (GA) is used to solve the multi-objective optimization problem.
arXiv Detail & Related papers (2022-11-05T13:41:19Z)
- Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC server with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing the transmit power of mobile users, the beamforming vectors of the base station, and the phase-shift matrix of the RIS (a generic form of this problem is sketched after this list).
arXiv Detail & Related papers (2020-12-25T07:08:50Z)
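The Read-ME entry above mentions extracting experts from a dense model via activation sparsity. The sketch below is a hypothetical illustration of that general idea only, assuming NumPy and scikit-learn; the function name `extract_experts` and all variable names are invented here and are not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_experts(W_in, W_out, calib_acts, num_experts=4):
    """Partition a dense FFN into experts using activation co-occurrence.

    W_in:       (d_model, d_ff) FFN input projection
    W_out:      (d_ff, d_model) FFN output projection
    calib_acts: (n_tokens, d_ff) post-ReLU activations on calibration data
    """
    # Binary firing pattern of each neuron across calibration tokens.
    active = (calib_acts > 0).astype(np.float32)        # (n_tokens, d_ff)
    # Cluster neurons by their firing profiles: neurons that activate on
    # the same tokens are grouped into the same expert.
    labels = KMeans(n_clusters=num_experts, n_init=10).fit_predict(active.T)
    experts = []
    for e in range(num_experts):
        idx = np.where(labels == e)[0]
        # Each expert is simply a column/row slice of the dense weights.
        experts.append((W_in[:, idx], W_out[idx, :]))
    return experts
```

A separately trained router would then dispatch each token to one expert slice; that component, which Read-ME decouples from the backbone, is omitted here.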
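The RIS-assisted MEC entry above describes a joint minimization over transmit powers, beamforming vectors, and RIS phase shifts. A generic formulation consistent with that description, with all symbols chosen here for illustration rather than taken from the paper, might read:

```latex
% Hypothetical formulation; symbols are illustrative assumptions.
\begin{aligned}
\min_{\mathbf{p},\,\mathbf{W},\,\boldsymbol{\Theta}}\quad
  & \sum_{k=1}^{K} \varepsilon_k(\mathbf{p},\mathbf{W},\boldsymbol{\Theta})
  && \text{total learning error over } K \text{ users} \\
\text{s.t.}\quad
  & 0 \le p_k \le p_{\max},\ k = 1,\dots,K
  && \text{per-user transmit power} \\
  & \|\mathbf{w}_k\|^2 \le P_{\mathrm{BS}}
  && \text{base-station beamforming budget} \\
  & \boldsymbol{\Theta} = \operatorname{diag}\!\bigl(e^{j\theta_1},\dots,e^{j\theta_N}\bigr)
  && \text{unit-modulus RIS phase shifts}
\end{aligned}
```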
This list is automatically generated from the titles and abstracts of the papers on this site.