MTGR: Industrial-Scale Generative Recommendation Framework in Meituan
- URL: http://arxiv.org/abs/2505.18654v3
- Date: Fri, 20 Jun 2025 05:35:15 GMT
- Title: MTGR: Industrial-Scale Generative Recommendation Framework in Meituan
- Authors: Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang, Xiaoguang Li, Chunzhen Jing, Yueming Han, Menglei Zhou, Lei Yu, Chuan Liu, Wei Lin
- Abstract summary: We propose MTGR (Meituan Generative Recommendation) to address this issue. MTGR achieves training and inference acceleration through user-level compression to ensure efficient scaling. This breakthrough was successfully deployed on Meituan, the world's largest food delivery platform.
- Score: 28.92150571719811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scaling laws have been extensively validated in many domains, such as natural language processing and computer vision. In recommender systems, recent work has adopted generative recommendation to achieve scalability, but these generative approaches require abandoning the carefully constructed cross features of traditional recommendation models. We found that this significantly degrades model performance, and that scaling up cannot compensate for it at all. In this paper, we propose MTGR (Meituan Generative Recommendation) to address this issue. MTGR is built on the HSTU architecture and retains the original deep learning recommendation model (DLRM) features, including cross features. Additionally, MTGR accelerates training and inference through user-level compression to ensure efficient scaling. We also propose Group-Layer Normalization (GLN) to enhance encoding within different semantic spaces, and a dynamic masking strategy to avoid information leakage. We further optimize the training framework, enabling support for models with 10 to 100 times the computational complexity of the DLRM without a significant cost increase. MTGR scales single-sample forward-inference compute to 65x the FLOPs of the DLRM model, yielding the largest offline and online gains in nearly two years. This breakthrough was successfully deployed on Meituan, the world's largest food delivery platform, where it now handles the main traffic.
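The abstract names Group-Layer Normalization (GLN) for encoding tokens that live in different semantic spaces but gives no implementation details. Below is a minimal sketch, assuming (our reading, not the paper's specification) that each token carries a group id (e.g. user profile, behavior history, candidate item) and each group gets its own LayerNorm parameters; the class name and signature are hypothetical.

```python
import torch
import torch.nn as nn

class GroupLayerNorm(nn.Module):
    """Hypothetical sketch of GLN: one LayerNorm per semantic group, so
    heterogeneous tokens are normalized with group-specific parameters."""

    def __init__(self, num_groups: int, hidden_dim: int):
        super().__init__()
        self.norms = nn.ModuleList(
            [nn.LayerNorm(hidden_dim) for _ in range(num_groups)]
        )

    def forward(self, x: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim); group_ids: (batch, seq_len),
        # integer ids in [0, num_groups). Every position must have a valid id.
        out = torch.empty_like(x)
        for g, norm in enumerate(self.norms):
            mask = group_ids == g            # (batch, seq_len) boolean
            if mask.any():
                out[mask] = norm(x[mask])    # normalize this group's tokens
        return out

# Example: 3 semantic groups (profile / history / candidates), hidden dim 64.
gln = GroupLayerNorm(num_groups=3, hidden_dim=64)
y = gln(torch.randn(2, 10, 64), torch.randint(0, 3, (2, 10)))  # same shape as input
```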
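The abstract also mentions a dynamic masking strategy against information leakage, which plausibly matters once user-level compression packs one user's history and several candidate items into a single sequence sharing one forward pass. A hedged sketch of one such mask, assuming (not stated in the abstract) causal attention over history while candidates see the history but not each other:

```python
import torch

def candidate_attention_mask(is_candidate: torch.Tensor) -> torch.Tensor:
    """Hypothetical leakage-avoiding mask for a user-level compressed sequence.

    is_candidate: (seq_len,) bool, True where the position holds a candidate item.
    Returns a (seq_len, seq_len) bool matrix; entry [i, j] is True when
    position i may attend to position j.
    """
    seq_len = is_candidate.shape[0]
    # Standard causal mask: a position attends only to itself and the past.
    allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Block candidate -> candidate attention (except self-attention), so one
    # candidate's representation cannot leak into another's score.
    cand_pair = is_candidate.unsqueeze(1) & is_candidate.unsqueeze(0)
    self_only = torch.eye(seq_len, dtype=torch.bool)
    return allowed & ~(cand_pair & ~self_only)

# Example: 8 history tokens followed by 4 candidates packed into one sequence.
mask = candidate_attention_mask(torch.tensor([False] * 8 + [True] * 4))
```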
Related papers
- Realizing Scaling Laws in Recommender Systems: A Foundation-Expert Paradigm for Hyperscale Model Deployment [16.883389041355073]
We propose a framework designed for the development and deployment of hyperscale recommendation FMs. In our approach, a central FM is trained on lifelong, cross-surface, multi-modal user data to learn generalizable knowledge. This knowledge is then efficiently transferred to various lightweight, surface-specific "expert" models via target-aware embeddings.
arXiv Detail & Related papers (2025-08-04T22:03:13Z) - PRISM: Distributed Inference for Foundation Models at Edge [73.54372283220444]
PRISM is a communication-efficient and compute-aware strategy for distributed Transformer inference on edge devices. We evaluate PRISM on ViT, BERT, and GPT-2 across diverse datasets.
arXiv Detail & Related papers (2025-07-16T11:25:03Z) - LatentLLM: Attention-Aware Joint Tensor Compression [50.33925662486034]
Large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources. We propose a new framework to convert such LLMs/LMMs into a reduced-dimension latent structure.
arXiv Detail & Related papers (2025-05-23T22:39:54Z) - Action is All You Need: Dual-Flow Generative Ranking Network for Recommendation [25.30922374657862]
We introduce a Dual-Flow Generative Ranking Network (DFGR) for recommendation scenarios. DFGR employs a dual-flow mechanism to optimize interaction modeling. Experiments on open-source and real industrial datasets show that DFGR outperforms DLRM.
arXiv Detail & Related papers (2025-05-22T14:58:53Z) - Inference-Time Scaling for Generalist Reward Modeling [25.62000059973935]
Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale. A key challenge of RL is obtaining accurate reward signals for LLMs in various domains beyond verifiable questions or artificial rules. In this work, we investigate how to improve reward modeling with more inference compute for general queries.
arXiv Detail & Related papers (2025-04-03T11:19:49Z) - An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law [2.688944054336062]
Climber is a resource-efficient recommendation framework. It has been successfully deployed on Netease Cloud Music, one of China's largest music streaming platforms.
arXiv Detail & Related papers (2025-02-14T03:25:09Z) - Scaling New Frontiers: Insights into Large Recommendation Models [74.77410470984168]
Meta's generative recommendation model HSTU illustrates the scaling laws of recommendation systems by expanding parameters to trillions. We conduct comprehensive ablation studies to explore the origins of these scaling laws. We offer insights into future directions for large recommendation models.
arXiv Detail & Related papers (2024-12-01T07:27:20Z) - Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose Read-ME, a novel framework that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting. We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO). MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts. Our architecture shares the central tensor across all layers to reduce the model size (a hedged sketch follows this list).
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
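For the MPO entry above, a minimal sketch of the parameter-sharing idea as we read it: each layer's weight is reconstructed from small per-layer auxiliary factors and one central tensor shared by all layers. The class and shapes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MPOSharedStack(nn.Module):
    """Illustrative parameter sharing: per-layer auxiliary factors wrap a
    single central tensor, so depth grows without duplicating the bulk of
    the parameters."""

    def __init__(self, num_layers: int, dim: int, rank: int):
        super().__init__()
        self.central = nn.Parameter(torch.randn(rank, rank) / rank ** 0.5)  # shared
        self.left = nn.ParameterList(
            [nn.Parameter(torch.randn(dim, rank) / dim ** 0.5) for _ in range(num_layers)]
        )
        self.right = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, dim) / rank ** 0.5) for _ in range(num_layers)]
        )

    def layer_weight(self, i: int) -> torch.Tensor:
        # Reconstruct layer i's (dim, dim) weight from its auxiliary factors
        # and the shared central tensor.
        return self.left[i] @ self.central @ self.right[i]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i in range(len(self.left)):
            x = torch.relu(x @ self.layer_weight(i))
        return x

# 12 layers, hidden dim 256, central rank 32: far fewer parameters than
# 12 independent 256x256 matrices.
out = MPOSharedStack(num_layers=12, dim=256, rank=32)(torch.randn(4, 256))
```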