Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
- URL: http://arxiv.org/abs/2510.22049v1
- Date: Fri, 24 Oct 2025 22:17:49 GMT
- Title: Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
- Authors: Zhimin Chen, Chenyu Zhao, Ka Chun Mo, Yunjiang Jiang, Jane H. Lee, Shouwei Chen, Khushhall Chandra Mahajan, Ning Jiang, Kai Ren, Jinhui Li, Wen-Yun Yang
- Abstract summary: We propose a novel two-stage modeling framework, namely VIrtual Sequential Target Attention (VISTA). VISTA decomposes traditional target attention from a candidate item to user history items into two distinct stages. Our approach achieves significant improvements in offline and online metrics and has been successfully deployed on an industry-leading recommendation platform.
- Score: 11.073761978382398
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern large-scale recommendation systems rely heavily on user interaction history sequences to enhance the model performance. The advent of large language models and sequential modeling techniques, particularly transformer-like architectures, has led to significant advancements recently (e.g., HSTU, SIM, and TWIN models). While scaling to ultra-long user histories (10k to 100k items) generally improves model performance, it also creates significant challenges in latency, queries per second (QPS), and GPU cost in industry-scale recommendation systems. Existing models do not adequately address these industrial scalability issues. In this paper, we propose a novel two-stage modeling framework, namely VIrtual Sequential Target Attention (VISTA), which decomposes traditional target attention from a candidate item to user history items into two distinct stages: (1) user history summarization into a few hundred tokens; followed by (2) candidate item attention to those tokens. These summarization token embeddings are then cached in a storage system and utilized as sequence features for downstream model training and inference. This novel design for scalability enables VISTA to scale to lifelong user histories (up to one million items) while keeping downstream training and inference costs fixed, which is essential in industry. Our approach achieves significant improvements in offline and online metrics and has been successfully deployed on an industry-leading recommendation platform serving billions of users.
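To make the decomposition concrete, below is a minimal PyTorch sketch of the two-stage idea described in the abstract. It is an illustration under assumptions, not the paper's implementation: the class names (`HistorySummarizer`, `TargetAttention`), the learnable virtual-query design, and all dimensions and token counts are hypothetical, and the production caching/storage layer is omitted.

```python
# Minimal sketch of a VISTA-style two-stage decomposition.
# All names, shapes, and token counts are illustrative assumptions.
import torch
import torch.nn as nn


class HistorySummarizer(nn.Module):
    """Stage 1: compress a long user history into a few hundred summary tokens."""

    def __init__(self, dim: int = 128, num_summary_tokens: int = 256, heads: int = 4):
        super().__init__()
        # Learnable "virtual" queries that attend over the raw history.
        self.virtual_queries = nn.Parameter(torch.randn(num_summary_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len, dim); seq_len may be 10k to 1M items.
        q = self.virtual_queries.unsqueeze(0).expand(history.size(0), -1, -1)
        summary, _ = self.attn(q, history, history)  # (batch, num_tokens, dim)
        return summary  # cached offline; downstream cost no longer depends on seq_len


class TargetAttention(nn.Module):
    """Stage 2: a candidate item attends to the cached summary tokens."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, candidate: torch.Tensor, summary: torch.Tensor) -> torch.Tensor:
        # candidate: (batch, 1, dim); summary: (batch, num_tokens, dim)
        out, _ = self.attn(candidate, summary, summary)
        return out.squeeze(1)  # user-history-aware candidate representation


# Usage: summarize once (and cache), then score many candidates cheaply.
summarizer, scorer = HistorySummarizer(), TargetAttention()
history = torch.randn(2, 10_000, 128)   # ultra-long user history
summary = summarizer(history)           # computed offline, stored as a feature
candidate = torch.randn(2, 1, 128)
rep = scorer(candidate, summary)        # online cost fixed at num_summary_tokens
```

The payoff of the split is that Stage 1's cost, which grows with history length, is paid once offline per user update, while Stage 2's online cost depends only on the fixed number of summary tokens, which is what lets downstream training and inference costs stay fixed as histories grow.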
Related papers
- Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin [21.0248704845397]
Short-video recommenders such as Douyin must exploit extremely long user histories without breaking latency or cost budgets.
We present an end-to-end system that scales long-sequence modeling to 10k-item user histories in production.
arXiv Detail & Related papers (2025-11-08T17:22:54Z)
- Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search [54.987957691350665]
Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query.
Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications.
We propose a novel framework to pioneer the application of generative models to address real-time QDTS in industrial web search.
arXiv Detail & Related papers (2025-08-28T08:51:51Z)
- PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform [9.628316811614566]
We present a foundational model, PinFM, for understanding user activity sequences across multiple applications at a billion-scale visual discovery platform.
We pretrain a transformer model with 20B+ parameters using extensive user activity data, then fine-tune it for specific applications.
Our infrastructure and algorithmic optimizations, such as the Deduplicated Cross-Attention Transformer (DCAT), improved our throughput by 600% on Pinterest.
arXiv Detail & Related papers (2025-07-17T00:37:59Z)
- FuXi-$α$: Scaling Recommendation Model with Feature Interaction Enhanced Transformer [81.12174905444229]
Recent advancements have shown that expanding sequential recommendation models into large-scale recommendation models can be an effective strategy.
We propose a new model called FuXi-$\alpha$ to address these issues.
Our model outperforms existing models, with its performance continuously improving as the model size increases.
arXiv Detail & Related papers (2025-02-05T09:46:54Z)
- DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data [61.62554324594797]
We propose DreamMask, which explores how to generate training data in the open-vocabulary setting, and how to train the model with both real and synthetic data.
In general, DreamMask significantly simplifies the collection of large-scale training data, serving as a plug-and-play enhancement for existing methods.
For instance, when trained on COCO and tested on ADE20K, the model equipped with DreamMask outperforms the previous state-of-the-art by a substantial margin of 2.1% mIoU.
arXiv Detail & Related papers (2025-01-03T19:00:00Z)
- Scaling Sequential Recommendation Models with Transformers [0.0]
We take inspiration from the scaling laws observed in training large language models, and explore similar principles for sequential recommendation.
Compute-optimal training is possible but requires a careful analysis of the compute-performance trade-offs specific to the application.
We also show that performance scaling translates to downstream tasks by fine-tuning larger pre-trained models on smaller task-specific domains.
arXiv Detail & Related papers (2024-12-10T15:20:56Z)
- Scaling New Frontiers: Insights into Large Recommendation Models [74.77410470984168]
Meta's generative recommendation model HSTU illustrates the scaling laws of recommendation systems by expanding parameters to thousands of billions.
We conduct comprehensive ablation studies to explore the origins of these scaling laws.
We offer insights into future directions for large recommendation models.
arXiv Detail & Related papers (2024-12-01T07:27:20Z)
- MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation.
On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests.
On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation.
arXiv Detail & Related papers (2023-08-22T04:06:56Z)
- CAM2: Conformity-Aware Multi-Task Ranking Model for Large-Scale Recommender Systems [0.0]
We introduce CAM2, a conformity-aware multi-task ranking model to serve relevant items to users on one of the largest industrial recommendation platforms.
We show through online experiments that the CAM2 model results in a significant 0.50% increase in aggregated user engagement.
arXiv Detail & Related papers (2023-04-17T19:00:55Z)
- PreSizE: Predicting Size in E-Commerce using Transformers [76.33790223551074]
PreSizE is a novel deep learning framework which utilizes Transformers for accurate size prediction.
We demonstrate that PreSizE is capable of achieving superior prediction performance compared to previous state-of-the-art baselines.
As a proof of concept, we demonstrate that size predictions made by PreSizE can be effectively integrated into an existing production recommender system.
arXiv Detail & Related papers (2021-05-04T15:23:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.