Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
- URL: http://arxiv.org/abs/2402.17152v3
- Date: Mon, 6 May 2024 02:05:45 GMT
- Title: Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
- Authors: Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi, et al.
- Abstract summary: Large-scale recommendation systems need to handle tens of billions of user actions on a daily basis.
Despite being trained on huge volumes of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute.
Inspired by the success achieved by Transformers in language and vision domains, we revisit fundamental design choices in recommendation systems.
- Score: 11.198481792194452
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on huge volumes of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute. Inspired by the success achieved by Transformers in language and vision domains, we revisit fundamental design choices in recommendation systems. We reformulate recommendation problems as sequential transduction tasks within a generative modeling framework ("Generative Recommenders"), and propose a new architecture, HSTU, designed for high cardinality, non-stationary streaming recommendation data. HSTU outperforms baselines over synthetic and public datasets by up to 65.8% in NDCG, and is 5.3x to 15.2x faster than FlashAttention2-based Transformers on 8192-length sequences. HSTU-based Generative Recommenders, with 1.5 trillion parameters, improve metrics in online A/B tests by 12.4% and have been deployed on multiple surfaces of a large internet platform with billions of users. More importantly, the model quality of Generative Recommenders empirically scales as a power-law of training compute across three orders of magnitude, up to GPT-3/LLaMa-2 scale, which reduces the carbon footprint needed for future model development and further paves the way for the first foundational models in recommendations.
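As a rough illustration of the sequential-transduction framing described in the abstract, the sketch below treats a user's chronological item IDs as tokens and trains a causal model to predict the next action, in the style of language modeling. A generic causal Transformer encoder is used as a stand-in for HSTU, whose exact layer design is not given here; the vocabulary size, dimensions, and loss are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GenerativeRecommender(nn.Module):
    """Next-action prediction over a user's chronological item sequence.
    A generic causal Transformer encoder stands in for HSTU; sizes are illustrative."""

    def __init__(self, num_items, d_model=256, n_heads=4, n_layers=4, max_len=8192):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, num_items)           # next-item logits

    def forward(self, item_ids):                             # (batch, seq_len)
        seq_len = item_ids.size(1)
        pos = torch.arange(seq_len, device=item_ids.device)
        h = self.item_emb(item_ids) + self.pos_emb(pos)
        # Boolean causal mask: True marks positions that may NOT be attended to.
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                       device=item_ids.device), diagonal=1)
        h = self.encoder(h, mask=causal)                     # each position sees only the past
        return self.head(h)                                  # (batch, seq_len, num_items)

# Shift-by-one next-item objective, exactly as in language modeling.
model = GenerativeRecommender(num_items=10_000)
seq = torch.randint(0, 10_000, (2, 64))                      # two toy user histories
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 10_000), seq[:, 1:].reshape(-1))
print(loss.item())
```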
Related papers
- A Novel Mamba-based Sequential Recommendation Method [4.941272356564765]
Sequential recommendation (SR) encodes user activity to predict the next action.
Transformer-based models have proven effective for sequential recommendation, but the complexity of the self-attention module in Transformers scales quadratically with the sequence length.
We propose a novel multi-head latent Mamba architecture, which employs multiple low-dimensional Mamba layers and fully connected layers.
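A minimal sketch of the multi-head latent idea described above: project the hidden state into several low-dimensional heads, mix each head along the sequence, and recombine with a fully connected layer. Since the Mamba layer internals are not restated in this summary, a GRU is used as a placeholder low-dimensional sequence mixer; all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class MultiHeadLatentMixer(nn.Module):
    """Project into low-dimensional heads, mix each head over the sequence,
    then recombine with a fully connected layer (GRU as a Mamba stand-in)."""

    def __init__(self, d_model=256, n_heads=4, d_latent=32):
        super().__init__()
        self.down = nn.ModuleList(nn.Linear(d_model, d_latent) for _ in range(n_heads))
        self.mixers = nn.ModuleList(
            nn.GRU(d_latent, d_latent, batch_first=True) for _ in range(n_heads)
        )
        self.up = nn.Linear(n_heads * d_latent, d_model)   # fully connected recombination

    def forward(self, x):                                   # x: (batch, seq_len, d_model)
        outs = []
        for down, mixer in zip(self.down, self.mixers):
            h, _ = mixer(down(x))                           # low-dimensional sequence mixing
            outs.append(h)
        return x + self.up(torch.cat(outs, dim=-1))         # residual connection

y = MultiHeadLatentMixer()(torch.randn(2, 128, 256))        # (2, 128, 256)
```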
arXiv Detail & Related papers (2025-04-10T02:43:19Z) - Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale [68.6602625868888]
We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations.
Operators in hybrid models can be tailored to token manipulation tasks such as in-context recall, multi-token recall, and compression.
We train end-to-end 1.2 to 2.9 times faster than optimized Transformers, and 1.1 to 1.4 times faster than previous generation hybrids.
arXiv Detail & Related papers (2025-02-25T19:47:20Z) - An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law [2.688944054336062]
Climber is a resource-efficient recommendation framework.
It has been successfully deployed on Netease Cloud Music, one of China's largest music streaming platforms.
arXiv Detail & Related papers (2025-02-14T03:25:09Z) - Scaling Sequential Recommendation Models with Transformers [0.0]
We take inspiration from the scaling laws observed in training large language models, and explore similar principles for sequential recommendation.
Compute-optimal training is possible but requires a careful analysis of the compute-performance trade-offs specific to the application.
We also show that performance scaling translates to downstream tasks by fine-tuning larger pre-trained models on smaller task-specific domains.
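For illustration, a power-law relation between training compute and model quality can be fitted as a straight line in log-log space. The compute/quality pairs below are synthetic stand-ins, not results from the paper; in practice they would come from a sweep of training runs on the recommendation task.

```python
import numpy as np

# Synthetic illustration: training compute (FLOPs) vs. a quality metric (e.g. NDCG).
compute = np.array([1e18, 1e19, 1e20, 1e21])
quality = np.array([0.152, 0.171, 0.193, 0.218])

# quality = a * compute^k is a straight line in log-log space.
k, log_a = np.polyfit(np.log(compute), np.log(quality), deg=1)
a = np.exp(log_a)
print(f"fitted exponent k = {k:.3f}")

# Extrapolate to estimate quality at a larger compute budget.
print("predicted quality at 1e22 FLOPs:", a * 1e22 ** k)
```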
arXiv Detail & Related papers (2024-12-10T15:20:56Z) - Scaling New Frontiers: Insights into Large Recommendation Models [74.77410470984168]
Meta's generative recommendation model HSTU illustrates the scaling laws of recommendation systems by expanding parameters into the trillions.
We conduct comprehensive ablation studies to explore the origins of these scaling laws.
We offer insights into future directions for large recommendation models.
arXiv Detail & Related papers (2024-12-01T07:27:20Z) - Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
The Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality.
We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
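A sketch of Approximate Entropy in its standard form (embedding dimension m, tolerance r). How the paper maps interaction data to a numeric series is not detailed in this summary, so the function below simply operates on a generic 1-D sequence; the two example series are illustrative.

```python
import numpy as np

def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) of a 1-D series; r is a tolerance given as a fraction of the
    series' standard deviation (a common convention)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    tol = r * x.std()

    def phi(m):
        # Overlapping windows of length m.
        windows = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of windows.
        dists = np.abs(windows[:, None, :] - windows[None, :, :]).max(axis=2)
        # C_i: fraction of windows within tolerance of window i (self-matches included).
        c = (dists <= tol).mean(axis=1)
        return np.log(c).mean()

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
repetitive = np.tile([1, 2, 3, 4], 50)           # highly regular interaction pattern
random_like = rng.integers(0, 100, size=200)     # irregular pattern
print(approximate_entropy(repetitive), approximate_entropy(random_like))
```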
arXiv Detail & Related papers (2024-11-30T10:56:30Z) - Leveraging Large Language Models to Enhance Personalized Recommendations in E-commerce [6.660249346977347]
This study explores the application of large language models (LLMs) in personalized e-commerce recommendation systems.
LLMs effectively capture the implicit needs of users through deep semantic understanding of user comments and product description data.
The study shows that LLMs have significant advantages in personalized recommendation and can improve user experience and promote platform sales growth.
arXiv Detail & Related papers (2024-10-02T13:59:56Z) - Mixture of Experts with Mixture of Precisions for Tuning Quality of Service [0.0]
This paper presents an adaptive serving approach for the efficient deployment of MoE models.
By dynamically determining the number of quantized experts, we offer a fine-grained range of configurations for tuning throughput and model quality.
Results highlight the practical applicability of our approach in dynamic and accuracy-sensitive applications.
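The mechanism can be sketched as quantizing a chosen number of experts while keeping the rest in full precision; the adaptive policy for deciding how many (and which) experts to quantize is the paper's contribution and is not reproduced here. The snippet uses PyTorch dynamic int8 quantization as an illustrative stand-in.

```python
import torch
import torch.nn as nn

def expert(d_model=512, d_hidden=2048):
    return nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))

def mixed_precision_experts(experts, num_quantized):
    """Return experts with the first `num_quantized` replaced by dynamically
    int8-quantized copies; the rest stay in full precision."""
    converted = []
    for i, e in enumerate(experts):
        if i < num_quantized:
            converted.append(torch.quantization.quantize_dynamic(e, {nn.Linear}, dtype=torch.qint8))
        else:
            converted.append(e)
    return nn.ModuleList(converted)

experts = nn.ModuleList(expert() for _ in range(8))
# The quantized/full-precision split is the knob for trading model quality
# against throughput and memory at serving time.
serving_experts = mixed_precision_experts(experts, num_quantized=5)
```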
arXiv Detail & Related papers (2024-07-19T15:42:49Z) - PTF-FSR: A Parameter Transmission-Free Federated Sequential Recommender System [42.79538136366075]
This paper proposes a parameter transmission-free federated sequential recommendation framework (PTF-FSR).
PTF-FSR ensures both model and data privacy protection to meet the privacy needs of service providers and system users alike.
arXiv Detail & Related papers (2024-06-08T07:45:46Z) - SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation [74.07836010698801]
We propose an SMPL-based Transformer framework (SMPLer) to address this issue.
SMPLer incorporates two key ingredients: a decoupled attention operation and an SMPL-based target representation.
Extensive experiments demonstrate the effectiveness of SMPLer against existing 3D human shape and pose estimation methods.
arXiv Detail & Related papers (2024-04-23T17:59:59Z) - LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation [45.00339682494516]
LightLM is a lightweight Transformer-based language model for generative recommendation.
LightLM tackles the issue by introducing a light-weight deep and narrow Transformer architecture.
We also show that our devised user and item ID indexing methods, i.e., Spectral Collaborative Indexing (SCI) and Graph Collaborative Indexing (GCI), enable the deep and narrow Transformer architecture to outperform large-scale language models for recommendation.
arXiv Detail & Related papers (2023-10-26T15:44:57Z) - SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation [83.18930314027254]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X) with up to ViT-Huge as the backbone.
With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments.
arXiv Detail & Related papers (2023-09-29T17:58:06Z) - E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
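As a rough illustration of visual prompt tuning, the sketch below prepends learnable prompt tokens to a frozen backbone's token sequence and trains only the prompts and a task head. E^2VPT's specific prompt design and pruning are not reproduced; the backbone, dimensions, and head here are illustrative stand-ins.

```python
import torch
import torch.nn as nn

class PromptTunedEncoder(nn.Module):
    """Input-level prompt tuning: learnable prompt tokens are prepended to the
    patch-token sequence; only the prompts and the task head are trained."""

    def __init__(self, backbone, d_model, num_prompts=10, num_classes=100):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                      # pretrained backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, d_model) * 0.02)
        self.head = nn.Linear(d_model, num_classes)      # task-specific head

    def forward(self, patch_tokens):                     # (batch, seq_len, d_model)
        prompts = self.prompts.expand(patch_tokens.size(0), -1, -1)
        h = self.backbone(torch.cat([prompts, patch_tokens], dim=1))
        return self.head(h.mean(dim=1))                  # pooled representation -> logits

# A small Transformer encoder stands in for a pretrained ViT backbone.
backbone = nn.TransformerEncoder(nn.TransformerEncoderLayer(192, 3, batch_first=True), 4)
model = PromptTunedEncoder(backbone, d_model=192)
logits = model(torch.randn(8, 196, 192))                 # e.g. 14x14 patch tokens
```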
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer [76.2625311630021]
Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks.
To mitigate their excessive parameter and computation costs, structured pruning is a promising solution to compress model size and enable practical efficiency.
We propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models.
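For context, structured pruning removes whole rows or channels rather than individual weights. The snippet below shows only the mechanics, using PyTorch's built-in L2-norm structured pruning on a stand-in layer; GOHSP's graph- and optimization-based importance scoring is not reproduced here.

```python
import torch.nn as nn
from torch.nn.utils import prune

# A toy linear layer standing in for a ViT sub-module.
layer = nn.Linear(384, 1536)

# Remove 30% of output neurons (rows of the weight matrix), ranked by L2 norm.
prune.ln_structured(layer, name="weight", amount=0.3, n=2, dim=0)
prune.remove(layer, "weight")                 # make the pruning permanent

rows_kept = (layer.weight.abs().sum(dim=1) > 0).sum().item()
print(f"{rows_kept}/{layer.out_features} output channels remain")
```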
arXiv Detail & Related papers (2023-01-13T00:40:24Z) - On the Generalizability and Predictability of Recommender Systems [33.46314108814183]
We give the first large-scale study of recommender system approaches.
We create Reczilla, a meta-learning approach to recommender systems.
arXiv Detail & Related papers (2022-06-23T17:51:42Z) - DeepNet: Scaling Transformers to 1,000 Layers [106.33669415337135]
We introduce a new normalization function (DeepNorm) to modify the residual connection in Transformer.
In-depth theoretical analysis shows that model updates can be bounded in a stable way.
We successfully scale Transformers up to 1,000 layers without difficulty, which is one order of magnitude deeper than previous deep Transformers.
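A minimal sketch of a DeepNorm-style residual connection, assuming the commonly cited form LayerNorm(alpha * x + f(x)) with a depth-dependent alpha; the paper's exact constants and the matching initialization scheme are not restated here.

```python
import torch
import torch.nn as nn

class DeepNormBlock(nn.Module):
    """Residual sub-layer with a DeepNorm-style connection: LayerNorm(alpha * x + f(x)).
    alpha grows with depth so that model updates stay bounded."""

    def __init__(self, d_model, alpha):
        super().__init__()
        self.alpha = alpha
        self.f = nn.Sequential(                     # sub-layer, e.g. a feed-forward network
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(self.alpha * x + self.f(x))

# For an N-layer encoder-only stack, alpha is depth-dependent (on the order of
# (2N) ** 0.25 in the paper's analysis); treated here as a configurable constant.
num_layers = 1000
block = DeepNormBlock(d_model=512, alpha=(2 * num_layers) ** 0.25)
y = block(torch.randn(4, 16, 512))
```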
arXiv Detail & Related papers (2022-03-01T15:36:38Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
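A minimal sketch of the sparsity-in-the-update idea, assuming a fixed random mask so that only a small fraction of a frozen layer's update entries receive gradients; DSEE's actual decomposition and its sparsification of the final weights for inference go beyond this stand-in.

```python
import torch
import torch.nn as nn

class SparseDeltaLinear(nn.Module):
    """Frozen pretrained linear layer plus a sparsely-masked trainable update.
    Entries of `delta` outside the fixed mask get zero gradient, so only a small
    fraction of parameters is effectively fine-tuned."""

    def __init__(self, pretrained: nn.Linear, density=0.05):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
        self.delta = nn.Parameter(torch.zeros_like(self.weight))
        self.register_buffer("mask", (torch.rand_like(self.weight) < density).float())

    def forward(self, x):
        return nn.functional.linear(x, self.weight + self.mask * self.delta, self.bias)

layer = SparseDeltaLinear(nn.Linear(768, 768))
trainable = int(layer.mask.sum().item())
print(f"trainable update entries: {trainable} of {layer.weight.numel()}")
```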
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.