Request-Only Optimization for Recommendation Systems
- URL: http://arxiv.org/abs/2508.05640v3
- Date: Thu, 14 Aug 2025 23:38:08 GMT
- Title: Request-Only Optimization for Recommendation Systems
- Authors: Liang Guo, Wei Li, Lucy Liao, Huihui Cheng, Rui Zhang, Yu Shi, Yueming Wang, Yanzun Huang, Keke Zhai, Pengchao Wang, Timothy Shi, Xuan Cao, Shengzhi Wang, Renqin Cai, Zhaojie Gong, Omkar Vichare, Rui Jian, Leon Gao, Shiyan Deng, Xingyu Liu, Xiong Zhang, Fu Li, Wenlei Xie, Bin Wen, Rui Li, Lu Fang, Xing Liu, Jiaqi Zhai
- Abstract summary: Industry-scale Deep Learning Recommendation Models (DLRMs) serve billions of users every day. DLRMs have been scaled up to unprecedented complexity, up to trillions of floating-point operations (TFLOPs) per example. This scale, coupled with the huge amount of training data, necessitates new storage and training algorithms to efficiently improve the quality of these complex recommendation systems.
- Score: 39.94415867218579
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Learning Recommendation Models (DLRMs) represent one of the largest machine learning applications on the planet. Industry-scale DLRMs are trained with petabytes of recommendation data to serve billions of users every day. To utilize the rich user signals in the long user history, DLRMs have been scaled up to unprecedented complexity, up to trillions of floating-point operations (TFLOPs) per example. This scale, coupled with the huge amount of training data, necessitates new storage and training algorithms to efficiently improve the quality of these complex recommendation systems. In this paper, we present the Request-Only Optimization (ROO) training and modeling paradigm. ROO simultaneously improves the storage and training efficiency as well as the model quality of recommendation systems. We holistically approach this challenge through co-designing data (i.e., request-only data), infrastructure (i.e., a request-only data processing pipeline), and model architecture (i.e., request-only neural architectures). Our ROO training and modeling paradigm treats a user request as a unit of the training data. First, compared with the established practice of treating a user impression as a unit, this design achieves native feature deduplication in data logging, consequently saving data storage. Second, by de-duplicating computations and communications across multiple impressions in a request, this paradigm enables highly scaled-up neural network architectures, such as Generative Recommenders (GRs) and other request-only friendly architectures, to better capture user interest signals.
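To make the request-as-unit idea concrete, below is a minimal sketch of the logging-side deduplication the abstract describes: impressions from one request share their user-side features, so a request-level record stores them once rather than once per impression. The record layouts and field names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative, not the paper's implementation) contrasting
# impression-level logging with ROO-style request-level logging. All field
# names and feature shapes are hypothetical assumptions for the example.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Impression:
    """One example under impression-level logging: user/request features
    are copied into every impression from the same request."""
    user_features: dict      # duplicated across impressions of one request
    item_features: dict
    label: int


@dataclass
class RequestRecord:
    """One example under request-level (ROO) logging: user/request features
    are stored once, with per-impression item features and labels."""
    user_features: dict
    items: List[dict] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)


def log_request_only(impressions: List[Impression]) -> RequestRecord:
    """Native feature deduplication: collapse the impressions of one request
    into a single record, keeping the shared user features exactly once."""
    assert impressions, "a request must contain at least one impression"
    record = RequestRecord(user_features=impressions[0].user_features)
    for imp in impressions:
        record.items.append(imp.item_features)
        record.labels.append(imp.label)
    return record


if __name__ == "__main__":
    # One request that produced three impressions; the (large) user-history
    # feature would be written three times under impression-level logging.
    user = {"user_id": 42, "history": list(range(1000))}
    imps = [Impression(user, {"item_id": i}, label=i % 2) for i in (7, 8, 9)]
    rec = log_request_only(imps)
    print(f"{len(imps)} impressions -> 1 request record; "
          f"user features stored once instead of {len(imps)} times")
```

The same grouping pays off at training time: user-side computation can run once per request and be shared across its impressions, which is what the abstract credits with making scaled-up, request-only friendly architectures such as GRs affordable.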
Related papers
- Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels [96.35283762778137]
We introduce the Webscale-RL pipeline, a scalable data engine for reinforcement learning. We construct the Webscale-RL dataset, containing 1.2 million examples across more than 9 domains. Our work presents a viable path toward scaling RL to pre-training levels, enabling more capable and efficient language models.
arXiv Detail & Related papers (2025-10-07T22:30:59Z)
- Towards a Real-World Aligned Benchmark for Unlearning in Recommender Systems [49.766845975588275]
We propose a set of design desiderata and research questions to guide the development of a more realistic benchmark for unlearning in recommender systems. We argue for an unlearning setup that reflects the sequential, time-sensitive nature of real-world deletion requests. We present a preliminary experiment in a next-basket recommendation setting based on our proposed desiderata and find that unlearning also works for sequential recommendation models.
arXiv Detail & Related papers (2025-08-23T16:05:40Z)
- Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data. We introduce a novel principled method for DP synthetic image embedding generation. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation [20.851925464903804]
This paper introduces a novel learning paradigm, Dynamic Sparse Learning, tailored for recommendation models.
DSL trains a lightweight sparse model from scratch, periodically evaluating and dynamically adjusting each weight's significance (a generic prune-and-regrow sketch follows this entry).
Our experimental results underline DSL's effectiveness, significantly reducing training and inference costs while delivering comparable recommendation performance.
arXiv Detail & Related papers (2024-02-05T10:16:20Z)
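As referenced above, the following is a generic magnitude-based prune-and-regrow loop of the kind dynamic sparse training methods use. The sparsity level, update period, toy objective, and random regrowth criterion are assumptions for illustration, not DSL's exact procedure (DSL scores each weight's significance rather than regrowing at random).

```python
# Generic dynamic sparse training loop: masked SGD with periodic
# magnitude-based pruning and regrowth. Hyperparameters and the toy
# quadratic objective are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(64, 64))        # target of a toy objective ||W - T||^2
W = rng.normal(size=(64, 64))
mask = rng.random(W.shape) < 0.1     # keep only ~10% of the weights active
W *= mask

for step in range(1, 501):
    grad = 2.0 * (W - T)             # gradient of the toy loss
    W -= 0.01 * grad * mask          # update active weights only
    if step % 100 == 0:              # periodic mask adjustment
        k = int(0.1 * mask.sum())    # churn 10% of the active slots
        active = np.argwhere(mask)
        # Prune the k active weights with the smallest magnitude ...
        drop = active[np.argsort(np.abs(W[tuple(active.T)]))[:k]]
        mask[tuple(drop.T)] = False
        W[tuple(drop.T)] = 0.0
        # ... and regrow k inactive connections (randomly, in this sketch).
        inactive = np.argwhere(~mask)
        grow = inactive[rng.choice(len(inactive), size=k, replace=False)]
        mask[tuple(grow.T)] = True
```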
- DNS-Rec: Data-aware Neural Architecture Search for Recommender Systems [79.76519917171261]
This paper addresses the computational overhead and resource inefficiency prevalent in Sequential Recommender Systems (SRSs). We introduce an innovative approach combining pruning methods with advanced model designs. Our principal contribution is the development of a Data-aware Neural Architecture Search for Recommender System (DNS-Rec).
arXiv Detail & Related papers (2024-02-01T07:22:52Z)
- Serving Deep Learning Model in Relational Databases [70.53282490832189]
Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains.
We highlight three pivotal paradigms: the state-of-the-art DL-centric architecture offloads DL computations to dedicated DL frameworks.
The potential UDF-centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the relational database management system (RDBMS); a minimal sketch of this paradigm follows this entry.
arXiv Detail & Related papers (2023-10-07T06:01:35Z)
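As referenced in the entry above, the UDF-centric paradigm can be illustrated with a small, self-contained sketch: a scoring function is registered as a SQL UDF so that inference runs inside the database query itself. SQLite and the one-neuron toy model are stand-in assumptions, not the paper's system.

```python
# Hedged sketch of the UDF-centric paradigm: a (toy) model is registered
# as a SQL UDF so inference executes inside the RDBMS query. SQLite and
# the one-neuron logistic "model" are illustrative stand-ins.
import math
import sqlite3


def predict(feature: float) -> float:
    # Toy logistic unit standing in for a real DL model.
    return 1.0 / (1.0 + math.exp(-(0.8 * feature - 1.0)))


conn = sqlite3.connect(":memory:")
conn.create_function("predict", 1, predict)   # expose the model as a UDF
conn.execute("CREATE TABLE events (user_id INTEGER, feature REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, 0.5), (2, 3.2), (3, -1.1)])
# Inference happens inside the SQL query, next to the data.
for user_id, score in conn.execute(
        "SELECT user_id, predict(feature) FROM events"):
    print(user_id, round(score, 3))
```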
- COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training [42.514897110537596]
Modern Deep Learning (DL) models have grown to sizes requiring massive clusters of specialized, high-end nodes to train.
Designing such clusters to maximize both performance and utilization, so as to amortize their steep cost, is a challenging task.
We introduce COMET, a holistic cluster design methodology and workflow to jointly study the impact of parallelization strategies and key cluster resource provisioning on the performance of distributed DL training.
arXiv Detail & Related papers (2022-11-30T00:32:37Z)
- RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure [3.991664287163157]
RecD addresses immense storage, preprocessing, and training overheads caused by feature duplication inherent in industry-scale DLRM training datasets.
We show how RecD improves training throughput, preprocessing throughput, and storage efficiency by up to 2.48x, 1.79x, and 3.71x, respectively, in an industry-scale DLRM training system.
arXiv Detail & Related papers (2022-11-09T22:21:19Z)
- FedNet2Net: Saving Communication and Computations in Federated Learning with Model Growing [0.0]
Federated learning (FL) is a recently developed area of machine learning.
In this paper, a novel scheme based on the notion of "model growing" is proposed.
The proposed approach is tested extensively on three standard benchmarks and is shown to achieve substantial reduction in communication and client computation.
arXiv Detail & Related papers (2022-07-19T21:54:53Z)
- Incremental Learning for Personalized Recommender Systems [8.020546404087922]
We present an incremental learning solution that provides both training efficiency and model quality.
The solution is deployed at LinkedIn and is directly applicable to industrial-scale recommender systems.
arXiv Detail & Related papers (2021-08-13T04:21:21Z)
- Real-time Federated Evolutionary Neural Architecture Search [14.099753950531456]
Federated learning is a distributed machine learning approach that preserves privacy.
We propose an evolutionary approach to real-time federated neural architecture search that not only optimizes model performance but also reduces the local payload.
This way, we effectively reduce the computational and communication costs required for evolutionary optimization and avoid large performance fluctuations in the local models.
arXiv Detail & Related papers (2020-03-04T17:03:28Z)