Data Efficiency for Large Recommendation Models
- URL: http://arxiv.org/abs/2410.18111v2
- Date: Fri, 25 Oct 2024 10:26:35 GMT
- Title: Data Efficiency for Large Recommendation Models
- Authors: Kshitij Jain, Jingru Xie, Kevin Regan, Cheng Chen, Jie Han, Steve Li, Zhuoshu Li, Todd Phillips, Myles Sussman, Matt Troup, Angel Yu, Jia Zhuo,
- Abstract summary: Large recommendation models (LRMs) are fundamental to the multi-billion dollar online advertising industry.
The massive scale of data directly impacts both computational costs and the speed at which new methods can be evaluated.
This paper presents actionable principles and high-level frameworks to guide practitioners in optimizing training data requirements.
- Score: 4.799343040337817
- License:
- Abstract: Large recommendation models (LRMs) are fundamental to the multi-billion dollar online advertising industry, processing massive datasets of hundreds of billions of examples before transitioning to continuous online training to adapt to rapidly changing user behavior. The massive scale of data directly impacts both computational costs and the speed at which new methods can be evaluated (R&D velocity). This paper presents actionable principles and high-level frameworks to guide practitioners in optimizing training data requirements. These strategies have been successfully deployed in Google's largest Ads CTR prediction models and are broadly applicable beyond LRMs. We outline the concept of data convergence, describe methods to accelerate this convergence, and finally, detail how to optimally balance training data volume with model size.
Related papers
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z) - Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory [29.646749372031593]
Large Vision Models (LVMs) as backbone and downstream perception head to understand AD semantic information.
Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate the convergence.
experiments demonstrate that the proposed method improves the performance by over 66.48% and converges faster over 6 times.
arXiv Detail & Related papers (2025-01-03T09:10:56Z) - Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality.
We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
arXiv Detail & Related papers (2024-11-30T10:56:30Z) - Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws [59.03420759554073]
We introduce Adaptive Data Optimization (ADO), an algorithm that optimize data distributions in an online fashion, concurrent with model training.
ADO does not require external knowledge, proxy models, or modifications to the model update.
ADO uses per-domain scaling laws to estimate the learning potential of each domain during training and adjusts the data mixture accordingly.
arXiv Detail & Related papers (2024-10-15T17:47:44Z) - AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs [61.13296177652599]
This paper demonstrates that the optimal composition of training data from different domains is scale-dependent.
We introduce *AutoScale*, a novel, practical approach for optimizing data compositions at potentially large training data scales.
Our evaluation on GPT-2 Large and BERT pre-training demonstrates *AutoScale*'s effectiveness in improving training convergence and downstream performance.
arXiv Detail & Related papers (2024-07-29T17:06:30Z) - Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in
Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z) - Data Augmentation Strategies for Improving Sequential Recommender
Systems [7.986899327513767]
Sequential recommender systems have recently achieved significant performance improvements with the exploitation of deep learning (DL) based methods.
We propose a set of data augmentation strategies, all of which transform original item sequences in the way of direct corruption.
Experiments on the latest DL-based model show that applying data augmentation can help the model generalize better.
arXiv Detail & Related papers (2022-03-26T09:58:14Z) - Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
arXiv Detail & Related papers (2021-04-07T17:59:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.