InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep
Recommendation Models
- URL: http://arxiv.org/abs/2308.08500v1
- Date: Sun, 13 Aug 2023 18:28:56 GMT
- Title: InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep
Recommendation Models
- Authors: Kabir Nagrecha, Lingyi Liu, Pablo Delgado, Prasanna Padmanabhan
- Abstract summary: Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems.
The systems challenges faced in this setting are unique; while typical deep learning training jobs are dominated by model execution, the most important factor in DLRM training performance is often online data ingestion.
- Score: 3.7414278978078204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning-based recommender models (DLRMs) have become an essential
component of many modern recommender systems. Several companies are now
building large compute clusters reserved only for DLRM training, driving new
interest in cost- and time-saving optimizations. The systems challenges faced
in this setting are unique; while typical deep learning training jobs are
dominated by model execution, the most important factor in DLRM training
performance is often online data ingestion.
In this paper, we explore the unique characteristics of this data ingestion
problem and provide insights into DLRM training pipeline bottlenecks and
challenges. We study real-world DLRM data processing pipelines taken from our
compute cluster at Netflix to observe the performance impacts of online
ingestion and to identify shortfalls in existing pipeline optimizers. We find
that current tooling either yields sub-optimal performance, frequent crashes,
or else requires impractical cluster re-organization to adopt. Our studies lead
us to design and build a new solution for data pipeline optimization, InTune.
InTune employs a reinforcement learning (RL) agent to learn how to distribute
the CPU resources of a trainer machine across a DLRM data pipeline to more
effectively parallelize data loading and improve throughput. Our experiments
show that InTune can build an optimized data pipeline configuration within only
a few minutes, and can easily be integrated into existing training workflows.
By exploiting the responsiveness and adaptability of RL, InTune achieves higher
online data ingestion rates than existing optimizers, thus reducing idle times
in model execution and increasing efficiency. We apply InTune to our real-world
cluster, and find that it increases data ingestion throughput by as much as
2.29X versus state-of-the-art data pipeline optimizers while also improving
both CPU & GPU utilization.
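The mechanism the abstract describes, redistributing a trainer machine's CPU cores across data-pipeline stages to raise end-to-end ingestion throughput, can be sketched as a search over core allocations. The stage names, per-core rates, and greedy hill climb below are illustrative assumptions standing in for InTune's actual RL agent and measured reward signal:

```python
# Toy model of the allocation problem InTune's RL agent solves:
# split a fixed CPU budget across pipeline stages to maximize throughput.
STAGES = ["read", "decode", "batch"]   # hypothetical pipeline stages
TOTAL_CORES = 8

def throughput(alloc):
    """Pipeline rate is bounded by its slowest stage; the per-core
    rates here are invented constants, not measurements."""
    rates = {"read": 2.0, "decode": 1.0, "batch": 4.0}
    return min(alloc[s] * rates[s] for s in STAGES)

def neighbors(alloc):
    """Allocations reachable by moving one core between stages,
    keeping at least one core per stage."""
    out = []
    for src in STAGES:
        for dst in STAGES:
            if src != dst and alloc[src] > 1:
                nxt = dict(alloc)
                nxt[src] -= 1
                nxt[dst] += 1
                out.append(nxt)
    return out

def optimize():
    """Greedy hill climb (a deliberately simplified stand-in for the
    RL policy): repeatedly move one core wherever it helps most."""
    alloc = {"read": 4, "decode": 2, "batch": 2}  # arbitrary starting split
    while True:
        cand = max(neighbors(alloc), key=throughput)
        if throughput(cand) <= throughput(alloc):
            return alloc
        alloc = cand

print(optimize())  # rebalances cores toward the slow "decode" stage
```

In the real system the reward comes from measured ingestion throughput rather than an analytic model, and an RL policy replaces the greedy search so the allocation can keep adapting as load shifts during training.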
Related papers
- OPTune: Efficient Online Preference Tuning [107.44836901099]
We propose a more efficient data exploration strategy for online preference tuning (OPTune)
OPTune dynamically samples informative responses for on-policy preference alignment.
In our evaluations, OPTune'd LLMs enjoy 1.27-1.56x faster training speed due to the efficient data exploration strategy.
arXiv Detail & Related papers (2024-06-11T18:55:04Z)
- Not All Data Matters: An End-to-End Adaptive Dataset Pruning Framework for Enhancing Model Performance and Efficiency [9.460023981858319]
We propose an end-to-end Adaptive DAtaset PRUNing framework called AdaPruner.
AdaPruner iteratively prunes redundant samples to an expected pruning ratio.
It can still significantly enhance model performance even after pruning up to 10-30% of the training data.
arXiv Detail & Related papers (2023-12-09T16:01:21Z)
- Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally incurs the update of significant parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
- INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models [40.54353850357839]
We show how we can employ submodular optimization to select highly representative subsets of the training corpora.
We show that the resulting models achieve up to ~99% of the performance of the fully-trained models.
arXiv Detail & Related papers (2023-05-11T09:24:41Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a 2.5x improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure [3.991664287163157]
RecD addresses immense storage, preprocessing, and training overheads caused by feature duplication inherent in industry-scale DLRM training datasets.
We show how RecD improves the training and preprocessing throughput and storage efficiency by up to 2.48x, 1.79x, and 3.71x, respectively, in an industry-scale DLRM training system.
arXiv Detail & Related papers (2022-11-09T22:21:19Z)
- dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training [12.413533491501548]
This paper proposes dPRO, a tool to identify performance bottlenecks in distributed training systems.
We implement dPRO on multiple deep learning frameworks (PyTorch, MXNet) and representative communication schemes (AllReduce and parameter-server architectures).
Extensive experiments show that dPRO predicts the performance of distributed training in various settings with under 5% error in most cases and finds optimization strategies with up to 87.1% speed-up over the baselines.
arXiv Detail & Related papers (2022-05-05T07:15:25Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
- Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
- Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines [77.45213180689952]
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy.
We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines.
We obtain an increased throughput of 3x to 13x compared to an untuned system.
arXiv Detail & Related papers (2022-02-17T14:31:58Z)
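A recurring theme across these papers is that preprocessing throughput, not model execution, can gate training speed. A minimal way to observe this is to measure samples-per-second of a preprocessing function at different parallelism levels; the fixed 1 ms per-sample cost below is a made-up stand-in for real decode/augment work:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def preprocess(sample):
    """Stand-in for per-sample decode/augment work: a fixed 1 ms delay."""
    time.sleep(0.001)
    return sample * 2

def measure_throughput(num_workers, n_samples=200):
    """Samples per second achieved at a given degree of parallelism."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        done = list(pool.map(preprocess, range(n_samples)))
    return len(done) / (time.perf_counter() - start)

for workers in (1, 2, 4, 8):
    print(f"{workers} workers: {measure_throughput(workers):7.0f} samples/s")
```

Because the synthetic work is delay-bound, throughput scales roughly with the worker count here; real pipelines hit the diminishing returns and stage imbalances that optimizers like InTune are built to navigate.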
This list is automatically generated from the titles and abstracts of the papers in this site.