Importance of Data Loading Pipeline in Training Deep Neural Networks
- URL: http://arxiv.org/abs/2005.02130v1
- Date: Tue, 21 Apr 2020 14:19:48 GMT
- Title: Importance of Data Loading Pipeline in Training Deep Neural Networks
- Authors: Mahdi Zolnouri and Xinlin Li and Vahid Partovi Nia
- Abstract summary: In large models, the time spent loading data takes a significant portion of model training time.
We compare two dedicated tools: a binary data format to accelerate data reading, and NVIDIA DALI to accelerate data augmentation.
Our study shows improvements on the order of 20% to 40% when such dedicated tools are used.
- Score: 2.127049691404299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training large-scale deep neural networks is a long, time-consuming
operation that often requires many GPUs to accelerate. In large models, the time
spent loading data takes a significant portion of model training time. As GPU
servers are typically expensive, tricks that save training time are
valuable. Slow training is observed especially in real-world applications where
exhaustive data augmentation operations are required. Data augmentation
techniques include padding, rotation, adding noise, downsampling, upsampling,
etc. These additional operations increase the need to build an
efficient data loading pipeline, and to explore existing tools that speed up
training time. We focus on the comparison of two main tools designed for this
task, namely a binary data format to accelerate data reading, and NVIDIA DALI to
accelerate data augmentation. Our study shows improvements on the order of 20%
to 40% when such dedicated tools are used.
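The binary-format idea in the abstract can be sketched in a few lines: instead of reading thousands of small files per epoch, all samples are packed into one contiguous binary file with an offset index, so each access is a single seek and read. This is an illustrative stdlib-only sketch, not the specific format evaluated in the paper; the `pack`/`read_sample` names are made up for this example.

```python
# Pack many small "image" blobs into one binary file with an offset
# index, then read a sample back by random access. Sequential packed
# reads avoid the per-file open/stat overhead of loading loose files.
import os
import tempfile

def pack(samples, path):
    """Write samples (a list of bytes) into one binary file; return an index."""
    index = []
    with open(path, "wb") as f:
        for blob in samples:
            index.append((f.tell(), len(blob)))  # (offset, length)
            f.write(blob)
    return index

def read_sample(path, index, i):
    """Random access to sample i via the offset index."""
    offset, length = index[i]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

samples = [bytes([i]) * (100 + i) for i in range(10)]  # fake image blobs
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
index = pack(samples, path)
assert read_sample(path, index, 7) == samples[7]
```

A real pipeline would also store the index on disk (or use an established container such as TFRecord or LMDB), but the access pattern is the same.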
Related papers
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models [3.7414278978078204]
Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems.
The systems challenges faced in this setting are unique; while typical deep learning training jobs are dominated by model execution, the most important factor in DLRM training performance is often online data ingestion.
arXiv Detail & Related papers (2023-08-13T18:28:56Z)
- On Efficient Training of Large-Scale Deep Learning Models: A Literature Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
With the increasing demands on computational capacity, a comprehensive summarization on acceleration techniques of training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z)
- CiT: Curation in Training for Effective Vision-Language Data [84.77867625605053]
This paper presents Curation in Training (CiT), a vision-text learning algorithm that couples a data objective into training.
CiT automatically yields quality data to speed-up contrastive image-text training.
We observe that CiT can speed up training by over an order of magnitude, especially if the raw data size is large.
arXiv Detail & Related papers (2023-01-05T18:59:57Z)
- Profiling and Improving the PyTorch Dataloader for high-latency Storage: A Technical Report [0.7349727826230862]
This work focuses on the data loading pipeline in the PyTorch Framework.
We show that for classification tasks that involve loading many files, like images, the training wall-time can be significantly improved.
With our new, modified ConcurrentDataloader we can reach improvements in GPU utilization and significantly reduce batch loading time, up to 12X.
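The core idea behind a concurrent dataloader is to overlap the per-sample I/O within a batch instead of fetching samples one after another. The actual work modifies PyTorch's DataLoader; this is a stdlib-only sketch of the idea, with a simulated `load` standing in for reading a file from high-latency storage.

```python
# Overlap per-sample I/O with a thread pool: eight 20 ms "reads" take
# roughly 160 ms serially but only about one latency period when issued
# concurrently, since the threads sleep (wait on I/O) in parallel.
import time
from concurrent.futures import ThreadPoolExecutor

def load(i):
    time.sleep(0.02)          # simulate 20 ms of storage latency
    return f"sample-{i}"

def batch_serial(indices):
    return [load(i) for i in indices]

def batch_concurrent(indices, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load, indices))

indices = range(8)
t0 = time.perf_counter(); serial = batch_serial(indices)
t1 = time.perf_counter(); concurrent = batch_concurrent(indices)
t2 = time.perf_counter()
assert serial == concurrent   # same batch contents, different wall time
print(f"serial {t1 - t0:.3f}s, concurrent {t2 - t1:.3f}s")
```

The speedup depends on how much of the per-sample time is spent waiting on storage rather than on CPU-bound decoding, which is why the reported gains are largest on high-latency backends.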
arXiv Detail & Related papers (2022-11-09T14:16:30Z)
- Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines [77.45213180689952]
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy.
We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines.
We obtain an increased throughput of 3x to 13x compared to an untuned system.
arXiv Detail & Related papers (2022-02-17T14:31:58Z) - Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z) - Understanding and Co-designing the Data Ingestion Pipeline for
Industry-Scale RecSys Training [5.058493679956239]
We present an extensive characterization of the data ingestion challenges for industry-scale recommendation model training.
First, dataset storage requirements are massive and variable, exceeding local storage capacities.
Second, reading and preprocessing data is computationally expensive, requiring substantially more compute, memory, and network resources than are available on the trainers themselves.
We introduce Data PreProcessing Service (DPP), a fully disaggregated preprocessing service that scales to hundreds of nodes, eliminating data stalls that can reduce training throughput by 56%.
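Disaggregated preprocessing boils down to a producer-consumer split: preprocessing runs ahead of the training loop and feeds it through a bounded buffer, so the trainer never stalls waiting for a batch. A minimal in-process sketch of that pattern (DPP itself runs the producer side on separate nodes; the names here are illustrative):

```python
# Decouple preprocessing from the training loop with a background
# producer thread and a bounded queue. The bounded buffer absorbs
# jitter; the producer blocks when the trainer falls behind.
import queue
import threading

def preprocess(i):
    return i * i                 # stand-in for decode/augment work

def producer(n, out):
    for i in range(n):
        out.put(preprocess(i))   # blocks when the queue is full
    out.put(None)                # sentinel: no more batches

batches = queue.Queue(maxsize=4)
threading.Thread(target=producer, args=(8, batches), daemon=True).start()

trained = []
while (item := batches.get()) is not None:
    trained.append(item)         # stand-in for a training step
assert trained == [i * i for i in range(8)]
```

Scaling the producer side independently of the trainers is what eliminates the data stalls the paper reports.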
arXiv Detail & Related papers (2021-08-20T21:09:34Z) - Multi-node Bert-pretraining: Cost-efficient Approach [6.5998084177955425]
Large scale Transformer-based language models have brought about exciting leaps in state-of-the-art results for many Natural Language Processing (NLP) tasks.
With the advent of large-scale unsupervised datasets, training time is further extended due to the increased amount of data samples within a single training epoch.
We show that we are able to perform pre-training on BERT within a reasonable time budget (12 days) in an academic setting.
arXiv Detail & Related papers (2020-08-01T05:49:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.