Analyzing and Mitigating Data Stalls in DNN Training
- URL: http://arxiv.org/abs/2007.06775v3
- Date: Tue, 19 Jan 2021 18:35:27 GMT
- Title: Analyzing and Mitigating Data Stalls in DNN Training
- Authors: Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, Vijay Chidambaram
- Abstract summary: We present the first comprehensive analysis of how the input data pipeline affects the training time of Deep Neural Networks (DNNs).
We find that in many cases, DNN training time is dominated by data stall time: time spent waiting for data to be fetched and preprocessed.
We implement three simple but effective techniques in a data-loading library, CoorDL, to mitigate data stalls.
- Score: 7.444113272493349
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training Deep Neural Networks (DNNs) is resource-intensive and
time-consuming. While prior research has explored many different ways of
reducing DNN training time, the impact of input data pipeline, i.e., fetching
raw data items from storage and performing data pre-processing in memory, has
been relatively unexplored. This paper makes the following contributions: (1)
We present the first comprehensive analysis of how the input data pipeline
affects the training time of widely-used computer vision and audio Deep Neural
Networks (DNNs), which typically involve complex data preprocessing. We analyze
nine different models across three tasks and four datasets while varying
factors such as the amount of memory, the number of CPU threads, the storage
device, and the GPU generation on servers that are part of a large production cluster at
Microsoft. We find that in many cases, DNN training time is dominated by data
stall time: time spent waiting for data to be fetched and preprocessed. (2) We
build a tool, DS-Analyzer, to precisely measure data stalls using a differential
technique, and perform predictive what-if analysis on data stalls. (3) Finally,
based on the insights from our analysis, we design and implement three simple
but effective techniques in a data-loading library, CoorDL, to mitigate data
stalls. Our experiments on a range of DNN tasks, models, datasets, and hardware
configs show that when PyTorch uses CoorDL instead of the state-of-the-art DALI
data loading library, DNN training time is reduced significantly (by as much as
5x on a single server).
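To make the notion of a data stall concrete, the following is a minimal PyTorch sketch of the differential idea: time one epoch over the real input pipeline, time a second pass that replays a single pre-materialized batch, and attribute the gap to fetching and preprocessing. It is an illustrative approximation, not the DS-Analyzer tool or the CoorDL library; `model`, `criterion`, `optimizer`, and `loader` are assumed to be defined elsewhere.

```python
# Minimal sketch of a differential data-stall estimate (NOT the authors'
# DS-Analyzer). Assumes an existing `model`, `criterion`, `optimizer`, and
# `loader` (a torch.utils.data.DataLoader yielding (inputs, targets) batches).
import time
import torch

def _sync():
    # Make GPU timing meaningful by waiting for outstanding kernels.
    if torch.cuda.is_available():
        torch.cuda.synchronize()

def timed_pass(model, criterion, optimizer, batches, device):
    """One training pass over `batches`; returns wall-clock seconds."""
    _sync()
    start = time.perf_counter()
    for inputs, targets in batches:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    _sync()
    return time.perf_counter() - start

def estimate_data_stall(model, criterion, optimizer, loader):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # (1) Epoch time with the real input pipeline (fetch + preprocess + compute).
    t_full = timed_pass(model, criterion, optimizer, loader, device)

    # (2) Epoch time with the pipeline removed: replay one pre-materialized
    #     batch for the same number of steps, so training never waits on
    #     storage reads or CPU preprocessing.
    cached_batch = next(iter(loader))
    t_compute = timed_pass(model, criterion, optimizer,
                           [cached_batch] * len(loader), device)

    # The gap approximates time spent stalled on data.
    return t_full - t_compute
```

Replaying a cached batch keeps the compute path identical while removing storage and CPU work, which is the same intuition behind the paper's differential measurement.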
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method that performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- S3TC: Spiking Separated Spatial and Temporal Convolutions with Unsupervised STDP-based Learning for Action Recognition [1.2123876307427106]
Spiking Neural Networks (SNNs) have significantly lower computational costs (thousands of times) than regular non-spiking networks when implemented on neuromorphic hardware.
We introduce, for the first time, Spiking Separated Spatial and Temporal Convolutions (S3TCs) to reduce the number of parameters required for video analysis.
arXiv Detail & Related papers (2023-09-22T10:05:35Z)
- AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration [69.21282992341007]
AutoSynth automatically generates 3D training data for point cloud registration.
We replace the point cloud registration network with a much smaller surrogate network, leading to a 4056.43x speedup.
Our results on TUD-L, LINEMOD and Occluded-LINEMOD show that a neural network trained on our searched dataset yields consistently better performance than the same one trained on the widely used ModelNet40 dataset.
arXiv Detail & Related papers (2023-09-20T09:29:44Z)
- Understand Data Preprocessing for Effective End-to-End Training of Deep Neural Networks [8.977436072381973]
We run experiments to test the performance implications of the two major data preprocessing methods, using either raw data or record files.
We identify the potential causes, exercise a variety of optimization methods, and present their pros and cons.
arXiv Detail & Related papers (2023-04-18T11:57:38Z)
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample at the time in which it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
- Scalable Neural Data Server: A Data Recommender for Transfer Learning [70.06289658553675]
Transfer learning is a popular strategy for leveraging additional data to improve downstream performance.
Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has been previously proposed to address this problem.
NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task.
The proposed Scalable Neural Data Server (SNDS) represents both data sources and downstream tasks by their proximity to intermediary datasets.
arXiv Detail & Related papers (2022-06-19T12:07:32Z)
- Learning-Based Data Storage [Vision] (Technical Report) [9.882820980833698]
We envision a new paradigm of data storage, "DNN-as-a-Database", where data are encoded in well-trained machine learning models.
In this paper, we propose this novel concept of learning-based data storage, which utilizes a learning structure called a learning-based memory unit (LMU).
Our preliminary experimental results show the feasibility of learning-based data storage, achieving 100% accuracy for the DNN storage.
arXiv Detail & Related papers (2022-06-12T16:14:16Z)
- Deep Learning on a Data Diet: Finding Important Examples Early in Training [35.746302913918484]
In vision datasets, simple scores can be used to identify important examples very early in training.
We propose two such scores, the Gradient Normed (GraNd) and the Error L2-Norm (EL2N) scores; a brief illustrative sketch of the EL2N score appears after this list.
arXiv Detail & Related papers (2021-07-15T02:12:20Z)
- Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data [78.74367441804183]
We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data for the target domain.
NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client.
We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets.
arXiv Detail & Related papers (2020-01-09T01:21:30Z)
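For the "Deep Learning on a Data Diet" entry above, the EL2N score has a simple closed form: the L2 norm of the difference between the model's softmax output and the one-hot label, typically averaged over several early-training checkpoints or initializations. The sketch below is an illustrative PyTorch implementation under that assumption, not the authors' code; `model` and `loader` are hypothetical placeholders for a classifier and a labelled DataLoader.

```python
# Illustrative sketch of the EL2N score, ||softmax(f(x)) - y||_2, as described
# in "Deep Learning on a Data Diet" (not the authors' code). `model` and
# `loader` are assumed to be an existing classifier and a labelled DataLoader.
import torch
import torch.nn.functional as F

@torch.no_grad()
def el2n_scores(model, loader, num_classes, device="cpu"):
    model.eval().to(device)
    scores = []
    for inputs, targets in loader:
        probs = F.softmax(model(inputs.to(device)), dim=1)
        onehot = F.one_hot(targets.to(device), num_classes).float()
        # Per-example L2 norm of the error vector p(x) - y.
        scores.append(torch.linalg.vector_norm(probs - onehot, dim=1))
    # Higher score = harder / more informative example; the paper averages
    # such scores over multiple early-training runs before pruning data.
    return torch.cat(scores)
```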