Profiling and Improving the PyTorch Dataloader for high-latency Storage:
A Technical Report
- URL: http://arxiv.org/abs/2211.04908v1
- Date: Wed, 9 Nov 2022 14:16:30 GMT
- Title: Profiling and Improving the PyTorch Dataloader for high-latency Storage:
A Technical Report
- Authors: Ivan Svogor, Christian Eichenberger, Markus Spanring, Moritz Neun,
Michael Kopp
- Abstract summary: This work focuses on the data loading pipeline in the PyTorch Framework.
We show that for classification tasks that involve loading many files, like images, the training wall-time can be significantly improved.
With our new, modified ConcurrentDataloader we improve GPU utilization and significantly reduce batch loading time, by up to 12X.
- Score: 0.7349727826230862
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A growing number of Machine Learning Frameworks recently made Deep Learning
accessible to a wider audience of engineers, scientists, and practitioners, by
allowing straightforward use of complex neural network architectures and
algorithms. However, since deep learning is rapidly evolving, not only through
theoretical advancements but also with respect to hardware and software
engineering, ML frameworks often lose backward compatibility and introduce
technical debt that can lead to bottlenecks and sub-optimal resource
utilization. Moreover, the focus is in most cases not on deep learning
engineering, but rather on new models and theoretical advancements. In this
work, however, we focus on engineering, more specifically on the data loading
pipeline in the PyTorch Framework. We designed a series of benchmarks that
outline performance issues of certain steps in the data loading process. Our
findings show that for classification tasks that involve loading many files,
like images, the training wall-time can be significantly improved. With our
new, modified ConcurrentDataloader we can reach improvements in GPU utilization
and significantly reduce batch loading time, by up to 12X. This allows cloud-based,
S3-like object storage to be used for datasets while keeping training times
comparable to those achieved when datasets are stored on local drives.
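To make the engineering setting concrete, the following is a minimal sketch of the kind of baseline pipeline the report profiles: a classification dataset whose images live in S3-like object storage, read item by item through the stock PyTorch DataLoader. The bucket name, object keys, and labels are hypothetical placeholders, and this is not the authors' ConcurrentDataloader; it only illustrates the standard fetch path whose cost their modification reduces.

import io

import boto3
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms


class S3ImageDataset(Dataset):
    """Hypothetical dataset fetching one image per item from S3-like object storage."""

    def __init__(self, bucket, keys, labels, transform=None):
        self.bucket = bucket          # placeholder bucket name
        self.keys = keys              # one object key per image (placeholder)
        self.labels = labels
        self.transform = transform or transforms.Compose(
            [transforms.Resize((224, 224)), transforms.ToTensor()]
        )
        self._client = None           # created lazily so each worker process gets its own client

    def _s3(self):
        if self._client is None:
            self._client = boto3.client("s3")
        return self._client

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # One blocking GET per item: on high-latency storage this round trip,
        # repeated for every image in a batch, dominates batch loading time.
        body = self._s3().get_object(Bucket=self.bucket, Key=self.keys[idx])["Body"].read()
        image = Image.open(io.BytesIO(body)).convert("RGB")
        return self.transform(image), self.labels[idx]


# Placeholder metadata for illustration only.
keys = ["train/cat_0001.jpg", "train/dog_0001.jpg"]
labels = [0, 1]

# The stock DataLoader hides some latency by running worker processes; the
# report's ConcurrentDataloader modifies this loading path further (see the
# paper for details), yielding the reported up-to-12X reduction in batch
# loading time.
loader = DataLoader(
    S3ImageDataset("my-training-bucket", keys, labels),
    batch_size=2,
    num_workers=2,
    pin_memory=True,
)

for images, targets in loader:
    images = images.to("cuda" if torch.cuda.is_available() else "cpu")
    # forward/backward pass would go here

This sketch deliberately keeps the per-item GET requests sequential within each worker, which is exactly the behavior the report's benchmarks expose as the bottleneck on high-latency storage.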
Related papers
- Bullion: A Column Store for Machine Learning [4.096087402737292]
This paper presents Bullion, a columnar storage system tailored for machine learning workloads.
Bullion addresses the complexities of data compliance, optimizes the encoding of long-sequence sparse features, efficiently manages wide-table projections, introduces feature quantization in storage, and provides a comprehensive cascading encoding framework.
Preliminary experimental results and theoretical analysis demonstrate Bullion's improved ability to deliver strong performance in the face of the unique demands of machine learning workloads.
arXiv Detail & Related papers (2024-04-13T05:01:54Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Continual Learning with Transformers for Image Classification [12.028617058465333]
In computer vision, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past.
We develop a solution called Adaptive Distillation of Adapters (ADA) to perform continual learning.
We empirically demonstrate on different classification tasks that this method maintains a good predictive performance without retraining the model.
arXiv Detail & Related papers (2022-06-28T15:30:10Z)
- Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z)
- Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines [77.45213180689952]
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy.
We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines.
We obtain an increased throughput of 3x to 13x compared to an untuned system.
arXiv Detail & Related papers (2022-02-17T14:31:58Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Standard machine learning lifecycle approaches do not supply the procedures and pipelines needed to actually deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
- Continuum: Simple Management of Complex Continual Learning Scenarios [1.52292571922932]
Continual learning is a machine learning sub-field specialized in settings with non-iid data.
Continual learning's challenge is to create algorithms able to learn an ever-growing amount of knowledge while dealing with data distribution drifts.
Small errors in data loaders have a critical impact on algorithm results.
arXiv Detail & Related papers (2021-02-11T20:29:13Z)
- Importance of Data Loading Pipeline in Training Deep Neural Networks [2.127049691404299]
In large models, the time spent loading data takes a significant portion of model training time.
We evaluate a binary data format to accelerate data reading and NVIDIA DALI to accelerate data augmentation.
Our study shows improvement on the order of 20% to 40% if such dedicated tools are used.
arXiv Detail & Related papers (2020-04-21T14:19:48Z)
- How to 0wn NAS in Your Spare Time [11.997555708723523]
We design an algorithm that reconstructs the key components of a novel deep learning system by exploiting a small amount of information leakage from a cache side-channel attack.
We demonstrate experimentally that we can reconstruct MalConv, a novel data pre-processing pipeline for malware detection, and ProxylessNAS CPU-NAS, a novel network architecture for ImageNet classification.
arXiv Detail & Related papers (2020-02-17T05:40:55Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)