Survey on Large Scale Neural Network Training
- URL: http://arxiv.org/abs/2202.10435v1
- Date: Mon, 21 Feb 2022 18:48:02 GMT
- Title: Survey on Large Scale Neural Network Training
- Authors: Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel
Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov,
Ivan Oseledets and Olivier Beaumont
- Abstract summary: Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.
This survey provides a systematic overview of the approaches that enable more efficient DNN training.
- Score: 48.424512364338746
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern Deep Neural Networks (DNNs) require significant memory to store
weights, activations, and other intermediate tensors during training. Hence,
many models do not fit on a single GPU device or can be trained only with a small
per-GPU batch size. This survey provides a systematic overview of the
approaches that enable more efficient DNN training. We analyze techniques that
save memory and make good use of computation and communication resources on
architectures with a single or several GPUs. We summarize the main categories
of strategies and compare strategies within and across categories. Along with
approaches proposed in the literature, we discuss available implementations.
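One family of memory-saving techniques the survey covers is activation (gradient) checkpointing, which stores only a subset of activations during the forward pass and recomputes the rest during backward. A minimal, framework-free sketch of the memory tradeoff; the function, layer counts, and cost model are illustrative assumptions, not from the paper:

```python
# Sketch of the peak-memory model behind activation checkpointing.
# segment=1 is the baseline (store every activation); larger segments
# store fewer checkpoints but must recompute one segment at a time
# during the backward pass.

def peak_activations(num_layers: int, segment: int) -> int:
    """Peak number of activations held in memory at once."""
    if segment <= 1:
        return num_layers  # baseline: keep all activations
    # Number of stored checkpoints (one every `segment` layers).
    checkpoints = (num_layers + segment - 1) // segment
    # During backward, one segment's activations are recomputed
    # and held alongside the stored checkpoints.
    return checkpoints + segment

baseline = peak_activations(64, 1)      # 64: store everything
checkpointed = peak_activations(64, 8)  # 16: 8 checkpoints + 8 recomputed
```

Under this simple model, checkpointing every 8 layers cuts peak activation memory from 64 to 16 units at the cost of one extra forward recomputation per segment.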
Related papers
- Partitioned Neural Network Training via Synthetic Intermediate Labels [0.0]
GPU memory constraints have become a notable bottleneck in training sizable models.
This study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train individual segments.
This approach results in a more efficient training process that minimizes data communication while maintaining model accuracy.
arXiv Detail & Related papers (2024-03-17T13:06:29Z)
- Heterogeneous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective distributed training technique.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - Bandit Sampling for Multiplex Networks [8.771092194928674]
We propose an algorithm for scalable learning on multiplex networks with a large number of layers.
An online learning algorithm learns to sample neighboring layers so that only the layers with relevant information are aggregated during training.
We present experimental results on both synthetic and real-world scenarios.
arXiv Detail & Related papers (2022-02-08T03:26:34Z) - How to Train Your Neural Network: A Comparative Evaluation [1.3654846342364304]
We discuss and compare current state-of-the-art frameworks for large-scale distributed deep learning.
We present empirical results comparing their performance on large image and language training tasks.
Based on our results, we discuss the algorithmic and implementation aspects of each framework that hinder performance.
arXiv Detail & Related papers (2021-11-09T04:24:42Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Benchmarking network fabrics for data distributed training of deep
neural networks [10.067102343753643]
Large computational requirements for training deep models have necessitated the development of new methods for faster training.
One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes.
In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning.
arXiv Detail & Related papers (2020-08-18T17:38:30Z)
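The data-parallel approach described in the entry above can be sketched without any GPU framework: each worker computes gradients on its own data shard, the per-worker gradients are averaged (the all-reduce step the interconnect benchmarks stress), and every worker applies the same update. The worker count and gradient values below are illustrative assumptions:

```python
# Framework-free sketch of the gradient-averaging (all-reduce) step
# at the heart of synchronous data-parallel training.

def all_reduce_mean(per_worker_grads):
    """Average gradients element-wise across workers,
    as a synchronous all-reduce would."""
    n_workers = len(per_worker_grads)
    return [sum(g) / n_workers for g in zip(*per_worker_grads)]

grads = [
    [2.0, -4.0, 1.0],  # gradients from worker 0's data shard
    [4.0, -2.0, 3.0],  # gradients from worker 1's data shard
]
avg = all_reduce_mean(grads)  # [3.0, -3.0, 2.0]
```

In practice this averaging is implemented by collective-communication libraries over the physical interconnect, which is why the fabric choice studied in the paper directly affects training throughput.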
This list is automatically generated from the titles and abstracts of the papers on this site.