Survey on Large Scale Neural Network Training
- URL: http://arxiv.org/abs/2202.10435v1
- Date: Mon, 21 Feb 2022 18:48:02 GMT
- Title: Survey on Large Scale Neural Network Training
- Authors: Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel
Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov,
Ivan Oseledets and Olivier Beaumont
- Abstract summary: Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.
This survey provides a systematic overview of the approaches that enable more efficient DNN training.
- Score: 48.424512364338746
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern Deep Neural Networks (DNNs) require significant memory to store
weights, activations, and other intermediate tensors during training. Hence,
many models do not fit on a single GPU device or can be trained only with a small
per-GPU batch size. This survey provides a systematic overview of the
approaches that enable more efficient DNN training. We analyze techniques that
save memory and make good use of computation and communication resources on
architectures with a single or several GPUs. We summarize the main categories
of strategies and compare strategies within and across categories. Along with
approaches proposed in the literature, we discuss available implementations.
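One family of memory-saving techniques the survey covers is activation (gradient) checkpointing, which stores only a subset of activations during the forward pass and recomputes the rest during backward. A minimal, framework-free sketch of the memory tradeoff; the function, layer counts, and cost model are illustrative assumptions, not from the paper:

```python
# Sketch of the peak-memory model behind activation checkpointing.
# segment=1 is the baseline (store every activation); larger segments
# store fewer checkpoints but must recompute one segment at a time
# during the backward pass.

def peak_activations(num_layers: int, segment: int) -> int:
    """Peak number of activations held in memory at once."""
    if segment <= 1:
        return num_layers  # baseline: keep all activations
    # Number of stored checkpoints (one every `segment` layers).
    checkpoints = (num_layers + segment - 1) // segment
    # During backward, one segment's activations are recomputed
    # and held alongside the stored checkpoints.
    return checkpoints + segment

baseline = peak_activations(64, 1)      # 64: store everything
checkpointed = peak_activations(64, 8)  # 16: 8 checkpoints + 8 recomputed
```

Under this simple model, checkpointing every 8 layers cuts peak activation memory from 64 to 16 units at the cost of one extra forward recomputation per segment.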
Related papers
- Partitioned Neural Network Training via Synthetic Intermediate Labels [0.0]
GPU memory constraints have become a notable bottleneck in training sizable models.
This study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train individual segments.
This approach results in a more efficient training process that minimizes data communication while maintaining model accuracy.
arXiv Detail & Related papers (2024-03-17T13:06:29Z)
- Heterogeneous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective distributed training technique.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z) - A Comprehensive Study on Large-Scale Graph Training: Benchmarking and
Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z) - Bandit Sampling for Multiplex Networks [8.771092194928674]
We propose an algorithm for scalable learning on multiplex networks with a large number of layers.
An online learning algorithm learns to sample neighboring layers so that only the layers with relevant information are aggregated during training.
We present experimental results on both synthetic and real-world scenarios.
arXiv Detail & Related papers (2022-02-08T03:26:34Z) - How to Train Your Neural Network: A Comparative Evaluation [1.3654846342364304]
We discuss and compare current state-of-the-art frameworks for large-scale distributed deep learning.
We present empirical results comparing their performance on large image and language training tasks.
Based on our results, we discuss the algorithmic and implementation aspects of each framework that hinder performance.
arXiv Detail & Related papers (2021-11-09T04:24:42Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Benchmarking network fabrics for data distributed training of deep
neural networks [10.067102343753643]
Large computational requirements for training deep models have necessitated the development of new methods for faster training.
One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes.
In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning.
arXiv Detail & Related papers (2020-08-18T17:38:30Z)
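The data-parallel approach described in the entry above can be sketched without any GPU framework: each worker computes gradients on its own data shard, the per-worker gradients are averaged (the all-reduce step the interconnect benchmarks stress), and every worker applies the same update. The worker count and gradient values below are illustrative assumptions:

```python
# Framework-free sketch of the gradient-averaging (all-reduce) step
# at the heart of synchronous data-parallel training.

def all_reduce_mean(per_worker_grads):
    """Average gradients element-wise across workers,
    as a synchronous all-reduce would."""
    n_workers = len(per_worker_grads)
    return [sum(g) / n_workers for g in zip(*per_worker_grads)]

grads = [
    [2.0, -4.0, 1.0],  # gradients from worker 0's data shard
    [4.0, -2.0, 3.0],  # gradients from worker 1's data shard
]
avg = all_reduce_mean(grads)  # [3.0, -3.0, 2.0]
```

In practice this averaging is implemented by collective-communication libraries over the physical interconnect, which is why the fabric choice studied in the paper directly affects training throughput.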
This list is automatically generated from the titles and abstracts of the papers on this site.