Nesting Forward Automatic Differentiation for Memory-Efficient Deep
Neural Network Training
- URL: http://arxiv.org/abs/2209.10778v1
- Date: Thu, 22 Sep 2022 04:48:48 GMT
- Title: Nesting Forward Automatic Differentiation for Memory-Efficient Deep
Neural Network Training
- Authors: Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu
Zhang, Yunxin Liu, Fan Yang, Minyi Guo
- Abstract summary: We propose nested forward automatic differentiation (Forward-AD) for element-wise activation functions to enable memory-efficient training.
Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared with the baseline model.
- Score: 23.536294640280087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An activation function is an element-wise mathematical function that plays a
crucial role in deep neural networks (DNNs). Many novel and sophisticated
activation functions have been proposed to improve DNN accuracy, but they also
consume massive memory during training with back-propagation. In this study,
we propose nested forward automatic differentiation (Forward-AD), specifically
for element-wise activation functions, for memory-efficient DNN training. We
deploy nested Forward-AD in two widely used deep learning frameworks,
TensorFlow and PyTorch, which support the static and dynamic computation
graph, respectively. Our evaluation shows that nested Forward-AD reduces the
memory footprint by up to 1.97x compared with the baseline model and
outperforms recomputation by 20% under the same memory-reduction ratio.
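As a rough illustration of the idea (a minimal PyTorch sketch, not the authors' released implementation; the class name, the `mish` wrapper, and the use of torch.func.jvp are assumptions made here), an element-wise activation can nest forward-mode AD inside its forward pass so that only the local derivative is saved for back-propagation, rather than every intermediate tensor of the composite activation:

    import torch
    import torch.nn.functional as F
    from torch.func import jvp


    class ForwardADActivation(torch.autograd.Function):
        # Sketch: nest forward-mode AD inside an element-wise activation so
        # that only one tensor (the local derivative) is saved for the
        # backward pass, instead of every intermediate of the composite op.

        @staticmethod
        def forward(ctx, x, fn):
            # Forward-mode AD with a tangent of ones yields the element-wise
            # derivative fn'(x) alongside the primal output in a single pass.
            y, dydx = jvp(fn, (x,), (torch.ones_like(x),))
            ctx.save_for_backward(dydx)   # one saved tensor instead of several
            return y

        @staticmethod
        def backward(ctx, grad_out):
            (dydx,) = ctx.saved_tensors
            return grad_out * dydx, None  # chain rule for an element-wise fn


    def mish(x):
        # Composite activation: plain reverse-mode autograd would retain the
        # softplus and tanh intermediates for the backward pass.
        return x * torch.tanh(F.softplus(x))


    x = torch.randn(4, 8, requires_grad=True)
    y = ForwardADActivation.apply(x, mish)
    y.sum().backward()
    # Sanity check against ordinary reverse-mode autograd:
    print(torch.allclose(x.grad, torch.autograd.grad(mish(x).sum(), x)[0], atol=1e-6))

With a composite activation such as Mish, standard reverse-mode autograd keeps several intermediate tensors alive until the backward pass; collapsing them into a single saved derivative is the kind of saving the abstract describes.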
Related papers
- Inverted Activations: Reducing Memory Footprint in Neural Network Training [5.070981175240306]
A significant challenge in neural network training is the memory footprint associated with activation tensors.
We propose a modification to the handling of activation tensors in pointwise nonlinearity layers.
We show that our method significantly reduces memory usage without affecting training accuracy or computational performance.
arXiv Detail & Related papers (2024-07-22T11:11:17Z) - Efficient Parametric Approximations of Neural Network Function Space
Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks; the quantity being approximated is written out below.
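In the notation of the summary above, the FSD over a training set {x_1, ..., x_N} can be written as (here d denotes some output-space discrepancy and f(x; θ) the network output; these symbol choices are assumptions for illustration, not the paper's exact definition):

    \mathrm{FSD}(\theta_1, \theta_2) = \frac{1}{N} \sum_{n=1}^{N} d\big(f(x_n; \theta_1),\, f(x_n; \theta_2)\big)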
arXiv Detail & Related papers (2023-02-07T15:09:23Z) - Function Regression using Spiking DeepONet [2.935661780430872]
We present an SNN-based method to perform regression, which has been a challenge due to the inherent difficulty in representing a function's input domain and continuous output values as spikes.
We use a DeepONet, a neural network designed to learn operators, to learn the behavior of spikes.
We propose several methods to use a DeepONet in the spiking framework, and present accuracy and training time for different benchmarks.
arXiv Detail & Related papers (2022-05-17T15:22:22Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - Data-Driven Learning of Feedforward Neural Networks with Different
Activation Functions [0.0]
This work contributes to the development of a new data-driven method (D-DM) for feedforward neural network (FNN) learning.
arXiv Detail & Related papers (2021-07-04T18:20:27Z) - ActNN: Reducing Training Memory Footprint via 2-Bit Activation
Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for back-propagation.
ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size; a conceptual sketch of compressed-activation storage follows below.
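As a rough, hypothetical sketch of the idea summarized above (not ActNN's actual algorithm or API; the per-row quantizer and the linear-layer wrapper are assumptions made here), a layer can save a stochastically quantized low-bit copy of its input for the backward pass and dequantize it only when the weight gradient is needed:

    import torch


    def quantize_2bit(x):
        # Illustrative stochastic 2-bit quantizer: 4 levels per row, with
        # unbiased rounding so the dequantized tensor equals x in expectation.
        xmin = x.min(dim=-1, keepdim=True).values
        scale = (x.max(dim=-1, keepdim=True).values - xmin).clamp_min(1e-8) / 3
        q = (x - xmin) / scale                   # values in [0, 3]
        q = torch.floor(q + torch.rand_like(q))  # stochastic rounding
        return q.clamp_(0, 3).to(torch.uint8), xmin, scale


    def dequantize_2bit(q, xmin, scale):
        return q.to(scale.dtype) * scale + xmin


    class CompressedLinear(torch.autograd.Function):
        # Linear layer that keeps only a quantized copy of its input
        # activation for the backward pass instead of the full-precision one.

        @staticmethod
        def forward(ctx, x, weight):
            q, xmin, scale = quantize_2bit(x)
            ctx.save_for_backward(q, xmin, scale, weight)
            return x @ weight.t()

        @staticmethod
        def backward(ctx, grad_out):
            q, xmin, scale, weight = ctx.saved_tensors
            x_hat = dequantize_2bit(q, xmin, scale)  # approximate activation
            grad_x = grad_out @ weight
            grad_w = grad_out.t() @ x_hat            # weight grad from the copy
            return grad_x, grad_w


    x = torch.randn(16, 32)
    w = torch.randn(64, 32, requires_grad=True)
    CompressedLinear.apply(x, w).sum().backward()    # w.grad uses the 2-bit copy

A real implementation would pack four 2-bit codes into each byte and choose quantization groups more carefully; the uint8 storage above only illustrates the dataflow.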
arXiv Detail & Related papers (2021-04-29T05:50:54Z) - Improving Computational Efficiency in Visual Reinforcement Learning via
Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Incremental Training of a Recurrent Neural Network Exploiting a
Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.