Nesting Forward Automatic Differentiation for Memory-Efficient Deep
Neural Network Training
- URL: http://arxiv.org/abs/2209.10778v1
- Date: Thu, 22 Sep 2022 04:48:48 GMT
- Title: Nesting Forward Automatic Differentiation for Memory-Efficient Deep
Neural Network Training
- Authors: Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu
Zhang, Yunxin Liu, Fan Yang, Minyi Guo
- Abstract summary: We propose nested forward automatic differentiation (Forward-AD) for element-wise activation functions to enable memory-efficient training.
Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared with the baseline model.
- Score: 23.536294640280087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An activation function is an element-wise mathematical function that plays a
crucial role in deep neural networks (DNNs). Many novel and sophisticated
activation functions have been proposed to improve DNN accuracy, but they also
consume massive memory during training with back-propagation. In this study,
we propose nested forward automatic differentiation (Forward-AD), specifically
for element-wise activation functions, for memory-efficient DNN training. We
deploy nested Forward-AD in two widely used deep learning frameworks,
TensorFlow and PyTorch, which support the static and dynamic computation
graph, respectively. Our evaluation shows that nested Forward-AD reduces the
memory footprint by up to 1.97x compared with the baseline model and
outperforms recomputation by 20% under the same memory-reduction ratio.
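As a rough illustration of the idea (a minimal PyTorch sketch, not the authors' released implementation; the class name, the `mish` wrapper, and the use of torch.func.jvp are assumptions made here), an element-wise activation can nest forward-mode AD inside its forward pass so that only the local derivative is saved for back-propagation, rather than every intermediate tensor of the composite activation:

    import torch
    import torch.nn.functional as F
    from torch.func import jvp


    class ForwardADActivation(torch.autograd.Function):
        # Sketch: nest forward-mode AD inside an element-wise activation so
        # that only one tensor (the local derivative) is saved for the
        # backward pass, instead of every intermediate of the composite op.

        @staticmethod
        def forward(ctx, x, fn):
            # Forward-mode AD with a tangent of ones yields the element-wise
            # derivative fn'(x) alongside the primal output in a single pass.
            y, dydx = jvp(fn, (x,), (torch.ones_like(x),))
            ctx.save_for_backward(dydx)   # one saved tensor instead of several
            return y

        @staticmethod
        def backward(ctx, grad_out):
            (dydx,) = ctx.saved_tensors
            return grad_out * dydx, None  # chain rule for an element-wise fn


    def mish(x):
        # Composite activation: plain reverse-mode autograd would retain the
        # softplus and tanh intermediates for the backward pass.
        return x * torch.tanh(F.softplus(x))


    x = torch.randn(4, 8, requires_grad=True)
    y = ForwardADActivation.apply(x, mish)
    y.sum().backward()
    # Sanity check against ordinary reverse-mode autograd:
    print(torch.allclose(x.grad, torch.autograd.grad(mish(x).sum(), x)[0], atol=1e-6))

With a composite activation such as Mish, standard reverse-mode autograd keeps several intermediate tensors alive until the backward pass; collapsing them into a single saved derivative is the kind of saving the abstract describes.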
Related papers
- Inverted Activations: Reducing Memory Footprint in Neural Network Training [5.070981175240306]
A significant challenge in neural network training is the memory footprint associated with activation tensors.
We propose a modification to the handling of activation tensors in pointwise nonlinearity layers.
We show that our method significantly reduces memory usage without affecting training accuracy or computational performance.
arXiv Detail & Related papers (2024-07-22T11:11:17Z) - Efficient Parametric Approximations of Neural Network Function Space
Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks; the quantity being approximated is written out below.
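In the notation of the summary above, the FSD over a training set {x_1, ..., x_N} can be written as (here d denotes some output-space discrepancy and f(x; θ) the network output; these symbol choices are assumptions for illustration, not the paper's exact definition):

    \mathrm{FSD}(\theta_1, \theta_2) = \frac{1}{N} \sum_{n=1}^{N} d\big(f(x_n; \theta_1),\, f(x_n; \theta_2)\big)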
arXiv Detail & Related papers (2023-02-07T15:09:23Z) - Function Regression using Spiking DeepONet [2.935661780430872]
We present an SNN-based method to perform regression, which has been a challenge due to the inherent difficulty in representing a function's input domain and continuous output values as spikes.
We use a DeepONet, a neural network designed to learn operators, to learn the behavior of spikes.
We propose several methods to use a DeepONet in the spiking framework, and present accuracy and training time for different benchmarks.
arXiv Detail & Related papers (2022-05-17T15:22:22Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - Data-Driven Learning of Feedforward Neural Networks with Different
Activation Functions [0.0]
This work contributes to the development of a new data-driven method (D-DM) for feedforward neural network (FNN) learning.
arXiv Detail & Related papers (2021-07-04T18:20:27Z) - ActNN: Reducing Training Memory Footprint via 2-Bit Activation
Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for back-propagation.
ActNN reduces the memory footprint of the activation by 12x, and it enables training with a 6.6x to 14x larger batch size; a conceptual sketch of compressed-activation storage follows below.
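As a rough, hypothetical sketch of the idea summarized above (not ActNN's actual algorithm or API; the per-row quantizer and the linear-layer wrapper are assumptions made here), a layer can save a stochastically quantized low-bit copy of its input for the backward pass and dequantize it only when the weight gradient is needed:

    import torch


    def quantize_2bit(x):
        # Illustrative stochastic 2-bit quantizer: 4 levels per row, with
        # unbiased rounding so the dequantized tensor equals x in expectation.
        xmin = x.min(dim=-1, keepdim=True).values
        scale = (x.max(dim=-1, keepdim=True).values - xmin).clamp_min(1e-8) / 3
        q = (x - xmin) / scale                   # values in [0, 3]
        q = torch.floor(q + torch.rand_like(q))  # stochastic rounding
        return q.clamp_(0, 3).to(torch.uint8), xmin, scale


    def dequantize_2bit(q, xmin, scale):
        return q.to(scale.dtype) * scale + xmin


    class CompressedLinear(torch.autograd.Function):
        # Linear layer that keeps only a quantized copy of its input
        # activation for the backward pass instead of the full-precision one.

        @staticmethod
        def forward(ctx, x, weight):
            q, xmin, scale = quantize_2bit(x)
            ctx.save_for_backward(q, xmin, scale, weight)
            return x @ weight.t()

        @staticmethod
        def backward(ctx, grad_out):
            q, xmin, scale, weight = ctx.saved_tensors
            x_hat = dequantize_2bit(q, xmin, scale)  # approximate activation
            grad_x = grad_out @ weight
            grad_w = grad_out.t() @ x_hat            # weight grad from the copy
            return grad_x, grad_w


    x = torch.randn(16, 32)
    w = torch.randn(64, 32, requires_grad=True)
    CompressedLinear.apply(x, w).sum().backward()    # w.grad uses the 2-bit copy

A real implementation would pack four 2-bit codes into each byte and choose quantization groups more carefully; the uint8 storage above only illustrates the dataflow.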
arXiv Detail & Related papers (2021-04-29T05:50:54Z) - Improving Computational Efficiency in Visual Reinforcement Learning via
Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Incremental Training of a Recurrent Neural Network Exploiting a
Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.