Training for temporal sparsity in deep neural networks, application in
video processing
- URL: http://arxiv.org/abs/2107.07305v1
- Date: Thu, 15 Jul 2021 13:17:11 GMT
- Title: Training for temporal sparsity in deep neural networks, application in
video processing
- Authors: Amirreza Yousefzadeh, Manolis Sifalakis
- Abstract summary: Activation sparsity improves compute efficiency and resource utilization in sparsity-aware neural network accelerators.
We introduce a new layer (called Delta Activation Layer) to promote temporal sparsity of activations during training.
We report an almost 3x improvement of activation sparsity, with recoverable loss of model accuracy after longer training.
- Score: 0.30458514384586394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Activation sparsity improves compute efficiency and resource utilization in
sparsity-aware neural network accelerators. As the predominant operation in
DNNs is multiply-accumulate (MAC) of activations with weights to compute inner
products, skipping operations where (at least) one of the two operands is zero
can make inference more efficient in terms of latency and power. Spatial
sparsification of activations is a popular topic in DNN literature and several
methods have already been established to bias a DNN for it. On the other hand,
temporal sparsity is an inherent feature of bio-inspired spiking neural
networks (SNNs), which neuromorphic processing exploits for hardware
efficiency. Introducing and exploiting spatio-temporal sparsity is a topic much
less explored in the DNN literature, but it is in perfect resonance with the trend
in DNNs to shift from static signal processing to more streaming signal
processing. Towards this goal, in this paper we introduce a new DNN layer
(called Delta Activation Layer), whose sole purpose is to promote temporal
sparsity of activations during training. A Delta Activation Layer casts
temporal sparsity into spatial activation sparsity to be exploited when
performing sparse tensor multiplications in hardware. By employing delta
inference and "the usual" spatial sparsification heuristics during training,
the resulting model learns to exploit not only spatial but also temporal
activation sparsity (for a given input data distribution). One may use the
Delta Activation Layer either during vanilla training or during a refinement
phase. We have implemented the Delta Activation Layer as an extension of the
standard TensorFlow-Keras library and applied it to train deep neural networks
on the Human Action Recognition (UCF101) dataset. We report an almost 3x
improvement of activation sparsity, with recoverable loss of model accuracy
after longer training.
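The abstract names the framework (TensorFlow-Keras) and the intent of the layer (suppress sub-threshold frame-to-frame activation changes so that delta inference operates on sparse tensors), but not the implementation. The sketch below is a hypothetical illustration under those assumptions, not the authors' released code: the layer name, the threshold value, the L1 penalty on the deltas, and the one-step delta approximation are all illustrative choices.

```python
# A minimal, hypothetical sketch of a delta-style activation layer in
# TensorFlow-Keras. NOT the authors' implementation; threshold, penalty
# weight, and the one-step delta approximation are assumptions.
import tensorflow as tf


class DeltaActivationSketch(tf.keras.layers.Layer):
    """Suppresses sub-threshold frame-to-frame activation changes.

    Wherever an activation barely changes between consecutive frames, the
    emitted change is zero, so a sparsity-aware accelerator could skip the
    corresponding MACs. An L1 penalty on the deltas biases training toward
    temporally stable (hence temporally sparse) activations.
    """

    def __init__(self, threshold=0.05, l1_weight=1e-4, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold  # assumed magnitude below which a delta is dropped
        self.l1_weight = l1_weight  # assumed strength of the sparsity-promoting penalty

    def call(self, x):
        # x: activations with an explicit time axis, e.g. (batch, T, H, W, C).
        prev = tf.concat([tf.zeros_like(x[:, :1]), x[:, :-1]], axis=1)
        delta = x - prev
        # Zero out small temporal changes. This is a one-step approximation of
        # delta inference; a faithful simulation would accumulate against the
        # previously emitted output rather than the raw previous frame.
        sparse_delta = tf.where(tf.abs(delta) > self.threshold,
                                delta, tf.zeros_like(delta))
        # Penalize delta magnitude to promote temporal sparsity during training.
        self.add_loss(self.l1_weight * tf.reduce_mean(tf.abs(delta)))
        return prev + sparse_delta
```

In a video model such a layer would plausibly sit after a per-frame feature extractor (for example a TimeDistributed convolutional backbone), and could be enabled either from the start of training or only during the refinement phase mentioned in the abstract.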
Related papers
- Exploring the Benefit of Activation Sparsity in Pre-training [117.25661020250658]
We study how activation properties change during pre-training.
We propose Switchable Sparse-Dense Learning (SSD).
SSD achieves comparable performance with identical model size and reduces pre-training costs.
arXiv Detail & Related papers (2024-10-04T13:53:33Z)
- Temporal Reversed Training for Spiking Neural Networks with Generalized Spatio-Temporal Representation [3.5624857747396814]
Spiking neural networks (SNNs) have received widespread attention as an ultra-low energy computing paradigm.
Recent studies have focused on improving the feature extraction capability of SNNs, but they still suffer from inefficiency and suboptimal performance.
We propose a simple yet effective temporal reversed training (TRT) method to optimize the temporal performance of SNNs.
arXiv Detail & Related papers (2024-08-17T06:23:38Z)
- Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training [20.49255973077044]
This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce computational requirements for training on the edge.
Results show a reduction of ~80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss.
We show that the proposed Delta RNN training will be useful for online incremental learning on edge devices with limited computing resources.
arXiv Detail & Related papers (2023-12-14T23:07:37Z)
- Dynamic Sparsity Is Channel-Level Sparsity Learner [91.31071026340746]
Dynamic sparse training (DST) is a leading sparse training approach.
Channel-aware dynamic sparse (Chase) seamlessly translates the promise of unstructured dynamic sparsity to channel-level sparsity.
Our approach translates unstructured sparsity to channel-wise sparsity.
arXiv Detail & Related papers (2023-05-30T23:33:45Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting [29.685909045226847]
Brain-inspired spiking neural networks (SNNs) have attracted widespread research interest because of their event-driven and energy-efficient characteristics.
The current direct training approach with surrogate gradients (SG) results in SNNs with poor generalizability.
We introduce the temporal efficient training (TET) approach to compensate for the loss of momentum in the gradient descent with SG.
arXiv Detail & Related papers (2022-02-24T08:02:37Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z) - Training Energy-Efficient Deep Spiking Neural Networks with Single-Spike
Hybrid Input Encoding [5.725845886457027]
Spiking Neural Networks (SNNs) provide higher computational efficiency in event-driven neuromorphic hardware.
SNNs suffer from high inference latency, resulting from inefficient input encoding and training techniques.
This paper presents a training framework for low-latency energy-efficient SNNs.
arXiv Detail & Related papers (2021-07-26T06:16:40Z)
- Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training [18.521882534906972]
We propose to structure dropout patterns by dropping out the same set of physical neurons within a batch, resulting in column (row) level hidden-state sparsity.
We conduct experiments for three representative NLP tasks: language modelling on the PTB dataset, OpenNMT based machine translation using the IWSLT De-En and En-Vi datasets, and named entity recognition sequence labelling.
arXiv Detail & Related papers (2021-06-22T22:44:32Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)