Training for temporal sparsity in deep neural networks, application in
video processing
- URL: http://arxiv.org/abs/2107.07305v1
- Date: Thu, 15 Jul 2021 13:17:11 GMT
- Title: Training for temporal sparsity in deep neural networks, application in
video processing
- Authors: Amirreza Yousefzadeh, Manolis Sifalakis
- Abstract summary: Activation sparsity improves compute efficiency and resource utilization in sparsity-aware neural network accelerators.
We introduce a new layer (called Delta Activation Layer) to promote temporal sparsity of activations during training.
We report an almost 3x improvement of activation sparsity, with recoverable loss of model accuracy after longer training.
- Score: 0.30458514384586394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Activation sparsity improves compute efficiency and resource utilization in
sparsity-aware neural network accelerators. As the predominant operation in
DNNs is multiply-accumulate (MAC) of activations with weights to compute inner
products, skipping operations where (at least) one of the two operands is zero
can make inference more efficient in terms of latency and power. Spatial
sparsification of activations is a popular topic in DNN literature and several
methods have already been established to bias a DNN for it. On the other hand,
temporal sparsity is an inherent feature of bio-inspired spiking neural
networks (SNNs), which neuromorphic processing exploits for hardware
efficiency. Introducing and exploiting spatio-temporal sparsity is a topic much
less explored in the DNN literature, but it is in perfect resonance with the trend
in DNNs to shift from static signal processing to more streaming signal
processing. Towards this goal, in this paper we introduce a new DNN layer
(called Delta Activation Layer), whose sole purpose is to promote temporal
sparsity of activations during training. A Delta Activation Layer casts
temporal sparsity into spatial activation sparsity to be exploited when
performing sparse tensor multiplications in hardware. By employing delta
inference and "the usual" spatial sparsification heuristics during training,
the resulting model learns to exploit not only spatial but also temporal
activation sparsity (for a given input data distribution). One may use the
Delta Activation Layer either during vanilla training or during a refinement
phase. We have implemented the Delta Activation Layer as an extension of the
standard TensorFlow-Keras library and applied it to train deep neural networks
on the Human Action Recognition (UCF101) dataset. We report an almost 3x
improvement of activation sparsity, with recoverable loss of model accuracy
after longer training.
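The abstract names the framework (TensorFlow-Keras) and the intent of the layer (suppress sub-threshold frame-to-frame activation changes so that delta inference operates on sparse tensors), but not the implementation. The sketch below is a hypothetical illustration under those assumptions, not the authors' released code: the layer name, the threshold value, the L1 penalty on the deltas, and the one-step delta approximation are all illustrative choices.

```python
# A minimal, hypothetical sketch of a delta-style activation layer in
# TensorFlow-Keras. NOT the authors' implementation; threshold, penalty
# weight, and the one-step delta approximation are assumptions.
import tensorflow as tf


class DeltaActivationSketch(tf.keras.layers.Layer):
    """Suppresses sub-threshold frame-to-frame activation changes.

    Wherever an activation barely changes between consecutive frames, the
    emitted change is zero, so a sparsity-aware accelerator could skip the
    corresponding MACs. An L1 penalty on the deltas biases training toward
    temporally stable (hence temporally sparse) activations.
    """

    def __init__(self, threshold=0.05, l1_weight=1e-4, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold  # assumed magnitude below which a delta is dropped
        self.l1_weight = l1_weight  # assumed strength of the sparsity-promoting penalty

    def call(self, x):
        # x: activations with an explicit time axis, e.g. (batch, T, H, W, C).
        prev = tf.concat([tf.zeros_like(x[:, :1]), x[:, :-1]], axis=1)
        delta = x - prev
        # Zero out small temporal changes. This is a one-step approximation of
        # delta inference; a faithful simulation would accumulate against the
        # previously emitted output rather than the raw previous frame.
        sparse_delta = tf.where(tf.abs(delta) > self.threshold,
                                delta, tf.zeros_like(delta))
        # Penalize delta magnitude to promote temporal sparsity during training.
        self.add_loss(self.l1_weight * tf.reduce_mean(tf.abs(delta)))
        return prev + sparse_delta
```

In a video model such a layer would plausibly sit after a per-frame feature extractor (for example a TimeDistributed convolutional backbone), and could be enabled either from the start of training or only during the refinement phase mentioned in the abstract.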
Related papers
- Exploring the Benefit of Activation Sparsity in Pre-training [117.25661020250658]
We study how activation properties change during pre-training.
We propose Switchable Sparse-Dense Learning (SSD).
SSD achieves comparable performance with identical model size and reduces pre-training costs.
arXiv Detail & Related papers (2024-10-04T13:53:33Z)
- Temporal Reversed Training for Spiking Neural Networks with Generalized Spatio-Temporal Representation [3.5624857747396814]
Spiking neural networks (SNNs) have received widespread attention as an ultra-low energy computing paradigm.
Recent studies have focused on improving the feature extraction capability of SNNs, but they still suffer from inefficiency and suboptimal performance.
We propose a simple yet effective temporal reversed training (TRT) method to optimize the temporal performance of SNNs.
arXiv Detail & Related papers (2024-08-17T06:23:38Z)
- Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training [20.49255973077044]
This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce computational requirements for training on the edge.
Results show a reduction of ~80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss.
We show that the proposed Delta RNN training will be useful for online incremental learning on edge devices with limited computing resources.
arXiv Detail & Related papers (2023-12-14T23:07:37Z)
- Dynamic Sparsity Is Channel-Level Sparsity Learner [91.31071026340746]
Dynamic sparse training (DST) is a leading sparse training approach.
Channel-aware dynamic sparse (Chase) seamlessly translates the promise of unstructured dynamic sparsity to channel-level sparsity.
Our approach translates unstructured sparsity to channel-wise sparsity.
arXiv Detail & Related papers (2023-05-30T23:33:45Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting [29.685909045226847]
Brain-inspired spiking neural networks (SNNs) have attracted widespread research interest because of their event-driven and energy-efficient characteristics.
The current direct training approach with surrogate gradients (SG) results in SNNs with poor generalizability.
We introduce the temporal efficient training (TET) approach to compensate for the loss of momentum in the gradient descent with SG.
arXiv Detail & Related papers (2022-02-24T08:02:37Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z) - Training Energy-Efficient Deep Spiking Neural Networks with Single-Spike
Hybrid Input Encoding [5.725845886457027]
Spiking Neural Networks (SNNs) provide higher computational efficiency in event-driven neuromorphic hardware.
SNNs suffer from high inference latency, resulting from inefficient input encoding and training techniques.
This paper presents a training framework for low-latency energy-efficient SNNs.
arXiv Detail & Related papers (2021-07-26T06:16:40Z)
- Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training [18.521882534906972]
We propose to structure dropout patterns by dropping out the same set of physical neurons within a batch, resulting in column (row) level hidden-state sparsity.
We conduct experiments for three representative NLP tasks: language modelling on the PTB dataset, OpenNMT based machine translation using the IWSLT De-En and En-Vi datasets, and named entity recognition sequence labelling.
arXiv Detail & Related papers (2021-06-22T22:44:32Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)