Learning Neural Networks with Sparse Activations
- URL: http://arxiv.org/abs/2406.17989v1
- Date: Wed, 26 Jun 2024 00:11:13 GMT
- Title: Learning Neural Networks with Sparse Activations
- Authors: Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath, Raghu Meka
- Abstract summary: In transformer networks, the activations in the hidden layer of the MLP block tend to be extremely sparse on any given input.
Unlike traditional forms of sparsity, where there are neurons/weights that can be deleted from the network, this form of dynamic activation sparsity appears to be harder to exploit.
We present a variety of results showing that such sparsely activated classes of functions lead to provable computational and statistical advantages over their non-sparse counterparts.
- Score: 42.88109060676769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A core component present in many successful neural network architectures is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer architectures, is that, after training, the activations in the hidden layer of this MLP block tend to be extremely sparse on any given input. Unlike traditional forms of sparsity, where there are neurons/weights which can be deleted from the network, this form of {\em dynamic} activation sparsity appears to be harder to exploit to get more efficient networks. Motivated by this, we initiate a formal study of PAC learnability of MLP layers that exhibit activation sparsity. We present a variety of results showing that such classes of functions do lead to provable computational and statistical advantages over their non-sparse counterparts. Our hope is that a better theoretical understanding of {\em sparsely activated} networks would lead to methods that can exploit activation sparsity in practice.
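To make the object of study concrete, below is a minimal sketch (not taken from the paper) of such an MLP block in PyTorch, together with a simple measurement of hidden-layer activation density. The layer widths, the ReLU activation, and the random inputs are illustrative assumptions only.

```python
# Minimal sketch, assuming a ReLU MLP block; sizes and inputs are illustrative.
import torch
import torch.nn as nn


class MLPBlock(nn.Module):
    """Two fully connected layers with a non-linear activation in between."""

    def __init__(self, d_model: int = 512, d_hidden: int = 2048):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)
        self.last_activation_density = None  # fraction of hidden units that fired

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.fc1(x))  # hidden-layer activations
        # Record how many hidden units are non-zero after the ReLU.
        self.last_activation_density = (h > 0).float().mean().item()
        return self.fc2(h)


if __name__ == "__main__":
    block = MLPBlock()
    x = torch.randn(8, 512)  # a batch of illustrative random inputs
    _ = block(x)
    print(f"hidden activation density: {block.last_activation_density:.3f}")
```

On a randomly initialized block with Gaussian inputs roughly half of the hidden units fire; the empirical observation motivating the paper is that after training, this fraction on real inputs is far smaller.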
Related papers
- Deep Learning 2.0: Artificial Neurons That Matter -- Reject Correlation, Embrace Orthogonality [0.0]
We introduce a yat-product-powered neural network, the Neural Matter Network (NMN).
NMN achieves non-linear pattern recognition without activation functions.
yat-MLP establishes a new paradigm for neural network design that combines simplicity with effectiveness.
arXiv Detail & Related papers (2024-11-12T16:52:51Z)
- Multilinear Operator Networks [60.7432588386185]
Polynomial Networks is a class of models that does not require activation functions.
We propose MONet, which relies solely on multilinear operators.
arXiv Detail & Related papers (2024-01-31T16:52:19Z)
- Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference [2.0822643340897273]
We show that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model.
We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task.
arXiv Detail & Related papers (2023-11-13T08:18:44Z)
- Gaining the Sparse Rewards by Exploring Lottery Tickets in Spiking Neural Network [8.210103222339784]
Spiking Neural Networks (SNNs) offer a promising solution due to their low-latency and low-energy properties over traditional Artificial Neural Networks (ANNs).
This paper delves into the spiking-based LTs (SLTs), examining their unique properties and potential for extreme efficiency.
A sparsity algorithm tailored to the spiking transformer structure, which incorporates convolution operations into the Patch Embedding Projection (ConvPEP) module, is proposed to achieve Multi-level Sparsity (MultiSp).
arXiv Detail & Related papers (2023-09-23T08:24:36Z)
- Layer-wise Feedback Propagation [53.00944147633484]
We present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors.
LFP assigns rewards to individual connections based on their respective contributions to solving a given task.
We demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks [1.9424280683610138]
Overfitting is one of the fundamental challenges when training convolutional neural networks.
In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures.
arXiv Detail & Related papers (2021-04-13T12:55:37Z)
- Reinforcement Learning with External Knowledge by using Logical Neural Networks [67.46162586940905]
A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic.
We propose an integrated method that enables model-free reinforcement learning from external knowledge sources.
arXiv Detail & Related papers (2021-03-03T12:34:59Z)
- Activation function impact on Sparse Neural Networks [0.0]
Sparse Evolutionary Training allows for significantly lower computational complexity when compared to fully connected models.
This research provides insights into the relationship between the activation function used and the network performance at various sparsity levels.
arXiv Detail & Related papers (2020-10-12T18:05:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.