NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
- URL: http://arxiv.org/abs/2404.01306v3
- Date: Wed, 5 Jun 2024 16:07:13 GMT
- Title: NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
- Authors: Amit Dhurandhar, Tejaswini Pedapati, Ronny Luss, Soham Dan, Aurelie Lozano, Payel Das, Georgios Kollias
- Abstract summary: Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP).
However, expensive training as well as inference remains a significant impediment to their widespread applicability.
Inspired by brain neuronal networks, we explore sparsity approaches through the lens of network topology.
- Score: 35.10729451729596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP) due to their impressive performance on various tasks. However, expensive training as well as inference remains a significant impediment to their widespread applicability. While enforcing sparsity at various levels of the model architecture has found promise in addressing scaling and efficiency issues, there remains a disconnect between sparsity techniques and the network topology they induce. Inspired by brain neuronal networks, we explore sparsity approaches through the lens of network topology. Specifically, we exploit mechanisms seen in biological networks, such as preferential attachment and redundant synapse pruning, and show that principled, model-agnostic sparsity approaches are performant and efficient across diverse NLP tasks, spanning both classification (such as natural language inference) and generation (summarization, machine translation), even though optimizing performance is not our sole objective. NeuroPrune is competitive with (or sometimes superior to) baselines on performance and can be up to $10\times$ faster in terms of training time for a given level of sparsity, simultaneously exhibiting measurable improvements in inference time in many cases.
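The abstract names two biological mechanisms, preferential attachment and redundant synapse pruning, but gives no code. Below is a minimal sketch of how such topology-aware pruning could look on a single weight matrix; the degree-weighted scoring, the `alpha` knob, and the magnitude criterion are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def neuro_inspired_prune(W: np.ndarray, sparsity: float, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical sketch of topology-aware pruning, not the paper's code.

    Two neuro-inspired heuristics from the abstract:
      * redundant synapse pruning: drop the weakest connections first;
      * preferential attachment: bias survival toward connections whose
        endpoint neurons are already strongly connected (rich get richer).
    """
    # Proxy for each neuron's degree: total connection strength.
    row_deg = np.abs(W).sum(axis=1, keepdims=True)  # out-degree proxy
    col_deg = np.abs(W).sum(axis=0, keepdims=True)  # in-degree proxy

    # Score each synapse by magnitude, boosted by endpoint degrees
    # (preferential attachment); alpha controls the strength of the bias.
    score = np.abs(W) * (row_deg * col_deg) ** alpha

    # Redundant synapse pruning: zero out the lowest-scoring fraction.
    k = int(sparsity * W.size)
    threshold = np.partition(score.ravel(), k)[k]
    return W * (score >= threshold)

# Example: prune a random weight matrix to roughly 90% sparsity.
rng = np.random.default_rng(0)
W_sparse = neuro_inspired_prune(rng.normal(size=(64, 64)), sparsity=0.9)
print(f"kept {np.count_nonzero(W_sparse) / W_sparse.size:.1%} of weights")
```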
Related papers
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- Weight Sparsity Complements Activity Sparsity in Neuromorphic Language Models [3.0753589871055107]
Event-based neural networks such as spiking neural networks (SNNs) naturally exhibit activity sparsity, and many methods exist to sparsify their connectivity by pruning weights.
We study the effects of weight pruning when combined with activity sparsity on language modeling tasks.
Our results suggest sparsely connected event-based neural networks are promising candidates for effective and efficient sequence modeling.
arXiv Detail & Related papers (2024-05-01T10:33:36Z)
- Sparse Multitask Learning for Efficient Neural Representation of Motor Imagery and Execution [30.186917337606477]
We introduce a sparse multitask learning framework for motor imagery (MI) and motor execution (ME) tasks.
Given a dual-task CNN model for MI-ME classification, we apply a saliency-based sparsification approach to prune superfluous connections.
Our results indicate that this tailored sparsity can mitigate overfitting and improve test performance with small amounts of data (a generic saliency-based pruning criterion is sketched after this list).
arXiv Detail & Related papers (2023-12-10T09:06:16Z)
- Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference [2.0822643340897273]
We show that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model.
We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task (the multiplicative composition is worked through after this list).
arXiv Detail & Related papers (2023-11-13T08:18:44Z)
- SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network [39.54624592783459]
Spiking Neural Networks (SNNs) have emerged as a promising alternative to conventional Artificial Neural Networks (ANNs).
This paper presents SpikeCLIP, a novel framework designed to bridge the modality gap in spike-based computation.
arXiv Detail & Related papers (2023-10-10T09:57:17Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference [59.62779187457773]
We propose GenNLI, a generative classifier for natural language inference (NLI).
We compare it to five baselines, including discriminative models and large-scale pretrained language representation models like BERT.
Experiments show that GenNLI outperforms both discriminative and pretrained baselines across several challenging NLI experimental settings.
arXiv Detail & Related papers (2020-10-08T04:44:00Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? [9.248882589228089]
Long short-term memory (LSTM) networks are capable of encapsulating long-range dependencies.
Simple recurrent networks (SRNs) have generally been less successful at capturing long-range dependencies.
We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations (a toy cell in this spirit is sketched after this list).
arXiv Detail & Related papers (2020-05-17T09:13:28Z)
- Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations (the classic pairwise STDP update is sketched after this list).
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
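For the saliency-based sparsification mentioned in the Sparse Multitask Learning entry above, here is a minimal sketch using the common first-order saliency $|w \cdot \partial L / \partial w|$; the cited paper's exact criterion may differ, and `saliency_prune` is a hypothetical helper, not its released code.

```python
import torch
import torch.nn as nn

def saliency_prune(model: nn.Module, loss: torch.Tensor, sparsity: float) -> None:
    """Zero out the least salient weights in place (illustrative sketch only)."""
    loss.backward()  # populate .grad on all parameters
    for p in model.parameters():
        if p.dim() < 2 or p.grad is None:  # skip biases and norm layers
            continue
        saliency = (p * p.grad).abs()      # first-order Taylor saliency
        k = int(sparsity * p.numel())
        if k == 0:
            continue
        threshold = saliency.flatten().kthvalue(k).values
        with torch.no_grad():
            p.mul_((saliency > threshold).to(p.dtype))

# Toy usage: compute a loss, then prune half of each weight matrix.
model = nn.Linear(10, 2)
loss = model(torch.randn(4, 10)).pow(2).mean()
saliency_prune(model, loss, sparsity=0.5)
```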
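The "compose multiplicatively" claim in the Activity/Weight Sparsity entry is just arithmetic on the fraction of work that survives both masks. A toy check, with invented sparsity figures rather than numbers from the paper:

```python
# If activity sparsity skips a fraction of activations and weight sparsity
# removes a fraction of connections, and the two are roughly independent,
# compute cost scales with the product of the surviving fractions.
activity_sparsity = 0.80  # 80% of activations are zero (illustrative)
weight_sparsity = 0.75    # 75% of weights pruned (illustrative)

surviving = (1 - activity_sparsity) * (1 - weight_sparsity)  # 0.2 * 0.25 = 0.05
print(f"compute reduction: {1 / surviving:.0f}x")            # -> 20x
```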
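For the Decay RNN entry, the following is a minimal guess at a recurrent cell with decaying activations: the hidden state leaks toward the new input-driven update at a learnable per-unit rate. The gating form and the sigmoid-parameterized decay are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DecayRNNCell(nn.Module):
    """Simple recurrent cell with decaying activations (illustrative sketch).

    h_t = alpha * h_{t-1} + (1 - alpha) * relu(W x_t + U h_{t-1} + b),
    where alpha in (0, 1) is a learnable decay, loosely mimicking the leaky
    dynamics of biological neurons.
    """

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.in_proj = nn.Linear(input_size, hidden_size)
        self.rec_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self._alpha = nn.Parameter(torch.zeros(hidden_size))  # sigmoid(0) = 0.5

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self._alpha)  # keep the decay in (0, 1)
        update = torch.relu(self.in_proj(x) + self.rec_proj(h))
        return alpha * h + (1 - alpha) * update

# One step on toy data.
cell = DecayRNNCell(input_size=16, hidden_size=32)
h = cell(torch.randn(1, 16), torch.zeros(1, 32))
```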
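Finally, the multi-spike learning entry contrasts its rules with unsupervised STDP; for reference, the classic pairwise STDP update it re-examines can be written in a few lines. This is the textbook form, not the paper's proposed multi-spike rules, and the constants are conventional illustrative values.

```python
import math

def stdp_update(w: float, t_pre: float, t_post: float,
                a_plus: float = 0.01, a_minus: float = 0.012,
                tau: float = 20.0) -> float:
    """Classic pairwise STDP weight update (times in milliseconds)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: causal pair, strengthen
        return w + a_plus * math.exp(-dt / tau)
    else:        # post fires before (or with) pre: weaken
        return w - a_minus * math.exp(dt / tau)

print(stdp_update(0.5, t_pre=10.0, t_post=15.0))  # small potentiation
```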
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.