Exploring Extreme Quantization in Spiking Language Models
- URL: http://arxiv.org/abs/2405.02543v3
- Date: Mon, 1 Jul 2024 17:38:48 GMT
- Title: Exploring Extreme Quantization in Spiking Language Models
- Authors: Malyaban Bal, Yi Jiang, Abhronil Sengupta
- Abstract summary: This paper proposes the development of a novel binary/ternary (1/1.58-bit) spiking LM architecture.
Our proposed model represents a significant advancement as the first-of-its-kind 1/1.58-bit spiking LM.
- Score: 7.986844499514244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the growing prevalence of large language model (LLM) architectures, a crucial concern persists regarding their energy and power consumption, which still lags far behind the remarkable energy efficiency of the human brain. Recent strides in spiking language models (LM) and transformer architectures aim to address this concern by harnessing the spiking activity of biological neurons to enhance energy/power efficiency. Doubling down on the principles of model quantization and energy efficiency, this paper proposes the development of a novel binary/ternary (1/1.58-bit) spiking LM architecture. Achieving scalability comparable to a deep spiking LM architecture is facilitated by an efficient knowledge distillation technique, wherein knowledge from a non-spiking full-precision "teacher" model is transferred to an extremely weight quantized spiking "student" LM. Our proposed model represents a significant advancement as the first-of-its-kind 1/1.58-bit spiking LM, and its performance is rigorously evaluated on multiple text classification tasks of the GLUE benchmark.
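The abstract leaves implementation details to the paper body, but its two core ingredients, extreme weight quantization and teacher-student distillation, can be sketched compactly. The snippet below uses absmean-style ternarization as popularized by 1.58-bit LLM work and a standard soft-target distillation loss; the straight-through estimator, temperature, and loss weighting are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} times a per-tensor scale.

    Absmean-style ternarization (1.58-bit); whether this paper uses the
    exact same scheme is an assumption.
    """
    scale = w.abs().mean().clamp(min=1e-8)
    w_q = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through estimator: the forward pass sees quantized weights,
    # gradients flow to the full-precision latent weights.
    return w + (w_q - w).detach()

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KD against the full-precision teacher with the
    task cross-entropy (hypothetical weighting)."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * kd + (1 - alpha) * F.cross_entropy(student_logits, labels)
```

For a binary (1-bit) variant, the rounding step would collapse to `torch.sign(w) * scale`; the spiking dynamics of the student are orthogonal to this weight-quantization step.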
Related papers
- Energy-Efficient Information Representation in MNIST Classification Using Biologically Inspired Learning [1.0787328610467803]
We analyze our previously developed biologically inspired learning rule using information-theoretic concepts. It emulates the brain's structural plasticity and retains only the essential number of synapses. It also eliminates the need for pre-optimization of network architecture, enhances adaptability, and reflects the brain's ability to reserve 'space' for new memories.
arXiv Detail & Related papers (2026-02-28T10:38:57Z)
- MAR: Efficient Large Language Models via Module-aware Architecture Refinement [27.413577161712876]
Large Language Models (LLMs) excel across diverse domains but suffer from high energy costs due to quadratic attention and dense Feed-Forward Network (FFN) operations. We propose Module-aware Architecture Refinement (MAR), a framework that integrates State Space Models (SSMs) for linear-time sequence modeling and applies activation sparsification to reduce FFN costs.
arXiv Detail & Related papers (2026-01-29T10:21:28Z)
- Energy-based Autoregressive Generation for Neural Population Dynamics [12.867288040044501]
We introduce a novel Energy-based Autoregressive Generation (EAG) framework that employs an energy-based transformer to learn temporal dynamics in latent space. We show that EAG achieves state-of-the-art generation quality with substantial computational efficiency improvements. These results demonstrate the effectiveness of energy-based modeling for neural population dynamics, with applications in neuroscience research and neural engineering.
arXiv Detail & Related papers (2025-11-18T07:11:29Z)
- Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression [53.39128997308138]
We introduce information capacity, a measure of model efficiency based on text compression performance. Empirical evaluations on mainstream open-source models show that models of varying sizes within a series exhibit consistent information capacity. A distinctive feature of information capacity is that it incorporates tokenizer efficiency, which affects both input and output token counts.
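As a rough illustration of the underlying idea (model efficiency read off from how many bits per character the model needs to encode text), here is a sketch assuming a Hugging Face-style causal LM; the paper's exact normalization, and how it folds tokenizer efficiency into the metric, may differ.

```python
import math
import torch

def bits_per_character(model, tokenizer, text: str) -> float:
    """Cross-entropy of `text` under `model`, normalized per character.

    Lower is better: a model that compresses text into fewer bits per
    character is, by this compression-based proxy, more capable.
    """
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # `loss` is mean NLL in nats/token
    total_nats = out.loss.item() * (ids.numel() - 1)  # tokens actually predicted
    return total_nats / math.log(2) / len(text)
```

Normalizing by characters rather than tokens is what lets such a metric reward tokenizer efficiency: a model whose tokenizer packs more characters per token spends fewer predictions on the same text.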
arXiv Detail & Related papers (2025-11-11T10:07:32Z)
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models [51.817121227562964]
Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional Transformer architecture requires substantial computation and poses significant obstacles for large-scale training and practical deployment.
arXiv Detail & Related papers (2025-08-13T14:13:46Z)
- Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning [41.40603531008809]
We evaluate current Large Brainwave Foundation Models (LBMs) through systematic fine-tuning experiments. Our analysis shows that state-of-the-art LBMs achieve only marginal improvements (0.9%-1.2%) over traditional deep architectures.
arXiv Detail & Related papers (2025-07-01T21:21:42Z)
- Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost. We propose two lightweight methods to enhance LRM efficiency. First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction. Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
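Activation steering of this kind is usually implemented by adding a fixed direction vector to a layer's hidden states at inference time. The sketch below shows that generic pattern with a PyTorch forward hook; the layer choice, the scale `alpha`, and how the paper derives its steering direction are all assumptions.

```python
import torch

def add_steering_hook(layer: torch.nn.Module, direction: torch.Tensor, alpha: float):
    """Shift a layer's output along `direction` (generic activation steering).

    Deriving `direction` (e.g., by contrasting concise vs. verbose
    reasoning traces) is the paper's contribution and is not reproduced here.
    """
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * unit.to(hidden.device, hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered

    return layer.register_forward_hook(hook)  # call .remove() to undo
```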
arXiv Detail & Related papers (2025-06-18T17:18:12Z)
- FANformer: Improving Large Language Models Through Effective Periodicity Modeling [30.84203256282429]
We introduce FANformer, which integrates a Fourier Analysis Network (FAN) into the attention mechanism to achieve efficient periodicity modeling.
Experiments show that FANformer consistently outperforms Transformer when scaling up model size and training tokens.
To further validate the effectiveness of FANformer, we pretrain a FANformer-1B on 1 trillion tokens.
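The summary does not spell out the mechanism, but a Fourier Analysis Network layer (from the FAN line of work this builds on) concatenates explicit cos/sin features of a linear projection with an ordinary nonlinear branch, roughly as sketched below; the dimension split and how FANformer wires this into attention are assumptions.

```python
import torch
import torch.nn as nn

class FANLayer(nn.Module):
    """Periodicity-aware layer: explicit cos/sin features alongside a
    standard MLP branch (hypothetical 1:4 periodic split)."""

    def __init__(self, d_in: int, d_out: int, periodic_ratio: float = 0.25):
        super().__init__()
        d_p = int(d_out * periodic_ratio)      # cos+sin together give 2*d_p dims
        self.proj_p = nn.Linear(d_in, d_p, bias=False)
        self.proj_g = nn.Linear(d_in, d_out - 2 * d_p)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.proj_p(x)
        return torch.cat([torch.cos(p), torch.sin(p), self.act(self.proj_g(x))], dim=-1)
```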
arXiv Detail & Related papers (2025-02-28T18:52:24Z)
- Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability [1.542607498220242]
This research focuses on the systematic evaluation of individual weight importance throughout the training process.
We propose a method that effectively reduces model size without compromising performance.
These findings highlight the critical need for optimized AI models to ensure sustainable development.
arXiv Detail & Related papers (2025-02-24T11:34:49Z)
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks.
We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge.
Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
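Sigmoid gating combined with a straight-through estimator is a standard way to make hard expert on/off decisions differentiable; the sketch below shows that generic pattern, not DSMoE's exact routing (the threshold and the mapping from gates to FFN blocks are assumptions).

```python
import torch
import torch.nn as nn

class HardSigmoidRouter(nn.Module):
    """Per-token, per-expert binary gates: hard 0/1 mask in the forward
    pass, sigmoid gradients in the backward pass (straight-through)."""

    def __init__(self, d_model: int, n_experts: int, threshold: float = 0.5):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.gate(x))        # (batch, seq, n_experts)
        hard = (probs > self.threshold).float()    # non-differentiable mask
        # Straight-through estimator: value is `hard`, gradient follows `probs`.
        return probs + (hard - probs).detach()
```

Each expert here would be one of the blocks carved out of the pre-trained FFN, so a token only pays compute for blocks whose gate is 1.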
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
- Explore Activation Sparsity in Recurrent LLMs for Energy-Efficient Neuromorphic Computing [3.379854610429579]
Recurrent large language models (R-LLMs) have proven effective in mitigating the complexity of self-attention.
We propose a low-cost, training-free algorithm to sparsify R-LLMs' activations to enhance energy efficiency on neuromorphic hardware.
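A common training-free recipe is to zero small-magnitude activations so the corresponding multiply-accumulates can be skipped; on event-driven neuromorphic hardware, zeros cost nothing. The top-k thresholding rule below is a generic illustration, not necessarily the paper's algorithm.

```python
import torch

def sparsify_activations(h: torch.Tensor, keep_ratio: float = 0.1) -> torch.Tensor:
    """Keep only the largest-magnitude fraction of each token's activations,
    zeroing the rest (generic magnitude thresholding; keep_ratio is arbitrary)."""
    k = max(1, int(h.shape[-1] * keep_ratio))
    thresh = h.abs().topk(k, dim=-1).values[..., -1:]   # k-th largest magnitude
    return torch.where(h.abs() >= thresh, h, torch.zeros_like(h))
```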
arXiv Detail & Related papers (2025-01-09T19:13:03Z)
- Impact of ML Optimization Tactics on Greener Pre-Trained ML Models [46.78148962732881]
This study aims to (i) analyze image classification datasets and pre-trained models, (ii) improve inference efficiency by comparing optimized and non-optimized models, and (iii) assess the economic impact of the optimizations.
We conduct a controlled experiment to evaluate the impact of various PyTorch optimization techniques (dynamic quantization, torch.compile, local pruning, and global pruning) when applied to 42 Hugging Face models for image classification.
Dynamic quantization demonstrates significant reductions in inference time and energy consumption, making it highly suitable for large-scale systems.
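Of the four tactics studied, dynamic quantization is the cheapest to apply: in PyTorch it is a single post-training call, shown below on a stand-in classifier (the study itself targets full pre-trained Hugging Face image models).

```python
import torch
import torch.nn as nn

# Stand-in classifier head; dynamic quantization only touches supported
# layer types such as nn.Linear and LSTM.
model = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)
).eval()

# Weights are stored in int8; activations are quantized on the fly
# at inference time, so no calibration data is needed.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    logits = quantized(torch.randn(1, 1, 28, 28))
```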
arXiv Detail & Related papers (2024-09-19T16:23:03Z)
- Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models.
LLMs exhibit impressive zero-shot ability; however, they require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z)
- Efficient Materials Informatics between Rockets and Electrons [0.0]
This dissertation focuses on the design of functionally graded materials (FGMs) incorporating ultra-high temperature refractory high entropy alloys (RHEAs).
At the atomistic level, a data ecosystem optimized for machine learning (ML) from over 4.5 million relaxed structures, called MPDD, is used to inform experimental observations and improve thermodynamic models.
The resulting multi-level discovery infrastructure is highly generalizable as it focuses on encoding problems to solve them easily rather than looking for an existing solution.
arXiv Detail & Related papers (2024-07-05T17:03:26Z)
- Lightweight Geometric Deep Learning for Molecular Modelling in Catalyst Discovery [0.0]
The Open Catalyst Project aims to apply advances in graph neural networks (GNNs) to accelerate progress in catalyst discovery.
By implementing robust design patterns like geometric and symmetric message passing, we were able to train a GNN model that reached an MAE of 0.0748 in predicting the per-atom forces of adsorbate-surface interactions.
arXiv Detail & Related papers (2024-04-05T17:13:51Z)
- Exploring Model Transferability through the Lens of Potential Energy [78.60851825944212]
Transfer learning has become crucial in computer vision tasks due to the vast availability of pre-trained deep learning models.
Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels.
We present an insightful physics-inspired approach named PED to address these challenges.
arXiv Detail & Related papers (2023-08-29T07:15:57Z)
- SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation [2.3361887733755897]
Large language models (LLMs) comprise orders of magnitude fewer neurons and synapses than the human brain.
We propose a novel bio-inspired spiking language model (LM) which aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain.
Our work is the first one to demonstrate the performance of an operational spiking LM architecture on multiple different tasks in the GLUE benchmark.
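The summary mentions distillation but not its form; a common recipe in spiking-distillation work matches the spiking student's time-averaged firing rates to the teacher's hidden representations. The loss below is that generic recipe, with `proj` as a hypothetical dimension-matching layer; the implicit-differentiation training that gives the paper its title is not reproduced here.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_spikes: torch.Tensor,
                              teacher_hidden: torch.Tensor,
                              proj: torch.nn.Linear) -> torch.Tensor:
    """MSE between teacher hidden states and the spiking student's
    time-averaged firing rates.

    student_spikes: (time, batch, seq, d_student) binary spike trains
    teacher_hidden: (batch, seq, d_teacher)
    """
    rates = student_spikes.float().mean(dim=0)   # average spikes over time steps
    return F.mse_loss(proj(rates), teacher_hidden)
```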
arXiv Detail & Related papers (2023-08-21T17:20:05Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
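At a high level, each layer's weight matrix is factorized so that one large central factor is shared across all layers while small layer-specific factors remain local. The low-rank sketch below conveys that sharing structure in simplified form; a real MPO decomposition uses a chain of higher-order tensors rather than a single three-factor split.

```python
import torch
import torch.nn as nn

class SharedCenterLinear(nn.Module):
    """Linear layer factorized as U_l @ C @ V_l, with the central factor C
    shared across layers (simplified stand-in for MPO parameter sharing)."""

    def __init__(self, d_in: int, d_out: int, rank: int, central: nn.Parameter):
        super().__init__()
        self.central = central                                   # shared (rank, rank)
        self.u = nn.Parameter(torch.randn(d_out, rank) * 0.02)   # layer-specific
        self.v = nn.Parameter(torch.randn(rank, d_in) * 0.02)    # layer-specific

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.u @ self.central @ self.v                  # rebuilt on the fly
        return x @ weight.t()

central = nn.Parameter(torch.randn(64, 64) * 0.02)               # one tensor for all layers
layers = nn.ModuleList(SharedCenterLinear(512, 512, 64, central) for _ in range(12))
```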
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Latent Diffusion Energy-Based Model for Interpretable Text Modeling [104.85356157724372]
We introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework.
We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space.
arXiv Detail & Related papers (2022-06-13T03:41:31Z)
- Interpretable Convolutional Neural Networks for Subject-Independent Motor Imagery Classification [22.488536453952964]
We propose an explainable deep learning model for brain computer interface (BCI) study.
Specifically, we aim to classify EEG signals obtained from motor-imagery (MI) tasks.
We visualize heatmaps of the layer-wise relevance propagation (LRP) output as topographies to verify the underlying neurophysiological factors.
arXiv Detail & Related papers (2021-12-14T07:35:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.