SpikingBERT: Distilling BERT to Train Spiking Language Models Using
Implicit Differentiation
- URL: http://arxiv.org/abs/2308.10873v3
- Date: Sun, 18 Feb 2024 17:40:31 GMT
- Title: SpikingBERT: Distilling BERT to Train Spiking Language Models Using
Implicit Differentiation
- Authors: Malyaban Bal, Abhronil Sengupta
- Abstract summary: Large language models (LLMs) comprise orders of magnitude fewer neurons and synapses than the human brain.
We propose a novel bio-inspired spiking language model (LM) which aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain.
Our work is the first to demonstrate an operational spiking LM architecture on multiple tasks in the GLUE benchmark.
- Score: 2.3361887733755897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs), though growing exceedingly powerful,
comprise orders of magnitude fewer neurons and synapses than the human brain,
yet require significantly more power/energy to operate. In this work, we
propose a novel bio-inspired spiking language model (LM) which aims to reduce
the computational cost of conventional LMs by drawing motivation from the
synaptic information flow in the brain. Specifically, we demonstrate a
framework that leverages the average spiking rate of neurons at equilibrium to
train a neuromorphic spiking LM using an implicit differentiation technique,
thereby overcoming the non-differentiability problem of spiking neural network
(SNN) based algorithms without using any type of surrogate gradient. The
steady-state convergence of the spiking neurons also allows us to design a
spiking attention mechanism, which is critical in developing a scalable spiking
LM. Moreover, the convergence of the average spiking rate of neurons at equilibrium
is utilized to develop a novel ANN-SNN knowledge distillation based technique
wherein we use a pre-trained BERT model as "teacher" to train our "student"
spiking architecture. While the primary architecture proposed in this paper is
motivated by BERT, the technique can be potentially extended to different kinds
of LLMs. Our work is the first to demonstrate an operational spiking LM
architecture on multiple tasks in the GLUE benchmark.
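As an illustration of the equilibrium-based training idea in the abstract, here is a minimal sketch. It is not the paper's actual architecture: the sigmoid rate dynamics, the matrices `W` and `U`, and all function names are assumptions. The average firing rate is driven to a fixed point a* = σ(W a* + U x), and the gradient through that fixed point is obtained with the implicit function theorem, avoiding both backpropagation through time and surrogate gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def equilibrium_rate(W, U, x, iters=200):
    """Iterate a <- sigma(W a + U x) until the average firing rate converges
    to its fixed point (W is assumed contractive)."""
    a = np.zeros(W.shape[0])
    for _ in range(iters):
        a = sigmoid(W @ a + U @ x)
    return a

def implicit_grad_wrt_x(W, U, x):
    """Gradient of the equilibrium rate a* w.r.t. the input x via the
    implicit function theorem: (I - diag(s) W) da = diag(s) U dx,
    where s = sigma'(pre-activation) evaluated at equilibrium."""
    a = equilibrium_rate(W, U, x)
    s = a * (1.0 - a)              # derivative of the sigmoid at equilibrium
    J = s[:, None] * W             # Jacobian of the update w.r.t. a
    rhs = s[:, None] * U           # Jacobian of the update w.r.t. x
    return np.linalg.solve(np.eye(len(a)) - J, rhs), a
```

Because the gradient comes from a single linear solve at the equilibrium, memory cost does not grow with the number of forward iterations, which is the practical appeal of implicit differentiation for feedback SNNs.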
Related papers
- General Self-Prediction Enhancement for Spiking Neurons [71.01912385372577]
Spiking Neural Networks (SNNs) are highly energy-efficient due to event-driven, sparse computation, but their training is challenged by spike non-differentiability and trade-offs among performance, efficiency, and biological plausibility. We propose a self-prediction-enhanced spiking neuron method that generates an internal prediction current from its input-output history to modulate the membrane potential. This design offers dual advantages: it creates a continuous gradient path that alleviates vanishing gradients and boosts training stability and accuracy, and it aligns with biological principles, resembling distal dendritic modulation and error-driven synaptic plasticity.
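A purely illustrative sketch of such a self-prediction current (the update rule, coefficients, and names below are assumptions, not the paper's model): the neuron keeps a running average of its recent input-output mismatch and injects it as an extra current into the membrane.

```python
import numpy as np

def lif_with_self_prediction(inputs, tau=0.9, v_th=1.0, alpha=0.5, beta=0.1):
    """Hypothetical LIF neuron with a self-prediction current: a running
    average of past input minus past output modulates the membrane potential."""
    v, pred, spikes = 0.0, 0.0, []
    for x in inputs:
        # update the prediction from the input-output history
        pred = alpha * pred + (1 - alpha) * (x - (spikes[-1] if spikes else 0.0))
        v = tau * v + x + beta * pred      # leaky integration + prediction current
        s = 1.0 if v >= v_th else 0.0
        v -= s * v_th                      # soft reset on spike
        spikes.append(s)
    return spikes
```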
arXiv Detail & Related papers (2026-01-29T15:08:48Z) - Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition [52.59263087086756]
Training deep spiking neural networks (SNNs) has critically depended on explicit normalization schemes, such as batch normalization. We propose a normalization-free learning framework that incorporates lateral inhibition inspired by cortical circuits. We show that our framework enables stable training of deep SNNs with biological realism and achieves competitive performance without resorting to explicit normalization.
arXiv Detail & Related papers (2025-09-27T11:11:30Z) - Fractional Spike Differential Equations Neural Network with Efficient Adjoint Parameters Training [63.3991315762955]
Spiking Neural Networks (SNNs) draw inspiration from biological neurons to create realistic models for brain-like computation. Most existing SNNs assume a single time constant for neuronal membrane voltage dynamics, modeled by first-order ordinary differential equations (ODEs) with Markovian characteristics. We propose the Fractional SPIKE Differential Equation neural network (fspikeDE), which captures long-term dependencies in membrane voltage and spike trains through fractional-order dynamics.
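The fractional-order idea can be sketched with a Grünwald-Letnikov discretization, in which the membrane update depends on the whole voltage history rather than only the previous step. This is a hypothetical minimal example; the specific scheme and names are assumptions, not the fspikeDE model.

```python
import numpy as np

def gl_coeffs(alpha, n):
    """Grunwald-Letnikov coefficients c_k = (-1)^k * C(alpha, k),
    via the recurrence c_k = c_{k-1} * (1 - (alpha + 1) / k)."""
    c = np.empty(n)
    c[0] = 1.0
    for k in range(1, n):
        c[k] = c[k - 1] * (1.0 - (alpha + 1.0) / k)
    return c

def fractional_lif(inputs, alpha=0.8, dt=1.0, v_th=1.0):
    """Membrane with fractional-order leak, D^alpha v = -v + x: the update
    sums over the full voltage history (non-Markovian memory)."""
    c = gl_coeffs(alpha, len(inputs) + 1)
    v_hist, spikes = [0.0], []
    for t, x in enumerate(inputs, start=1):
        history = sum(c[k] * v_hist[t - k] for k in range(1, t + 1))
        v = dt**alpha * (x - v_hist[-1]) - history
        s = 1.0 if v >= v_th else 0.0
        v -= s * v_th
        v_hist.append(v)
        spikes.append(s)
    return spikes, v_hist[1:]
```

For alpha = 1 the coefficients reduce to [1, -1, 0, ...] and the scheme collapses to the ordinary first-order Euler update, so the first-order LIF neuron is recovered as a special case.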
arXiv Detail & Related papers (2025-07-22T18:20:56Z) - Channel-wise Parallelizable Spiking Neuron with Multiplication-free Dynamics and Large Temporal Receptive Fields [32.349167886062105]
Spiking Neural Networks (SNNs) are distinguished from Artificial Neural Networks (ANNs) for their sophisticated neuronal dynamics and sparse binary activations (spikes) inspired by the biological neural system.
Traditional neuron models use iterative step-by-step dynamics, resulting in serial computation and slow training speed of SNNs.
Recent parallelizable spiking neuron models have been proposed to fully utilize the massive parallel computing ability of graphics processing units to accelerate the training of SNNs.
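The parallelization argument can be illustrated on the linear (reset-free) part of a leaky-integrate neuron: the recurrence v_t = lam * v_{t-1} + x_t is a convolution with the kernel lam^k, so all time steps can be computed at once. This is a simplified sketch; real parallel spiking neuron models must also handle spike generation and reset.

```python
import numpy as np

def serial_membrane(x, lam=0.9):
    """Step-by-step leaky integration: v_t = lam * v_{t-1} + x_t."""
    v, out = 0.0, []
    for xt in x:
        v = lam * v + xt
        out.append(v)
    return np.array(out)

def parallel_membrane(x, lam=0.9):
    """Same linear recurrence without a time loop: v_t is a convolution of
    the input with the kernel lam^k, so every time step can be evaluated
    at once (on a GPU this becomes a parallel scan)."""
    T = len(x)
    kernel = lam ** np.arange(T)              # [1, lam, lam^2, ...]
    return np.convolve(x, kernel)[:T]         # truncate to sequence length
```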
arXiv Detail & Related papers (2025-01-24T13:44:08Z) - SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models [19.04709216497077]
We develop spiking state space models (SpikingSSMs) for long sequence learning.
Inspired by dendritic neuron structure, we hierarchically integrate neuronal dynamics with the original SSM block.
We propose a lightweight surrogate dynamic network that accurately predicts the after-reset membrane potential and is compatible with learnable thresholds.
arXiv Detail & Related papers (2024-08-27T09:35:49Z) - Context Gating in Spiking Neural Networks: Achieving Lifelong Learning through Integration of Local and Global Plasticity [20.589970453110208]
Humans learn multiple tasks in succession with minimal mutual interference, through the context gating mechanism in the prefrontal cortex (PFC).
We propose SNN with context gating trained by the local plasticity rule (CG-SNN) for lifelong learning.
Experiments show that the proposed model is effective in maintaining the past learning experience and has better task-selectivity than other methods during lifelong learning.
arXiv Detail & Related papers (2024-06-04T01:35:35Z) - Exploring Extreme Quantization in Spiking Language Models [7.986844499514244]
This paper proposes the development of a novel binary/ternary (1/1.58-bit) spiking LM architecture.
Our proposed model represents a significant advancement as the first-of-its-kind 1/1.58-bit spiking LM.
arXiv Detail & Related papers (2024-05-04T03:00:23Z) - Single Neuromorphic Memristor closely Emulates Multiple Synaptic
Mechanisms for Energy Efficient Neural Networks [71.79257685917058]
We demonstrate memristive nano-devices based on SrTiO3 that inherently emulate all these synaptic functions.
These memristors operate in a non-filamentary, low conductance regime, which enables stable and energy efficient operation.
arXiv Detail & Related papers (2024-02-26T15:01:54Z) - Fully Spiking Actor Network with Intra-layer Connections for
Reinforcement Learning [51.386945803485084]
We focus on the task where the agent needs to learn multi-dimensional deterministic policies to control.
Most existing spike-based RL methods take the firing rate as the output of SNNs and convert it into a continuous action space (i.e., the deterministic policy) through a fully connected layer.
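The rate-to-action decoding described above can be sketched as follows (a hypothetical helper, not the paper's network): per-neuron firing rates over a simulation window are mapped affinely into the continuous action range.

```python
import numpy as np

def decode_action(spike_train, low, high):
    """Map per-neuron firing rates (fraction of time steps with a spike)
    affinely into the continuous action range [low, high].
    spike_train: (T, n_actions) binary array; low/high: (n_actions,)."""
    rate = spike_train.mean(axis=0)   # firing rate in [0, 1] per action dim
    return low + rate * (high - low)
```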
To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects.
arXiv Detail & Related papers (2024-01-09T07:31:34Z) - SpikingJelly: An open-source machine learning infrastructure platform
for spike-based intelligence [51.6943465041708]
Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency.
We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips.
arXiv Detail & Related papers (2023-10-25T13:15:17Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural-network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions. The method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural
Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE) that extends the recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z) - The Mori-Zwanzig formulation of deep learning [3.2851683371946754]
We develop a new formulation of deep learning based on the Mori-Zwanzig formalism of irreversible statistical mechanics.
New equations can be used as a starting point to develop new effective parameterizations of deep neural networks.
arXiv Detail & Related papers (2022-09-12T18:44:50Z) - Ensemble plasticity and network adaptability in SNNs [0.726437825413781]
Artificial Spiking Neural Networks (ASNNs) promise greater information processing efficiency because of discrete event-based (i.e., spike) computation.
We introduce a novel ensemble learning method based on entropy and network activation, operated exclusively using spiking activity.
It was discovered that pruning lower spike-rate neuron clusters resulted in increased generalization or a predictable decline in performance.
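A toy illustration of pruning low-spike-rate neuron clusters (the names and the keep-fraction criterion are assumptions, not the paper's method): the lowest-firing neurons are dropped by zeroing their outgoing weights.

```python
import numpy as np

def prune_low_rate_neurons(spike_counts, weights, keep_frac=0.8):
    """Zero the outgoing weights of the lowest-firing neurons; the summary
    above reports such pruning can improve generalization.
    spike_counts: (n,) spike totals; weights: (n, m) outgoing weights."""
    n = len(spike_counts)
    k = int(np.ceil(keep_frac * n))
    keep = np.argsort(spike_counts)[-k:]    # indices of the k busiest neurons
    mask = np.zeros(n, dtype=bool)
    mask[keep] = True
    return weights * mask[:, None], mask
```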
arXiv Detail & Related papers (2022-03-11T01:14:51Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP with its limitations being presented.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.