SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
- URL: http://arxiv.org/abs/2406.03287v1
- Date: Wed, 5 Jun 2024 13:59:03 GMT
- Title: SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
- Authors: Xingrun Xing, Zheng Zhang, Ziyi Ni, Shitao Xiao, Yiming Ju, Siqi Fan, Yequan Wang, Jiajun Zhang, Guoqi Li,
- Abstract summary: Bio-inspired spiking neural networks (SNNs) have advantages of biological plausibility, event-driven sparsity, and binary activation.
Large-scale language models exhibit promising generalization capability, making it a valuable issue to explore more general spike-driven models.
This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones.
- Score: 30.825695629006628
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Towards energy-efficient artificial intelligence similar to the human brain, the bio-inspired spiking neural networks (SNNs) have advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models exhibit promising generalization capability, making it a valuable issue to explore more general spike-driven models. However, the binary spikes in existing SNNs fail to encode adequate semantic information, placing technological challenges for generalization. This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones. Different from previous spikes with {0,1} levels, we propose a more general spike formulation with bi-directional, elastic amplitude, and elastic frequency encoding, while still maintaining the addition nature of SNNs. In a single time step, the spike is enhanced by direction and amplitude information; in spike frequency, a strategy to control spike firing rate is well designed. We plug this elastic bi-spiking mechanism in language modeling, named SpikeLM. It is the first time to handle general language tasks with fully spike-driven models, which achieve much higher accuracy than previously possible. SpikeLM also greatly bridges the performance gap between SNNs and ANNs in language modeling. Our code is available at https://github.com/Xingrun-Xing/SpikeLM.
Related papers
- SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking [43.275370104552344]
Recent large language models (LLMs) with billions of parameters have boosted their performance across various real-world applications.
Human brains exhibit significantly greater energy efficiency compared to LLMs with a similar number of parameters.
We propose the first spiking large language model as recent LLMs termed SpikeLLM.
arXiv Detail & Related papers (2024-07-05T08:37:17Z) - SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.
We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z) - SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding [50.337896542603524]
We introduce SpikeMba: a multi-modal spiking saliency mamba for temporal video grounding.
Our approach integrates Spiking Neural Networks (SNNs) with state space models (SSMs) to leverage their unique advantages.
Our experiments demonstrate the effectiveness of SpikeMba, which consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-04-01T15:26:44Z) - Language Modeling on a SpiNNaker 2 Neuromorphic Chip [2.760675104404914]
Event-based networks on neuromorphic devices offer a potential way to reduce energy consumption for inference significantly.
We demonstrate the first-ever implementation of a language model on a neuromorphic device.
arXiv Detail & Related papers (2023-12-14T16:16:35Z) - SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks [21.616328837090396]
Spiking Neural Networks (SNNs) leverage sparse and event-driven activations to reduce the computational overhead associated with model inference.
We implement generative language model with binary, event-driven spiking activation units.
SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language.
arXiv Detail & Related papers (2023-02-27T16:43:04Z) - A temporally and spatially local spike-based backpropagation algorithm
to enable training in hardware [0.0]
Spiking Neural Networks (SNNs) have emerged as a hardware efficient architecture for classification tasks.
There have been several attempts to adopt the powerful backpropagation (BP) technique used in non-spiking artificial neural networks (ANNs)
arXiv Detail & Related papers (2022-07-20T08:57:53Z) - E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language
Understanding and Generation [95.49128988683191]
Sequence-to-sequence (seq2seq) learning is a popular fashion for large-scale pretraining language models.
We propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2.
E2S2 improves the seq2seq models via integrating more efficient self-supervised information into the encoders.
arXiv Detail & Related papers (2022-05-30T08:25:36Z) - A Spiking Neural Network Structure Implementing Reinforcement Learning [0.0]
In the present paper, I describe an SNN structure which, seemingly, can be used in wide range of reinforcement learning tasks.
The SNN structure considered in the paper includes spiking neurons described by a generalization of the LIFAT (leaky integrate-and-fire neuron with adaptive threshold) model.
My concept is based on very general assumptions about RL task characteristics and has no visible limitations on its applicability.
arXiv Detail & Related papers (2022-04-09T09:08:10Z) - Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition ( SLR)
Our proposed SAM- SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance with significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z) - Adversarial Training for Large Neural Language Models [107.84290922621163]
We show that adversarial pre-training can improve both generalization and robustness.
ALUM regularizes the training objective by applying perturbations in the embedding space that maximizes the adversarial loss.
ALUM can be further combined with task-specific fine-tuning to attain additional gains.
arXiv Detail & Related papers (2020-04-20T00:07:18Z) - Recognizing Long Grammatical Sequences Using Recurrent Networks
Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.