SpikeBERT: A Language Spikformer Learned from BERT with Knowledge
Distillation
- URL: http://arxiv.org/abs/2308.15122v4
- Date: Wed, 21 Feb 2024 13:20:21 GMT
- Title: SpikeBERT: A Language Spikformer Learned from BERT with Knowledge
Distillation
- Authors: Changze Lv, Tianlong Li, Jianhan Xu, Chenxi Gu, Zixuan Ling, Cenyuan
Zhang, Xiaoqing Zheng, Xuanjing Huang
- Abstract summary: Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way.
We improve a recently proposed spiking Transformer (i.e., Spikformer) so that it can process language tasks.
We show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve comparable results to BERTs on text classification tasks for both English and Chinese.
- Score: 31.777019330200705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spiking neural networks (SNNs) offer a promising avenue to implement deep
neural networks in a more energy-efficient way. However, the network
architectures of existing SNNs for language tasks are still simplistic and
relatively shallow, and deep architectures have not been fully explored,
resulting in a significant performance gap compared to mainstream
transformer-based networks such as BERT. To this end, we improve a
recently proposed spiking Transformer (i.e., Spikformer) so that it can process
language tasks, and we propose a two-stage knowledge distillation method for
training it: first, pre-training by distilling knowledge from BERT on a large
collection of unlabelled texts; second, fine-tuning on task-specific instances
by distilling knowledge again from a BERT fine-tuned on the same training
examples. Through extensive experimentation, we show that the models
trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and
even achieve comparable results to BERTs on text classification tasks for both
English and Chinese with much less energy consumption. Our code is available at
https://github.com/Lvchangze/SpikeBERT.
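To make the two-stage recipe above concrete, below is a minimal PyTorch sketch of the second (task-specific) stage: a spiking student distills from a BERT teacher that has already been fine-tuned on the same labelled examples. The function names, the temperature T, the weight alpha, and the HuggingFace-style teacher interface are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal, illustrative sketch of task-specific knowledge distillation
# (stage two of the two-stage recipe). Names and hyperparameters are
# assumptions for illustration, not SpikeBERT's exact implementation.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hard-label cross-entropy combined with a softened KL distillation term."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps the soft-target gradients on a comparable scale
    return alpha * ce + (1.0 - alpha) * kd

def training_step(student, teacher, batch, optimizer):
    """One step: the spiking student imitates a frozen, task-fine-tuned teacher."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"]).logits  # HuggingFace-style output
    student_logits = student(batch["input_ids"])              # spiking student's logits
    loss = distillation_loss(student_logits, teacher_logits, batch["labels"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

One plausible reading of the first stage is the same soft-target alignment applied to unlabelled text without the cross-entropy term; the exact losses used for SpikeBERT's pre-training and fine-tuning are specified in the paper itself.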
Related papers
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided
Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named
Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from a rich-resource language to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Optimizing small BERTs trained for German NER [0.16058099298620418]
We investigate various training techniques of smaller BERT models and evaluate them on five public German NER tasks.
We propose two new fine-tuning techniques leading to better performance: CSE-tagging and a modified form of LCRF.
Furthermore, we introduce a new technique called WWA which reduces BERT memory usage and leads to a small increase in performance.
arXiv Detail & Related papers (2021-04-23T12:36:13Z)
- Using Prior Knowledge to Guide BERT's Attention in Semantic Textual
Matching Tasks [13.922700041632302]
We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT).
We obtain better understanding of what task-specific knowledge BERT needs the most and where it is most needed.
Experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance.
arXiv Detail & Related papers (2021-02-22T12:07:16Z)
- Evaluation of BERT and ALBERT Sentence Embedding Performance on
Downstream NLP Tasks [4.955649816620742]
This paper explores sentence embedding models for BERT and ALBERT.
We take a modified BERT network with Siamese and triplet network structures, called Sentence-BERT (SBERT), and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT).
arXiv Detail & Related papers (2021-01-26T09:14:06Z)
- E-BERT: A Phrase and Product Knowledge Enhanced Language Model for
E-commerce [63.333860695727424]
E-commerce tasks require accurate understanding of domain phrases, whereas such fine-grained phrase-level knowledge is not explicitly modeled by BERT's training objective.
To tackle the problem, we propose a unified pre-training framework, namely, E-BERT.
Specifically, to preserve phrase-level knowledge, we introduce Adaptive Hybrid Masking, which allows the model to adaptively switch from learning preliminary word knowledge to learning complex phrases.
To utilize product-level knowledge, we introduce Neighbor Product Reconstruction, which trains E-BERT to predict a product's associated neighbors with a denoising cross attention layer.
arXiv Detail & Related papers (2020-09-07T00:15:36Z)
- Neural Entity Linking on Technical Service Tickets [1.3621712165154805]
We show that a neural approach outperforms and complements hand-coded entities, with improvements of about 20% in top-1 accuracy.
We also show that a simple sentence-wise encoding (Bi-Encoder) offers a fast yet efficient search in practice.
arXiv Detail & Related papers (2020-05-15T15:47:02Z)
- lamBERT: Language and Action Learning Using Multimodal BERT [0.1942428068361014]
This study proposes a language and action learning model based on multimodal BERT (lamBERT).
Experiments are conducted in a grid environment that requires language understanding for the agent to act properly.
The lamBERT model obtained higher rewards in multitask settings and transfer settings when compared to other models.
arXiv Detail & Related papers (2020-04-15T13:54:55Z)
- DynaBERT: Dynamic BERT with Adaptive Width and Depth [55.18269622415814]
We propose a novel dynamic BERT model (abbreviated as DynaBERT).
It can flexibly adjust the size and latency by selecting adaptive width and depth.
It consistently outperforms existing BERT compression methods.
arXiv Detail & Related papers (2020-04-08T15:06:28Z)
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation (a minimal sketch of the self-distillation idea follows this list).
arXiv Detail & Related papers (2020-02-24T16:17:12Z)
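As referenced in the self-ensemble and self-distillation entry above, the following is a minimal sketch of the general self-distillation idea: a parameter-averaged copy of the model serves as its own teacher during fine-tuning. The EMA decay, the loss weight beta, the MSE regularizer, and the helper names are illustrative assumptions, not that paper's exact method.

```python
# Minimal, illustrative sketch of self-distillation during fine-tuning:
# a parameter-averaged copy of the model acts as its own teacher.
# Averaging rule, loss weight, and names are assumptions for illustration.
import copy
import torch
import torch.nn.functional as F

def make_teacher(student):
    """Start the teacher as a frozen copy of the student."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

def update_teacher(teacher, student, decay=0.999):
    """Exponential moving average of the student's parameters."""
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

def self_distillation_step(student, teacher, batch, optimizer, beta=1.0):
    """Cross-entropy on labels plus a term pulling the student's logits
    toward the logits of its own parameter-averaged teacher."""
    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"])
    student_logits = student(batch["input_ids"])
    loss = F.cross_entropy(student_logits, batch["labels"]) \
        + beta * F.mse_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_teacher(teacher, student)  # refresh the teacher after each step
    return loss.item()
```

Self-ensembling is realized here by the running parameter average; the actual averaging and loss choices in the paper above may differ.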