Using Prior Knowledge to Guide BERT's Attention in Semantic Textual
Matching Tasks
- URL: http://arxiv.org/abs/2102.10934v1
- Date: Mon, 22 Feb 2021 12:07:16 GMT
- Title: Using Prior Knowledge to Guide BERT's Attention in Semantic Textual
Matching Tasks
- Authors: Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang
- Abstract summary: We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT).
We obtain a better understanding of what task-specific knowledge BERT needs the most and where it is most needed.
Experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance.
- Score: 13.922700041632302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of incorporating prior knowledge into a deep
Transformer-based model, i.e., Bidirectional Encoder Representations from
Transformers (BERT), to enhance its performance on semantic textual matching
tasks. By probing and analyzing what BERT has already known when solving this
task, we obtain a better understanding of what task-specific knowledge BERT needs
the most and where it is most needed. The analysis further motivates us to take
a different approach than most existing works. Instead of using prior knowledge
to create a new training task for fine-tuning BERT, we directly inject
knowledge into BERT's multi-head attention mechanism. This leads us to a simple
yet effective approach that enjoys a fast training stage, as it saves the model
from training on additional data or tasks other than the main task. Extensive
experiments demonstrate that the proposed knowledge-enhanced BERT is able to
consistently improve semantic textual matching performance over the original
BERT model, and the performance benefit is most salient when training data is
scarce.
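The abstract specifies where the knowledge is injected (directly into BERT's multi-head attention) but not its exact form. The following is a minimal sketch under the assumption that the prior knowledge is available as a token-pair score matrix added to the attention logits before the softmax; the class name, the weighting factor `alpha`, and the random toy prior are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch (not the paper's exact method): a precomputed token-pair
# "prior" matrix is added to the attention logits before softmax.
import math
import torch
import torch.nn as nn

class KnowledgeGuidedAttention(nn.Module):
    def __init__(self, hidden_size: int = 768, num_heads: int = 12, alpha: float = 1.0):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q = nn.Linear(hidden_size, hidden_size)
        self.k = nn.Linear(hidden_size, hidden_size)
        self.v = nn.Linear(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)
        self.alpha = alpha  # strength of the prior-knowledge bias (assumption)

    def forward(self, x: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # x:     (batch, seq_len, hidden_size) token representations
        # prior: (batch, seq_len, seq_len) task-specific token-pair scores,
        #        e.g. word-similarity knowledge between the two sentences (assumption)
        b, n, _ = x.shape

        def split(t):  # (b, n, hidden) -> (b, heads, n, head_dim)
            return t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        # Inject the prior knowledge directly into the attention logits.
        scores = scores + self.alpha * prior.unsqueeze(1)
        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(ctx)

# Toy usage: a random prior standing in for real task-specific knowledge.
x = torch.randn(2, 16, 768)
prior = torch.rand(2, 16, 16)
out = KnowledgeGuidedAttention()(x, prior)
print(out.shape)  # torch.Size([2, 16, 768])
```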
Related papers
- SpikeBERT: A Language Spikformer Learned from BERT with Knowledge
Distillation [31.777019330200705]
Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way.
We improve a recently-proposed spiking Transformer (i.e., Spikformer) to make it possible to process language tasks.
We show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve comparable results to BERTs on text classification tasks for both English and Chinese.
arXiv Detail & Related papers (2023-08-29T08:41:16Z)
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can actually generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Continual Prompt Tuning for Dialog State Tracking [58.66412648276873]
A desirable dialog system should be able to continually learn new skills without forgetting old ones.
We present Continual Prompt Tuning, a parameter-efficient framework that not only avoids forgetting but also enables knowledge transfer between tasks.
arXiv Detail & Related papers (2022-03-13T13:22:41Z)
- BERTVision -- A Parameter-Efficient Approach for Question Answering [0.0]
We present a highly parameter-efficient approach for Question Answering that significantly reduces the need for extended BERT fine-tuning.
Our method uses information from the hidden state activations of each BERT transformer layer, which is discarded during typical BERT inference (see the sketch after this list).
Our experiments show that this approach works well not only for span QA, but also for classification, suggesting that it may be applicable to a wider range of tasks.
arXiv Detail & Related papers (2022-02-24T17:16:25Z)
- Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
- On Commonsense Cues in BERT for Solving Commonsense Tasks [22.57431778325224]
BERT has been used for solving commonsense tasks such as CommonsenseQA.
We quantitatively investigate the presence of structural commonsense cues in BERT when solving commonsense tasks, and the importance of such cues for the model prediction.
arXiv Detail & Related papers (2020-08-10T08:12:34Z)
- What BERT Sees: Cross-Modal Transfer for Visual Question Generation [21.640299110619384]
We study the visual capabilities of BERT out-of-the-box, avoiding any pre-training on supplementary data.
We introduce BERT-gen, a BERT-based architecture for text generation that can leverage either mono- or multi-modal representations.
arXiv Detail & Related papers (2020-02-25T12:44:36Z)
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation.
arXiv Detail & Related papers (2020-02-24T16:17:12Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
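As a concrete illustration of the BERTVision entry above, which reuses the per-layer hidden-state activations that standard BERT inference discards, here is a minimal sketch built on the Hugging Face transformers library. The mean pooling and the two-class linear head are illustrative assumptions, not the paper's actual architecture.
```python
# Minimal sketch (not BERTVision itself): collect every layer's hidden states,
# which standard inference throws away, and feed them to a small task head.
# The pooling strategy and the 2-class head below are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("Where is the Eiffel Tower located?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Tuple of (num_layers + 1) tensors, each (batch, seq_len, hidden); stack them.
layer_states = torch.stack(outputs.hidden_states, dim=1)  # (batch, 13, seq_len, 768)

# Hypothetical lightweight head: pool over layers and tokens, then classify.
pooled = layer_states.mean(dim=(1, 2))   # (batch, 768)
head = torch.nn.Linear(768, 2)           # e.g., a binary classification label
logits = head(pooled)
print(layer_states.shape, logits.shape)
```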
This list is automatically generated from the titles and abstracts of the papers on this site.