DABERT: Dual Attention Enhanced BERT for Semantic Matching
- URL: http://arxiv.org/abs/2210.03454v4
- Date: Fri, 14 Apr 2023 07:14:17 GMT
- Title: DABERT: Dual Attention Enhanced BERT for Semantic Matching
- Authors: Sirui Wang, Di Liang, Jian Song, Yuntao Li, Wei Wu
- Abstract summary: We propose a novel Dual Attention Enhanced BERT (DABERT) to enhance the ability of BERT to capture fine-grained differences in sentence pairs.
DABERT comprises (1) a Dual Attention module, which measures soft word matches by introducing a new dual-channel alignment mechanism,
and (2) an Adaptive Fusion module, which uses attention to learn the aggregation of difference and affinity features and generates a vector describing the matching details of sentence pairs.
- Score: 12.348661150707313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pre-trained language models such as BERT have achieved
remarkable results in Semantic Sentence Matching. However, existing models
still suffer from insufficient ability to capture subtle differences. Minor
noise like word addition, deletion, and modification of sentences may cause
flipped predictions. To alleviate this problem, we propose a novel Dual
Attention Enhanced BERT (DABERT) to enhance the ability of BERT to capture
fine-grained differences in sentence pairs. DABERT comprises (1) a Dual
Attention module, which measures soft word matches by introducing a new
dual-channel alignment mechanism to model affinity and difference attention,
and (2) an Adaptive Fusion module, which uses attention to learn how to
aggregate difference and affinity features and generates a vector describing
the matching details of sentence pairs. We conduct extensive experiments on
well-studied semantic matching and robustness test datasets, and the
experimental results show the effectiveness of our proposed method.
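The abstract does not include an implementation, so below is a minimal, hypothetical PyTorch sketch of the two ideas it names: an affinity attention channel, a difference attention channel, and a learned gate that adaptively fuses them. The distance-based difference scoring, the gating formula, and all dimensions are illustrative assumptions, not DABERT's actual design.

```python
# Illustrative sketch only: a simplified dual-channel (affinity / difference)
# cross-attention with an adaptive fusion gate, loosely following the ideas
# described in the abstract. The subtraction/distance-based "difference"
# scoring and the gating formula are assumptions, not the authors' design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualAttentionFusion(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        # Gate that adaptively mixes affinity and difference features.
        self.gate = nn.Linear(2 * hidden_size, hidden_size)
        self.scale = hidden_size ** 0.5

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        """a: (batch, len_a, hidden) token states of sentence A
           b: (batch, len_b, hidden) token states of sentence B"""
        q, k, v = self.q_proj(a), self.k_proj(b), self.v_proj(b)

        # Affinity channel: standard scaled dot-product cross-attention.
        aff_scores = torch.matmul(q, k.transpose(-1, -2)) / self.scale
        affinity = torch.matmul(F.softmax(aff_scores, dim=-1), v)

        # Difference channel: score token pairs by negative L2 distance so
        # attention focuses on the most dissimilar alignments.
        diff_scores = -torch.cdist(q, k) / self.scale
        difference = torch.matmul(F.softmax(diff_scores, dim=-1), v)

        # Adaptive fusion: a learned gate decides, per token, how much of the
        # affinity vs. difference signal to keep.
        g = torch.sigmoid(self.gate(torch.cat([affinity, difference], dim=-1)))
        return g * affinity + (1.0 - g) * difference


if __name__ == "__main__":
    # Random features standing in for BERT token outputs of a sentence pair.
    module = DualAttentionFusion(hidden_size=768)
    sent_a = torch.randn(2, 12, 768)
    sent_b = torch.randn(2, 15, 768)
    fused = module(sent_a, sent_b)
    print(fused.shape)  # torch.Size([2, 12, 768])
```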
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z) - Multitask Fine-Tuning and Generative Adversarial Learning for Improved Auxiliary Classification [0.0]
We implement a novel BERT architecture for multitask fine-tuning on three downstream tasks.
Our model, Multitask BERT, incorporates layer sharing and a triplet architecture, custom sentence pair tokenization, loss pairing, and gradient surgery.
We also apply generative adversarial learning to BERT, constructing a conditional generator model that maps from latent space to create fake embeddings.
arXiv Detail & Related papers (2024-08-11T20:05:54Z) - Emotion-cause pair extraction method based on multi-granularity information and multi-module interaction [0.6577148087211809]
Emotion-cause pair extraction aims to extract pairs of emotion clauses and their corresponding cause clauses.
Existing models do not adequately address the positional imbalance of emotion and cause clauses across samples.
We propose an end-to-end multitasking model (MM-ECPE) based on shared interaction between GRU, knowledge graph and transformer modules.
arXiv Detail & Related papers (2024-04-10T08:00:26Z) - Robust Training of Federated Models with Extremely Label Deficiency [84.00832527512148]
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency.
We propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data.
Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings.
arXiv Detail & Related papers (2024-02-22T10:19:34Z) - Make BERT-based Chinese Spelling Check Model Enhanced by Layerwise Attention and Gaussian Mixture Model [33.446533426654995]
We design a heterogeneous knowledge-infused framework to strengthen BERT-based CSC models.
We propose a novel form of n-gram-based layerwise self-attention to generate a multilayer representation.
Experimental results show that our proposed framework yields a stable performance boost over four strong baseline models.
arXiv Detail & Related papers (2023-12-27T16:11:07Z) - Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z) - Dual Path Modeling for Semantic Matching by Perceiving Subtle Conflicts [14.563722352134949]
Transformer-based pre-trained models have achieved great improvements in semantic matching.
Existing models still suffer from insufficient ability to capture subtle differences.
We propose a novel Dual Path Modeling Framework to enhance the model's ability to perceive subtle differences.
arXiv Detail & Related papers (2023-02-24T09:29:55Z) - Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models, that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z) - DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
arXiv Detail & Related papers (2022-11-28T03:25:49Z) - Improving Contextual Representation with Gloss Regularized Pre-training [9.589252392388758]
We propose an auxiliary gloss regularizer module for BERT pre-training (GR-BERT) to enhance word semantic similarity.
By predicting masked words and aligning contextual embeddings to corresponding glosses simultaneously, the word similarity can be explicitly modeled.
Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation.
arXiv Detail & Related papers (2022-05-13T12:50:32Z) - PromptBERT: Improving BERT Sentence Embeddings with Prompts [95.45347849834765]
We propose a prompt-based sentence embedding method that can reduce token embedding biases and make the original BERT layers more effective (see the illustrative sketch after this entry).
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
arXiv Detail & Related papers (2022-01-12T06:54:21Z)
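For the PromptBERT entry above, the following is a minimal sketch of the general prompt-based sentence embedding idea: wrap the sentence in a cloze-style template and use the hidden state at the [MASK] position as the sentence vector. The exact template wording and the bert-base-uncased checkpoint are assumptions for illustration, and template denoising is not shown.

```python
# Illustrative sketch only: prompt-based sentence embeddings via a manual
# template, using the [MASK] position's hidden state as the sentence vector.
# Template wording and model choice are assumptions, not the paper's exact setup.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def prompt_embedding(sentence: str) -> torch.Tensor:
    # Wrap the sentence in a cloze-style template ending in [MASK].
    text = f'This sentence : "{sentence}" means {tokenizer.mask_token} .'
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    # The representation at the [MASK] position serves as the sentence embedding.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return hidden[0, mask_pos].squeeze(0)


emb_a = prompt_embedding("A man is playing a guitar.")
emb_b = prompt_embedding("Someone plays an instrument.")
print(float(torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=0)))
```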
This list is automatically generated from the titles and abstracts of the papers on this site.