Improving Semantic Matching through Dependency-Enhanced Pre-trained
Model with Adaptive Fusion
- URL: http://arxiv.org/abs/2210.08471v5
- Date: Thu, 24 Aug 2023 07:13:27 GMT
- Title: Improving Semantic Matching through Dependency-Enhanced Pre-trained
Model with Adaptive Fusion
- Authors: Jian Song, Di Liang, Rumei Li, Yuntao Li, Sirui Wang, Minlong Peng,
Wei Wu, Yongxin Yu
- Abstract summary: We propose Dependency-Enhanced Adaptive Fusion Attention (DAFA).
It explicitly introduces dependency structure into pre-trained models and adaptively fuses it with semantic information.
By applying it on BERT, our method achieves state-of-the-art or competitive performance on 10 public datasets.
- Score: 23.00381824485556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pre-trained models like BERT have achieved great progress
on Semantic Sentence Matching. Meanwhile, dependency prior knowledge has also
shown general benefits in multiple NLP tasks. However, how to efficiently
integrate dependency prior structure into pre-trained models to better model
complex semantic matching relations is still unsettled. In this paper, we
propose the Dependency-Enhanced Adaptive Fusion
Attention (DAFA), which explicitly introduces dependency
structure into pre-trained models and adaptively fuses it with semantic
information. Specifically, DAFA first proposes a
structure-sensitive paradigm to construct a dependency matrix for calibrating
attention weights. It adopts an adaptive fusion module to integrate the
obtained dependency information and the original semantic signals. Moreover,
DAFA reconstructs the attention calculation flow and provides better
interpretability. By applying it on BERT, our method achieves state-of-the-art
or competitive performance on 10 public datasets, demonstrating the benefits of
adaptively fusing dependency structure in semantic matching tasks.
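The abstract describes two steps: building a dependency matrix that calibrates attention weights, and fusing the resulting structural signal with the original semantic attention. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the symmetric 0/1 dependency matrix with self-loops, the masking scheme, and the fixed scalar `gate` are all assumptions made for illustration; the abstract does not specify how DAFA builds its matrix or computes its fusion weights.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dependency_matrix(heads):
    """Build a symmetric 0/1 matrix from dependency heads (assumed encoding).

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Each token is also linked to itself.
    """
    n = len(heads)
    dep = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            dep[child][head] = 1.0
            dep[head][child] = 1.0
    return dep


def fused_attention(scores, dep, gate):
    """Mix dependency-masked attention with plain attention.

    scores: n x n raw attention scores.
    dep:    n x n 0/1 dependency matrix.
    gate:   scalar in [0, 1]; a hypothetical stand-in for a learned,
            adaptive fusion weight.
    """
    fused = []
    for s_row, d_row in zip(scores, dep):
        semantic = softmax(s_row)
        # Positions without a dependency link get a large negative score,
        # so they receive (almost) no structural attention.
        masked = [s if d > 0 else -1e9 for s, d in zip(s_row, d_row)]
        structural = softmax(masked)
        fused.append([gate * st + (1.0 - gate) * se
                      for st, se in zip(structural, semantic)])
    return fused


# "cats chase mice": "chase" is the root; "cats" and "mice" attach to it.
heads = [1, -1, 1]
dep = dependency_matrix(heads)
scores = [[0.0] * 3 for _ in range(3)]
attention = fused_attention(scores, dep, gate=0.5)
```

With `gate=0.0` the result is ordinary softmax attention, and with `gate=1.0` each token attends only to itself and its direct dependency neighbours. A learned, input-dependent gate would replace the fixed scalar in a real model.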
Related papers
- Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach
for Relation Classification [17.398872494876365]
This paper introduces a novel neuro-symbolic architecture for relation classification (RC).
It combines rule-based methods with contemporary deep learning techniques.
We show that our proposed method outperforms previous state-of-the-art models in three out of four settings.
arXiv Detail & Related papers (2024-03-05T20:08:32Z)
- Semi-automatic Data Enhancement for Document-Level Relation Extraction
with Distant Supervision from Large Language Models [26.523153535336725]
Document-level Relation Extraction (DocRE) aims to extract relations from a long context.
We propose a method integrating a large language model (LLM) and a natural language inference (NLI) module to generate relation triples.
We demonstrate the effectiveness of our approach by introducing an enhanced dataset known as DocGNRE.
arXiv Detail & Related papers (2023-11-13T13:10:44Z)
- Understanding and Constructing Latent Modality Structures in Multi-modal
Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z)
- FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced
Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
- A Novel Few-Shot Relation Extraction Pipeline Based on Adaptive
Prototype Fusion [5.636675879040131]
Few-shot relation extraction (FSRE) aims at recognizing unseen relations by learning with merely a handful of annotated instances.
This paper proposes a novel pipeline for the FSRE task based on adaptive prototype fusion.
Experiments on the benchmark dataset FewRel 1.0 show a significant improvement of our method against state-of-the-art methods.
arXiv Detail & Related papers (2022-10-15T09:44:21Z)
- Enhancing Pre-trained Models with Text Structure Knowledge for Question
Generation [2.526624977753083]
We model text structure as answer position and syntactic dependency, and propose answer localness modeling and syntactic mask attention to address these limitations.
Experiments on the SQuAD dataset show that our two proposed modules improve performance over the strong pre-trained model ProphetNet.
arXiv Detail & Related papers (2022-09-09T08:33:47Z)
- Generative Relation Linking for Question Answering over Knowledge Bases [12.778133758613773]
We propose a novel approach for relation linking, framing it as a generative problem.
We extend such sequence-to-sequence models with the idea of infusing structured data from the target knowledge base.
We train the model with the aim to generate a structured output consisting of a list of argument-relation pairs, enabling a knowledge validation step.
arXiv Detail & Related papers (2021-08-16T20:33:43Z)
- Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they suffer from discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent
Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables with a strong auto-regressive decoder.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
- A Dependency Syntactic Knowledge Augmented Interactive Architecture for
End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn).
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.