Adaptive Bi-directional Attention: Exploring Multi-Granularity
Representations for Machine Reading Comprehension
- URL: http://arxiv.org/abs/2012.10877v2
- Date: Tue, 2 Feb 2021 08:42:32 GMT
- Title: Adaptive Bi-directional Attention: Exploring Multi-Granularity
Representations for Machine Reading Comprehension
- Authors: Nuo Chen, Fenglin Liu, Chenyu You, Peilin Zhou, Yuexian Zou
- Abstract summary: We propose a novel approach called Adaptive Bidirectional Attention, which adaptively feeds the source representations of different levels to the predictor.
Results are better than the previous state-of-the-art model by 2.5% EM and 2.3% F1 scores.
- Score: 29.717816161964105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the attention-enhanced multi-layer encoder, such as Transformer,
has been extensively studied in Machine Reading Comprehension (MRC). To predict
the answer, it is common practice to employ a predictor to draw information
only from the final encoder layer which generates the \textit{coarse-grained}
representations of the source sequences, i.e., the passage and the question. Previous
studies have shown that the representation of the source sequence shifts from
\textit{fine-grained} to \textit{coarse-grained} as the encoding layer
increases. It is generally believed that as the number of layers in a deep
neural network grows, the encoding process gathers progressively more relevant
information for each location, resulting in more \textit{coarse-grained}
representations and increasing each location's similarity to other locations
(i.e., homogeneity). Such a phenomenon can mislead the model into making wrong
judgments and thus degrade performance. To this end, we propose a novel
approach called Adaptive Bidirectional Attention, which adaptively feeds the
source representations of different levels to the predictor.
Experimental results on the benchmark dataset, SQuAD 2.0 demonstrate the
effectiveness of our approach, and the results are better than the previous
state-of-the-art model by 2.5$\%$ EM and 2.3$\%$ F1 scores.
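The core idea described above, letting the predictor draw on an adaptive mixture of encoder layers rather than only the final one, can be illustrated with a minimal sketch. This is not the authors' implementation: the learned per-layer gate (`gate_logits`), the tensor shapes, and the simple softmax-weighted sum are all assumptions made for illustration.

```python
import numpy as np

def adaptive_layer_fusion(layer_reps, gate_logits):
    """Fuse per-layer encoder representations with learned softmax weights.

    layer_reps:  array of shape (num_layers, seq_len, hidden)
    gate_logits: array of shape (num_layers,), learned scalar gates
    Returns a (seq_len, hidden) representation for the predictor.
    """
    weights = np.exp(gate_logits - gate_logits.max())
    weights /= weights.sum()                           # softmax over layers
    return np.tensordot(weights, layer_reps, axes=1)   # weighted sum of layers

# toy example: 3 encoder layers, a 4-token sequence, hidden size 2;
# layer i's representation is filled with the constant i
reps = np.stack([np.full((4, 2), float(i)) for i in range(3)])
fused = adaptive_layer_fusion(reps, np.zeros(3))  # uniform gate -> layer mean
```

With a uniform gate the fusion reduces to a plain average of the layers; training the gate would let the model shift weight toward finer- or coarser-grained layers as needed.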
Related papers
- Contrastive Matrix Completion with Denoising and Augmented Graph Views for Robust Recommendation [1.0128808054306186]
Matrix completion is a widely adopted framework in recommender systems.
We propose a novel method called Matrix Completion using Contrastive Learning (MCCL).
Our approach not only improves the numerical accuracy of the predicted scores but also produces superior rankings, with improvements of up to 36% in ranking metrics.
arXiv Detail & Related papers (2025-06-12T12:47:35Z)
- Magnifier: A Multi-grained Neural Network-based Architecture for Burned Area Delineation [4.833815605196964]
In crisis management and remote sensing, image segmentation plays a crucial role, enabling tasks like disaster response and emergency planning.
A key obstacle to their development is data scarcity and the lack of extensive benchmark datasets, which limits the ability to train large neural network models.
We propose a novel methodology, namely Magnifier, to improve segmentation performance with limited data availability.
arXiv Detail & Related papers (2025-04-28T08:51:54Z)
- Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity [20.17288970927518]
We study the similarity of representations between the hidden layers of individual transformers.
We show that representations across layers are positively correlated, with similarity increasing as the layers get closer.
We propose an aligned training method to improve the effectiveness of the shallow layers.
arXiv Detail & Related papers (2024-06-20T16:41:09Z)
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding [3.185382039518151]
GenDoc is a sequence-to-sequence document understanding model pre-trained with unified masking across three modalities.
The proposed model utilizes an encoder-decoder architecture, which allows for increased adaptability to a wide range of downstream tasks.
arXiv Detail & Related papers (2023-05-16T15:25:19Z)
- A Simplified Framework for Contrastive Learning for Node Representations [2.277447144331876]
We investigate the potential of deploying contrastive learning in combination with Graph Neural Networks for embedding nodes in a graph.
We show that the quality of the resulting embeddings and training time can be significantly improved by a simple column-wise postprocessing of the embedding matrix.
This modification yields improvements in downstream classification tasks of up to 1.5% and even beats existing state-of-the-art approaches on 6 out of 8 different benchmarks.
arXiv Detail & Related papers (2023-05-01T02:04:36Z)
- CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval [34.08763911138496]
This study brings multi-view modeling to the contextual masked auto-encoder.
We refer to this multi-view pretraining method as CoT-MAE v2.
arXiv Detail & Related papers (2023-04-05T08:00:38Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector, is also helpful to neural rerankers that only use the [] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Exploring and Exploiting Multi-Granularity Representations for Machine Reading Comprehension [13.191437539419681]
We propose a novel approach called the Adaptive Bidirectional Attention-Capsule Network (ABA-Net).
ABA-Net adaptively feeds the source representations of different levels to the predictor.
We set the new state-of-the-art performance on the SQuAD 1.0 dataset.
arXiv Detail & Related papers (2022-08-18T10:14:32Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Online Deep Learning based on Auto-Encoder [4.128388784932455]
We propose a two-phase Online Deep Learning method based on an Auto-Encoder (ODLAE).
Based on the auto-encoder's reconstruction loss, we extract abstract hierarchical latent representations of instances.
We devise two fusion strategies: an output-level fusion strategy, which fuses the classification results of each hidden layer, and a feature-level fusion strategy, which leverages a self-attention mechanism to fuse the outputs of every hidden layer.
arXiv Detail & Related papers (2022-01-19T02:14:57Z)
- Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z)
- Consistency Regularization for Deep Face Anti-Spoofing [69.70647782777051]
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems.
Motivated by this exciting observation, we conjecture that encouraging feature consistency of different views may be a promising way to boost FAS models.
We enhance both Embedding-level and Prediction-level Consistency Regularization (EPCR) in FAS.
arXiv Detail & Related papers (2021-11-24T08:03:48Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, the representations from the last encoder layer serve as a global view, while those from the other encoder layers are supplemented to provide a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
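The layer-wise multi-view decoding idea in the entry above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the single-head unprojected dot-product attention, the one-to-one pairing of decoder layers with earlier encoder layers, and the simple average fusion of the two views are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, memory):
    # single-head scaled dot-product attention; projections omitted for brevity
    scores = query @ memory.T / np.sqrt(query.shape[-1])
    return softmax(scores) @ memory

def multi_view_context(query, encoder_layers, decoder_layer_idx):
    """For one decoder layer: combine the global view (last encoder layer)
    with a stereoscopic view (an earlier encoder layer matched to this
    decoder layer)."""
    global_ctx = cross_attention(query, encoder_layers[-1])
    stereo_ctx = cross_attention(query, encoder_layers[decoder_layer_idx])
    return 0.5 * (global_ctx + stereo_ctx)  # assumed: simple average fusion

# toy example: 3 encoder layers over a 5-token source, hidden size 4
rng = np.random.default_rng(0)
enc_layers = [rng.standard_normal((5, 4)) for _ in range(3)]
query = rng.standard_normal((2, 4))  # 2 decoder positions
ctx = multi_view_context(query, enc_layers, decoder_layer_idx=0)
```

Each decoder position thus attends to both the coarse-grained final encoder layer and a finer-grained earlier layer, mirroring the multi-granularity motivation of the main paper.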
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.