Exploring and Exploiting Multi-Granularity Representations for Machine
Reading Comprehension
- URL: http://arxiv.org/abs/2208.08750v1
- Date: Thu, 18 Aug 2022 10:14:32 GMT
- Title: Exploring and Exploiting Multi-Granularity Representations for Machine
Reading Comprehension
- Authors: Nuo Chen, Chenyu You
- Abstract summary: We propose a novel approach called the Adaptive Bidirectional Attention-Capsule Network (ABA-Net).
ABA-Net adaptively feeds source representations of different levels to the predictor.
We set new state-of-the-art performance on the SQuAD 1.0 dataset.
- Score: 13.191437539419681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, attention-enhanced multi-layer encoders, such as the Transformer,
have been extensively studied in Machine Reading Comprehension (MRC). To predict
the answer, it is common practice to employ a predictor that draws information
only from the final encoder layer, which generates coarse-grained
representations of the source sequences, i.e., the passage and question. Our
analysis shows that the representation of the source sequence shifts from
fine-grained to coarse-grained as the encoding layer increases. It is
generally believed that, as the number of layers in a deep neural network
grows, the encoding process gathers increasingly more relevant information for
each location, resulting in more coarse-grained representations and raising
the likelihood of similarity to other locations (i.e., homogeneity). Such a
phenomenon can mislead the model into wrong judgements and degrade its
performance. In this paper, we argue that the predictor would be better served
by exploiting representations of different granularity from the encoder,
providing different views of the source sequences, so that the expressive
power of the model can be fully utilized. To this end, we propose a novel
approach called the Adaptive Bidirectional Attention-Capsule Network
(ABA-Net), which adaptively feeds source representations of different levels
to the predictor. Furthermore, since better representations are at the core of
boosting MRC performance, a capsule network and a self-attention module are
carefully designed as the building blocks of our encoders, providing the
capability to explore local and global representations, respectively.
Experimental results on three benchmark datasets, i.e., SQuAD 1.0, SQuAD 2.0
and CoQA, demonstrate the effectiveness of our approach. In particular, we set
new state-of-the-art performance on the SQuAD 1.0 dataset.
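To make the central idea concrete, the sketch below shows one minimal way a predictor could draw on an adaptive mixture of per-layer encoder representations rather than the final layer alone. It is an illustrative PyTorch sketch with an assumed softmax gate and tensor shapes, not the authors' exact ABA-Net architecture (which combines bidirectional attention with capsule and self-attention encoder blocks).

```python
import torch
import torch.nn as nn


class AdaptiveLayerMixer(nn.Module):
    """Token-wise gating over per-layer encoder outputs, so the predictor sees a
    mixture of fine- and coarse-grained views instead of only the final layer.
    Illustrative only; ABA-Net's bidirectional-attention and capsule blocks are
    more involved than this single gate."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # A shared scoring head; its scores are normalized across encoder layers.
        self.gate = nn.Linear(hidden_size, 1)

    def forward(self, layer_outputs):
        # layer_outputs: list of [batch, seq_len, hidden] tensors, one per encoder layer
        stacked = torch.stack(layer_outputs, dim=0)            # [L, B, T, H]
        scores = self.gate(stacked).squeeze(-1)                # [L, B, T]
        weights = torch.softmax(scores, dim=0).unsqueeze(-1)   # normalize over the layer axis
        return (weights * stacked).sum(dim=0)                  # [B, T, H] adaptive mixture


# Example: mix six encoder layers and hand the result to a span predictor.
mixer = AdaptiveLayerMixer(hidden_size=256)
layers = [torch.randn(2, 40, 256) for _ in range(6)]
mixed = mixer(layers)  # shape: [2, 40, 256]
```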
Related papers
- Interpretable Spectral Variational AutoEncoder (ISVAE) for time series
clustering [48.0650332513417]
We introduce a novel model that incorporates an interpretable bottleneck, termed the Filter Bank (FB), at the outset of a Variational Autoencoder (VAE).
This arrangement compels the VAE to attend to the most informative segments of the input signal.
By deliberately constraining the VAE with this FB, we promote the development of an encoding that is discernible, separable, and of reduced dimensionality.
arXiv Detail & Related papers (2023-10-18T13:06:05Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We endow sequences of codewords with discrete distributions and learn a deterministic decoder that transports the distribution over sequences of codewords to the data distribution.
We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z) - Towards Better Out-of-Distribution Generalization of Neural Algorithmic
Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z) - Effective and Interpretable Information Aggregation with Capacity
Networks [3.4012007729454807]
Capacity networks generate multiple interpretable intermediate results which can be aggregated in a semantically meaningful space.
Our experiments show that implementing this simple inductive bias leads to improvements over different encoder-decoder architectures.
arXiv Detail & Related papers (2022-07-25T09:45:16Z) - Online Deep Learning based on Auto-Encoder [4.128388784932455]
We propose a two-phase Online Deep Learning based on Auto-Encoder (ODLAE) framework.
Based on the auto-encoder's reconstruction loss, we extract abstract hierarchical latent representations of instances.
We devise two fusion strategies: an output-level fusion strategy, obtained by fusing the classification results of each hidden layer, and a feature-level fusion strategy, which leverages a self-attention mechanism to fuse the outputs of every hidden layer.
arXiv Detail & Related papers (2022-01-19T02:14:57Z) - MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs [55.66953093401889]
We propose a masked graph autoencoder (MGAE) framework to perform effective learning on graph-structured data.
Taking insights from self-supervised learning, we randomly mask a large proportion of edges and try to reconstruct these missing edges during training.
arXiv Detail & Related papers (2022-01-07T16:48:07Z) - Adaptive Bi-directional Attention: Exploring Multi-Granularity
Representations for Machine Reading Comprehension [29.717816161964105]
We propose a novel approach called Adaptive Bidirectional Attention, which adaptively feeds source representations of different levels to the predictor.
Results are better than the previous state-of-the-art model by 2.5% EM and 2.3% F1 scores.
arXiv Detail & Related papers (2020-12-20T09:31:35Z) - Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [55.35176938713946]
We develop deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
arXiv Detail & Related papers (2020-06-15T22:22:56Z) - Rethinking and Improving Natural Language Generation with Layer-Wise
Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where each decoder layer attends not only to the representations from the last encoder layer, which serve as a global view, but also to representations from other encoder layers, which supply a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.