Exploring and Exploiting Multi-Granularity Representations for Machine
  Reading Comprehension
- URL: http://arxiv.org/abs/2208.08750v1
- Date: Thu, 18 Aug 2022 10:14:32 GMT
- Title: Exploring and Exploiting Multi-Granularity Representations for Machine
  Reading Comprehension
- Authors: Nuo Chen, Chenyu You
- Abstract summary: We propose a novel approach called Adaptive Bidirectional Attention-Capsule Network (ABA-Net).
ABA-Net adaptively supplies source representations of different levels to the predictor.
We set a new state of the art on the SQuAD 1.0 dataset.
- Score: 13.191437539419681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Recently, attention-enhanced multi-layer encoders, such as the Transformer,
have been extensively studied in Machine Reading Comprehension (MRC). To predict
the answer, it is common practice to employ a predictor that draws information
only from the final encoder layer, which generates coarse-grained
representations of the source sequences, i.e., the passage and the question. The
analysis shows that the representation of a source sequence shifts from
fine-grained to coarse-grained as the encoding depth increases. It is
generally believed that, as the number of layers in a deep neural network
grows, the encoding process gathers more and more relevant information for each
location, resulting in more coarse-grained representations, which
increases the likelihood that a location resembles other locations (referred to
as homogeneity). This phenomenon can mislead the model into making wrong
judgements and degrade performance. In this paper, we argue that it would be better if
the predictor could exploit representations of different granularity from the
encoder, providing different views of the source sequences, such that the
expressive power of the model could be fully utilized. To this end, we propose
a novel approach called Adaptive Bidirectional Attention-Capsule Network
(ABA-Net), which adaptively supplies source representations of different
levels to the predictor. Furthermore, because better representations are at
the core of boosting MRC performance, the capsule network and self-attention
module are carefully designed as the building blocks of our encoders,
providing the capability to explore local and global representations,
respectively. Experimental results on three benchmark datasets, i.e., SQuAD
1.0, SQuAD 2.0 and CoQA, demonstrate the effectiveness of our approach. In
particular, we set a new state of the art on the SQuAD 1.0
dataset.
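
The abstract does not spell out how ABA-Net combines encoder layers, so the following is only a minimal sketch of the general idea it describes: letting the predictor see a weighted blend of several encoder layers rather than the final layer alone. The softmax-weighted sum, the function names (`fuse_layers`, `mean_pairwise_cosine`), and the toy vectors are all assumptions made for illustration and are not taken from the paper. The second helper gives one possible way to quantify the "homogeneity" the abstract mentions, as the average cosine similarity between token positions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_outputs, layer_scores):
    """Blend per-layer token vectors using softmax-normalised layer weights.

    layer_outputs: list over layers, each a [seq_len][hidden] nested list.
    layer_scores: one unnormalised scalar per layer (hypothetical; in a
        trained model these would be learned parameters).
    """
    weights = softmax(layer_scores)
    seq_len = len(layer_outputs[0])
    hidden = len(layer_outputs[0][0])
    fused = [[0.0] * hidden for _ in range(seq_len)]
    for weight, layer in zip(weights, layer_outputs):
        for t in range(seq_len):
            for d in range(hidden):
                fused[t][d] += weight * layer[t][d]
    return fused


def mean_pairwise_cosine(layer):
    """Average cosine similarity over all distinct pairs of token vectors.

    Assumes at least two tokens and no all-zero vectors. Values near 1 mean
    the positions look alike (high homogeneity).
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    pairs = [
        cosine(layer[i], layer[j])
        for i in range(len(layer))
        for j in range(i + 1, len(layer))
    ]
    return sum(pairs) / len(pairs)


# Toy data (made up): a "lower" layer with distinct token vectors and an
# "upper" layer whose token vectors have collapsed onto the same direction.
lower = [[1.0, 0.0], [0.0, 1.0]]
upper = [[1.0, 1.0], [1.0, 1.0]]

# Equal scores give each layer the same weight.
fused = fuse_layers([lower, upper], [0.0, 0.0])

print("lower:", mean_pairwise_cosine(lower))
print("upper:", mean_pairwise_cosine(upper))
print("fused:", mean_pairwise_cosine(fused))
```

In this toy setting the fused representation is less homogeneous than the upper layer alone, which is the kind of effect the abstract argues for. A real model would learn the layer scores, possibly per token, rather than fixing them by hand.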
 
      
Related papers
- Identifying Super Spreaders in Multilayer Networks [0.6990493129893112]
 We introduce a novel approach to identifying super-spreaders in such networks by leveraging graph neural networks.
To this end, we construct a dataset by simulating information diffusion across hundreds of networks.
Our model, TopSpreadersNetwork, comprises a relationship-agnostic encoder and a custom aggregation layer.
 arXiv  Detail & Related papers  (2025-05-27T10:14:14Z)
- Magnifier: A Multi-grained Neural Network-based Architecture for Burned Area Delineation [4.833815605196964]
 In crisis management and remote sensing, image segmentation plays a crucial role, enabling tasks like disaster response and emergency planning.
Their development is hampered by data scarcity and the lack of extensive benchmark datasets, which limits the training of large neural network models.
We propose a novel methodology, namely Magnifier, to improve segmentation performance with limited data availability.
 arXiv  Detail & Related papers  (2025-04-28T08:51:54Z)
- Interpretable Spectral Variational AutoEncoder (ISVAE) for time series
  clustering [48.0650332513417]
 We introduce a novel model that incorporates an interpretable bottleneck, termed the Filter Bank (FB), at the outset of a Variational Autoencoder (VAE).
This arrangement compels the VAE to attend to the most informative segments of the input signal.
By deliberately constraining the VAE with this FB, we promote the development of an encoding that is discernible, separable, and of reduced dimensionality.
 arXiv  Detail & Related papers  (2023-10-18T13:06:05Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
 We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
 arXiv  Detail & Related papers  (2023-05-17T14:30:11Z)
- Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
 We study learning deep discrete representations from the generative viewpoint.
We endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution.
We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution.
 arXiv  Detail & Related papers  (2023-02-12T13:51:36Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic
  Reasoning Tasks [51.8723187709964]
 We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
 arXiv  Detail & Related papers  (2022-11-01T18:33:20Z)
- Effective and Interpretable Information Aggregation with Capacity
  Networks [3.4012007729454807]
 Capacity networks generate multiple interpretable intermediate results which can be aggregated in a semantically meaningful space.
Our experiments show that implementing this simple inductive bias leads to improvements over different encoder-decoder architectures.
 arXiv  Detail & Related papers  (2022-07-25T09:45:16Z)
- Online Deep Learning based on Auto-Encoder [4.128388784932455]
 We propose a two-phase Online Deep Learning method based on Auto-Encoder (ODLAE).
Using an auto-encoder trained with a reconstruction loss, we extract abstract hierarchical latent representations of instances.
We devise two fusion strategies: an output-level fusion strategy, which fuses the classification results of each hidden layer; and a feature-level fusion strategy, which uses a self-attention mechanism to fuse the outputs of all hidden layers.
 arXiv  Detail & Related papers  (2022-01-19T02:14:57Z)
- MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs [55.66953093401889]
 We present a masked graph autoencoder (MGAE) framework to perform effective learning on graph-structured data.
Taking insights from self-supervised learning, we randomly mask a large proportion of edges and try to reconstruct these missing edges during training.
 arXiv  Detail & Related papers  (2022-01-07T16:48:07Z)
- Adaptive Bi-directional Attention: Exploring Multi-Granularity
  Representations for Machine Reading Comprehension [29.717816161964105]
 We propose a novel approach called Adaptive Bidirectional Attention, which adaptively exploits the source representations of different levels to the predictor.
Results are better than the previous state-of-the-art model by 2.5% EM and 2.3% F1 scores.
 arXiv  Detail & Related papers  (2020-12-20T09:31:35Z)
- Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [55.35176938713946]
 We develop a deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
 arXiv  Detail & Related papers  (2020-06-15T22:22:56Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise
  Multi-View Decoding [59.48857453699463]
 In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
 arXiv  Detail & Related papers  (2020-05-16T20:00:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     