SRQA: Synthetic Reader for Factoid Question Answering
- URL: http://arxiv.org/abs/2009.01630v1
- Date: Wed, 2 Sep 2020 13:16:24 GMT
- Title: SRQA: Synthetic Reader for Factoid Question Answering
- Authors: Jiuniu Wang, Wenjia Xu, Xingyu Fu, Yang Wei, Li Jin, Ziyan Chen,
Guangluan Xu, Yirong Wu
- Abstract summary: We introduce a new model called SRQA, which stands for Synthetic Reader for Factoid Question Answering.
This model enhances the question answering system in the multi-document scenario from three aspects.
We evaluate SRQA on the WebQA dataset, and experiments show that our model outperforms the state-of-the-art models.
- Score: 21.28441702154528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question answering systems built on deep neural networks can
answer questions from various fields and in various forms, but they still
lack effective ways of handling multiple pieces of evidence. We introduce a
new model called SRQA, which stands for Synthetic Reader for Factoid
Question Answering. This model enhances the question answering system in
the multi-document scenario from three aspects: model structure,
optimization goal, and training method, corresponding to Multilayer
Attention (MA), Cross Evidence (CE), and Adversarial Training (AT),
respectively (illustrative sketches of all three follow the abstract).
First, we propose a multilayer attention network to obtain a better
representation of the evidence. The multilayer attention mechanism conducts
interaction between the question and the passage within each layer, so that
the token representations of the evidence in every layer take the
requirements of the question into account. Second, we design a
cross-evidence strategy to choose the answer span from among multiple
pieces of evidence. We improve the optimization goal by treating all of the
answer's locations across the evidence as training targets, which leads the
model to reason over multiple evidence passages. Third, adversarial
training is applied to high-level variables in addition to the word
embeddings in our model. A new normalization method is also proposed for
the adversarial perturbations, so that perturbations can be added jointly
to several target variables. As an effective regularization method,
adversarial training enhances the model's ability to process noisy data.
Combining these three strategies, we improve the contextual representation
and answer-locating ability of our model, which can synthetically extract
the answer span from several pieces of evidence. We evaluate SRQA on the
WebQA dataset, and experiments show that our model outperforms the
state-of-the-art models (our best fuzzy score reaches 78.56%, an
improvement of about 2%).
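To make the Multilayer Attention (MA) idea concrete, here is a minimal
PyTorch sketch of one question-aware interaction layer. The class name
`QuestionAwareLayer`, the bilinear scoring, and the concatenate-and-fuse
step are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionAwareLayer(nn.Module):
    """One interaction layer: passage tokens attend over question tokens
    (a hypothetical building block of a multilayer attention network)."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)      # bilinear-style attention scores
        self.fuse = nn.Linear(2 * dim, dim)  # fuse token with question context

    def forward(self, passage, question):
        # passage: (B, Lp, D), question: (B, Lq, D)
        scores = torch.bmm(self.proj(passage), question.transpose(1, 2))  # (B, Lp, Lq)
        attn = F.softmax(scores, dim=-1)
        q_ctx = torch.bmm(attn, question)    # question summary per passage token
        # every layer's token representation is conditioned on the question
        return torch.tanh(self.fuse(torch.cat([passage, q_ctx], dim=-1)))
```

Stacking several such layers gives each level of the evidence
representation access to the question, which is the property the abstract
attributes to MA.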
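The Cross Evidence (CE) goal can be read as a marginal log-likelihood over
every occurrence of the gold answer in the concatenated evidence. The
sketch below assumes `(B, L)` start/end logits and `(B, K)` gold span
indices; the logsumexp marginalization is one plausible reading of "all
the answers' locations ... as training targets", not necessarily the
paper's exact loss.

```python
import torch
import torch.nn.functional as F

def cross_evidence_loss(start_logits, end_logits, start_positions, end_positions):
    """start_logits, end_logits: (B, L) span-boundary scores over the
    concatenated evidence; start_positions, end_positions: (B, K) token
    indices of the K gold answer occurrences. A hedged sketch."""
    start_lp = F.log_softmax(start_logits, dim=-1)
    end_lp = F.log_softmax(end_logits, dim=-1)
    # log-probability of each gold span = start log-prob + end log-prob
    span_lp = start_lp.gather(1, start_positions) + end_lp.gather(1, end_positions)
    # maximize the total probability mass assigned to any gold occurrence
    return -torch.logsumexp(span_lp, dim=1).mean()
```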
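For Adversarial Training (AT), joint normalization over several target
variables might look like the following: the gradients of the loss with
respect to each variable are rescaled so that their combined L2 norm
equals one shared budget `epsilon`. The function name and the global-L2
choice are assumptions made for illustration.

```python
import torch

def joint_adversarial_perturbations(variables, loss, epsilon=1.0):
    """Perturbations for several target variables at once (e.g., word
    embeddings plus high-level hidden states), normalized jointly so their
    combined L2 norm is epsilon. A sketch, not the paper's exact method."""
    grads = torch.autograd.grad(loss, variables, retain_graph=True)
    total = torch.sqrt(sum(g.pow(2).sum() for g in grads) + 1e-12)
    return [epsilon * g.detach() / total for g in grads]
```

A training step would add each returned perturbation to its variable, run
a second forward pass, and include the perturbed loss as a regularizer
alongside the clean loss.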
Related papers
- Cross-Modal Reasoning with Event Correlation for Video Question Answering [32.332251488360185]
We introduce the dense caption modality as a new auxiliary and distill event-correlated information from it to infer the correct answer.
We employ cross-modal reasoning modules for explicitly modeling inter-modal relationships and aggregating relevant information across different modalities.
We propose a question-guided self-adaptive multi-modal fusion module to collect the question-oriented and event-correlated evidence through multi-step reasoning.
arXiv Detail & Related papers (2023-12-20T02:30:39Z)
- Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering [20.59485758381809]
Current multimodal retrieval question-answering models face two main challenges.
For example, utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence.
We propose a two-stage framework for evidence retrieval and question-answering to alleviate these issues.
arXiv Detail & Related papers (2023-10-15T01:18:39Z)
- An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution.
We study a novel goal: merging the vision, language, and cross-modal transformers of a modality-specific architecture.
We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)
- elBERto: Self-supervised Commonsense Learning for Question Answering [131.51059870970616]
We propose a Self-supervised Bidirectional Representation Learning of Commonsense framework, which is compatible with off-the-shelf QA model architectures.
The framework comprises five self-supervised tasks to force the model to fully exploit the additional training signals from contexts containing rich commonsense.
elBERto achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help.
arXiv Detail & Related papers (2022-03-17T16:23:45Z)
- MetaQA: Combining Expert Agents for Multi-Skill Question Answering [49.35261724460689]
We argue that despite the promising results of multi-dataset models, some domains or QA formats might require specific architectures.
We propose to combine expert agents with a novel, flexible, and training-efficient architecture that considers questions, answer predictions, and answer-prediction confidence scores.
arXiv Detail & Related papers (2021-12-03T14:05:52Z)
- Challenges in Procedural Multimodal Machine Comprehension: A Novel Way To Benchmark [14.50261153230204]
We focus on Multimodal Machine Comprehension (M3C), where a model is expected to answer questions based on a given passage (or context).
We identify three critical biases stemming from the question-answer generation process and memorization capabilities of large deep models.
We propose a systematic framework to address these biases through three Control-Knobs.
arXiv Detail & Related papers (2021-10-22T16:33:57Z)
- Joint Models for Answer Verification in Question Answering Systems [85.93456768689404]
We build a three-way multi-classifier, which decides if an answer supports, refutes, or is neutral with respect to another one.
We tested our models on WikiQA, TREC-QA, and a real-world dataset.
arXiv Detail & Related papers (2021-07-09T05:34:36Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to achieve accuracy as good as that of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA [96.10612095576333]
We propose a video question answering model which effectively integrates multi-modal input sources and finds the temporally relevant information to answer questions.
Our model comprises dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates that pass more relevant information.
We evaluate our model on the challenging TVQA dataset, where each of our model components provides significant gains, and our overall model outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2020-05-13T16:35:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.