Deep Understanding based Multi-Document Machine Reading Comprehension
- URL: http://arxiv.org/abs/2204.03494v1
- Date: Fri, 25 Feb 2022 12:56:02 GMT
- Title: Deep Understanding based Multi-Document Machine Reading Comprehension
- Authors: Feiliang Ren, Yongkang Liu, Bochao Li, Zhibo Wang, Yu Guo, Shilei Liu,
Huimin Wu, Jiaqi Wang, Chunchao Liu, Bingchao Wang
- Abstract summary: We propose a deep understanding based model for multi-document machine reading comprehension.
It has three cascaded deep understanding modules which are designed to understand the accurate semantic meaning of words, the question-document interactions, and the supporting cues for the correct answer.
We evaluate our model on two large scale benchmark datasets, namely TriviaQA Web and DuReader.
- Score: 22.319892892352414
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Most existing multi-document machine reading comprehension models mainly
focus on understanding the interactions between the input question and
documents, but ignore the following two kinds of understanding. First,
understanding the semantic meaning of words in the input question and
documents from the perspective of each other. Second, understanding the
supporting cues for a correct answer from both intra-document and
inter-document perspectives. Ignoring these two kinds of understanding makes
the models overlook important information that may be helpful for finding
correct answers. To overcome this deficiency, we propose a deep
understanding based model for multi-document machine reading comprehension. It
has three cascaded deep understanding modules which are designed to understand
the accurate semantic meaning of words, the interactions between the input
question and documents, and the supporting cues for the correct answer. We
evaluate our model on two large scale benchmark datasets, namely TriviaQA Web
and DuReader. Extensive experiments show that our model achieves
state-of-the-art results on both datasets.
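The paper ships no reference code; the PyTorch-flavored sketch below is only a rough illustration of what three cascaded understanding stages could look like. Every module name, dimension, and wiring choice here is our assumption, not the authors' design.

```python
import torch
import torch.nn as nn


class DeepUnderstandingMRC(nn.Module):
    """Illustrative sketch only: three cascaded understanding stages."""

    def __init__(self, hidden: int = 768, heads: int = 8):
        super().__init__()
        # Stage 1: refine document word semantics from the perspective
        # of the question via cross-attention.
        self.semantic = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Stage 2: let the question-aware document states interact.
        self.interaction = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Stage 3: aggregate intra- and inter-document supporting cues
        # (all documents are concatenated along the length axis).
        self.cues = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.answer_head = nn.Linear(hidden, 2)  # span start/end logits

    def forward(self, question: torch.Tensor, docs: torch.Tensor):
        # question: (B, Lq, H); docs: (B, Ld, H) pre-encoded embeddings.
        q_aware, _ = self.semantic(docs, question, question)
        interacted, _ = self.interaction(q_aware, q_aware, q_aware)
        supported, _ = self.cues(interacted, interacted, interacted)
        return self.answer_head(supported)  # (B, Ld, 2)
```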
Related papers
- Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning [0.0]
Existing document understanding models tend to directly generate answers as a single word or phrase.
We use Multi-modal Large Language Models (MLLMs) to generate step-wise question-and-answer pairs for document images.
We then use the generated high-quality data to train a humanized document understanding and reasoning model, dubbed DocAssistant.
arXiv Detail & Related papers (2024-02-26T01:17:50Z)
- Uncertainty Guided Global Memory Improves Multi-Hop Question Answering [3.7013865226473848]
We propose a two-stage method that first collects relevant information from the entire document into a memory and then combines it with the local context to solve the task.
Our experimental results show that fine-tuning a pre-trained model with memory-augmented input, including the most certain global elements, improves the model's performance.
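The summary leaves the uncertainty criterion unspecified; one plausible reading, sketched below with hypothetical stand-ins (`scorer` and `qa_model` are not the paper's API), is to keep the global chunks the model is most certain about and prepend them to the local context.

```python
import math


def entropy(probs):
    """Shannon entropy of a distribution; lower means more certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def build_memory(global_chunks, scorer, k=5):
    """Keep the k chunks the model is most certain about.

    `scorer(chunk)` is a hypothetical function returning a probability
    distribution over relevance labels; the paper's uncertainty
    estimate may be computed differently.
    """
    return sorted(global_chunks, key=lambda c: entropy(scorer(c)))[:k]


def answer(question, local_context, memory, qa_model):
    # Second stage: combine the selected global memory with the
    # local context before querying the (hypothetical) QA model.
    prompt = " ".join(memory) + "\n" + local_context + "\n" + question
    return qa_model(prompt)
```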
arXiv Detail & Related papers (2023-11-29T23:45:57Z)
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding [55.4806974284156]
Document understanding refers to automatically extracting, analyzing, and comprehending information from digital documents, such as web pages.
Existing Multimodal Large Language Models (MLLMs) have demonstrated promising zero-shot capabilities in shallow, OCR-free text recognition.
arXiv Detail & Related papers (2023-07-04T11:28:07Z)
- A Topic-aware Summarization Framework with Different Modal Side Information [40.11141446039445]
We propose a general summarization framework, which can flexibly incorporate various modalities of side information.
We first propose a unified topic encoder, which jointly discovers latent topics from the document and various kinds of side information.
Results show that our model significantly surpasses strong baselines on three public single-modal or multi-modal benchmark summarization datasets.
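As a minimal sketch of what a unified topic encoder might compute, assuming pooled vector inputs and a simple linear-softmax topic head (the paper's actual encoder is likely richer):

```python
import torch
import torch.nn as nn


class UnifiedTopicEncoder(nn.Module):
    """Toy sketch: one shared latent-topic distribution inferred from
    the document and one modality of side information."""

    def __init__(self, hidden: int = 768, n_topics: int = 50):
        super().__init__()
        self.to_topics = nn.Linear(2 * hidden, n_topics)

    def forward(self, doc_vec: torch.Tensor, side_vec: torch.Tensor):
        # doc_vec, side_vec: (B, H) pooled encodings of the document and
        # of the side information (e.g., image, audio, or metadata).
        joint = torch.cat([doc_vec, side_vec], dim=-1)
        return torch.softmax(self.to_topics(joint), dim=-1)  # (B, n_topics)
```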
arXiv Detail & Related papers (2023-05-19T08:09:45Z)
- Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense [98.70218717851665]
Due to limited evaluation data resources, it is unclear whether models really understand the visual scene and the underlying commonsense knowledge.
We present a Multimodal Evaluation (ME) pipeline to automatically generate question-answer pairs to test models' understanding of the visual scene, text, and related knowledge.
We then take a step further to show that training with the ME data boosts the model's performance in standard VCR evaluation.
arXiv Detail & Related papers (2022-11-10T21:44:33Z)
- Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval [79.37614949970013]
We propose a new dense retrieval model which learns diverse document representations with deep query interactions.
Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view document representations.
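A minimal sketch of multi-view retrieval scoring, under the assumption that each document is stored as one vector per generated pseudo-query and that views are aggregated by a max (the paper may aggregate differently):

```python
import torch


def multi_view_score(query_vec: torch.Tensor, doc_views: torch.Tensor):
    """Score one query against multiple query-informed document views.

    query_vec: (H,) dense query embedding.
    doc_views: (V, H) one document embedding per generated pseudo-query.
    """
    sims = doc_views @ query_vec  # (V,) dot-product similarities
    return sims.max().item()      # the best-matching view wins
```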
arXiv Detail & Related papers (2022-08-08T16:00:55Z)
- An Understanding-Oriented Robust Machine Reading Comprehension Model [12.870425062204035]
We propose an understanding-oriented machine reading comprehension model to address three kinds of robustness issues.
Specifically, we first use a natural language inference module to help the model understand the accurate semantic meanings of input questions.
Third, we propose a multilanguage learning mechanism to address the issue of generalization.
arXiv Detail & Related papers (2022-07-01T03:32:02Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- MGA-VQA: Multi-Granularity Alignment for Visual Question Answering [75.55108621064726]
Learning to answer visual questions is challenging because the multi-modal inputs lie in two different feature spaces.
We propose a Multi-Granularity Alignment architecture for the Visual Question Answering task (MGA-VQA).
Our model splits alignment into different levels to learn better correlations without needing additional data or annotations.
arXiv Detail & Related papers (2022-01-25T22:30:54Z)
- Document Modeling with Graph Attention Networks for Multi-grained Machine Reading Comprehension [127.3341842928421]
Natural Questions is a new challenging machine reading comprehension benchmark.
It has two-grained answers: a long answer (typically a paragraph) and a short answer (one or more entities inside the long answer).
Existing methods treat these two sub-tasks individually during training while ignoring their dependencies.
We present a novel multi-grained machine reading comprehension framework that models documents according to their hierarchical nature.
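To make the joint treatment concrete, here is a minimal sketch (our illustration, not the paper's architecture) of long- and short-answer heads sharing one document representation, so the two granularities can be trained together instead of separately:

```python
import torch.nn as nn


class MultiGrainedHeads(nn.Module):
    """Toy sketch: jointly predict long and short answers from one
    shared document encoding."""

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.long_head = nn.Linear(hidden, 1)   # is this paragraph the long answer?
        self.short_head = nn.Linear(hidden, 2)  # token-level start/end logits

    def forward(self, para_vecs, token_vecs):
        # para_vecs: (B, P, H) paragraph vectors; token_vecs: (B, T, H).
        long_logits = self.long_head(para_vecs).squeeze(-1)  # (B, P)
        short_logits = self.short_head(token_vecs)           # (B, T, 2)
        # Summing losses over both outputs couples the two sub-tasks.
        return long_logits, short_logits
```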
arXiv Detail & Related papers (2020-05-12T14:20:09Z)