DCQA: Document-Level Chart Question Answering towards Complex Reasoning
and Common-Sense Understanding
- URL: http://arxiv.org/abs/2310.18983v1
- Date: Sun, 29 Oct 2023 11:38:08 GMT
- Title: DCQA: Document-Level Chart Question Answering towards Complex Reasoning
and Common-Sense Understanding
- Authors: Anran Wu, Luwei Xiao, Xingjiao Wu, Shuwen Yang, Junjie Xu, Zisong
Zhuang, Nian Xie, Cheng Jin, Liang He
- Abstract summary: We introduce a novel task named document-level chart question answering (DCQA).
The newly developed benchmark dataset comprises 50,010 synthetic documents integrating charts in a wide range of styles.
We present the development of a potent question-answer generation engine that employs table data, a rich color set, and basic question templates.
- Score: 19.713647367008143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visually-situated languages such as charts and plots are omnipresent in
real-world documents. These graphical depictions are human-readable and are
often analyzed in visually-rich documents to address a variety of questions
that necessitate complex reasoning and common-sense responses. Despite the
growing number of datasets that aim to answer questions over charts, most only
address this task in isolation, without considering the broader context of
document-level question answering. Moreover, such datasets lack adequate
common-sense reasoning information in their questions. In this work, we
introduce a novel task named document-level chart question answering (DCQA).
The goal of this task is to conduct document-level question answering by first
extracting charts or plots from the document via document layout analysis (DLA)
and subsequently performing chart question answering (CQA). The newly
developed benchmark dataset comprises 50,010 synthetic documents integrating
charts in a wide range of styles (6 styles in contrast to 3 for PlotQA and
ChartQA) and includes 699,051 questions that demand a high degree of reasoning
ability and common-sense understanding. In addition, we develop a
question-answer generation engine that employs table data, a rich color set,
and basic question templates to automatically produce a vast array of
reasoning question-answer pairs. Based on DCQA, we devise an OCR-free
transformer for document-level chart-oriented understanding, capable of
performing DLA and answering complex reasoning and common-sense questions over
charts. Our DCQA dataset is expected to foster research on understanding
visualizations in documents, especially for scenarios that require complex
reasoning over charts in visually-rich documents. We implement and evaluate a
set of baselines, and our proposed method achieves results comparable to these
baselines.
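
As a rough illustration of this two-stage decomposition, the Python sketch below wires DLA and CQA together. Note that `ChartRegion`, `detect_charts`, and `answer_chart_question` are hypothetical stand-ins for the paper's models, not its actual interface.

```python
from dataclasses import dataclass
from typing import List, Tuple

# All names below are illustrative placeholders, not the paper's API.

@dataclass
class ChartRegion:
    bbox: Tuple[int, int, int, int]  # chart bounding box on the page (x0, y0, x1, y1)
    crop: object                     # the cropped chart image (e.g., a PIL.Image)

def detect_charts(page_image) -> List[ChartRegion]:
    """Stage 1 (DLA): locate chart/plot regions on a rendered document page.
    A real system would run a layout-analysis model here; this stub returns
    nothing so the pipeline shape stays runnable."""
    return []

def answer_chart_question(chart: ChartRegion, question: str) -> str:
    """Stage 2 (CQA): answer the question against one extracted chart.
    The paper uses an OCR-free transformer for this step; a stub stands in."""
    return "<answer>"

def dcqa(page_image, question: str) -> List[str]:
    """Document-level chart QA: DLA first, then CQA on each detected chart."""
    return [answer_chart_question(c, question) for c in detect_charts(page_image)]
```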
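
The generation engine itself is described only at a high level. The following toy sketch shows how table data, a color set, and basic question templates might be combined so that each question's answer is computed from the same underlying table; all names, values, and templates here are assumptions for illustration, not the paper's implementation.

```python
import random

# Toy table underlying a bar chart; categories, values, and colors are
# invented for illustration and are not taken from the DCQA dataset.
table = {"2019": 120, "2020": 95, "2021": 140}
colors = {"2019": "blue", "2020": "red", "2021": "green"}

def generate_qa_pairs(table, colors):
    """Fill basic question templates from the table and compute the answers
    from the same data, so every generated pair is consistent by construction."""
    pairs = []
    # Retrieval template grounded in the chart's color set.
    for cat in table:
        pairs.append((f"What is the value of the {colors[cat]} bar?", str(table[cat])))
    # Reasoning templates computed over the table values.
    a, b = random.sample(list(table), 2)
    pairs.append((f"What is the difference between {a} and {b}?",
                  str(abs(table[a] - table[b]))))
    pairs.append(("Which category has the highest value?", max(table, key=table.get)))
    return pairs

for question, answer in generate_qa_pairs(table, colors):
    print(question, "->", answer)
```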
Related papers
- Enhancing Question Answering on Charts Through Effective Pre-training Tasks [26.571522748519584]
We address the limitations of current VisualQA models when applied to charts and plots.
Our findings indicate that existing models particularly underperform in answering questions related to the chart's structural and visual context.
We propose three simple pre-training tasks that reinforce the model's structural-visual knowledge as well as its understanding of numerical questions.
arXiv Detail & Related papers (2024-06-14T14:40:10Z)
- JDocQA: Japanese Document Question Answering Dataset for Generative Language Models [15.950718839723027]
We introduce Japanese Document Question Answering (JDocQA), a large-scale document-based QA dataset.
It comprises 5,504 documents in PDF format and 11,600 annotated question-and-answer instances in Japanese.
We incorporate multiple categories of questions and unanswerable questions from the document for realistic question-answering applications.
arXiv Detail & Related papers (2024-03-28T14:22:54Z)
- NewsQs: Multi-Source Question Generation for the Inquiring Mind [59.79288644158271]
We present NewsQs, a dataset that provides question-answer pairs for multiple news documents.
To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles.
arXiv Detail & Related papers (2024-02-28T16:59:35Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these richly structured documents.
We propose PDFTriage, which enables models to retrieve context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs [79.0426838808629]
We tackle the TAT-DQA task, i.e., answering questions over a visually-rich table-text document.
Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability.
We conduct extensive experiments on the TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score, respectively, on the test set.
arXiv Detail & Related papers (2023-05-03T07:30:32Z)
- OpenCQA: Open-ended Question Answering with Charts [6.7038829115674945]
We introduce a new task called OpenCQA, where the goal is to answer an open-ended question about a chart with descriptive texts.
We implement and evaluate a set of baselines under three practical settings.
Our analysis of the results shows that the top-performing models generally produce fluent and coherent text.
arXiv Detail & Related papers (2022-10-12T23:37:30Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- Classification-Regression for Chart Comprehension [16.311371103939205]
Chart question answering (CQA) is a task used for assessing chart comprehension.
We propose a new model that jointly learns classification and regression.
Our model's advantage is particularly pronounced on questions with out-of-vocabulary answers.
arXiv Detail & Related papers (2021-11-29T18:46:06Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- Semantic Graphs for Generating Deep Questions [98.5161888878238]
We propose a novel framework that first constructs a semantic-level graph for the input document and then encodes the semantic graph with an attention-based GGNN (Att-GGNN).
On the deep-question-centric HotpotQA dataset, our model greatly improves performance on questions requiring reasoning over multiple facts, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-04-27T10:52:52Z)