RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
- URL: http://arxiv.org/abs/2308.01979v1
- Date: Thu, 3 Aug 2023 18:21:38 GMT
- Title: RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
- Authors: Saleem Ahmed, Bhavin Jawade, Shubham Pandey, Srirangaraj Setlur, Venu Govindaraju
- Abstract summary: We introduce a benchmark and dataset for chart visual QA on real-world charts.
Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations.
Results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models.
- Score: 8.155575318208628
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive study of the chart visual question-answering (QA) task,
to address the challenges faced in comprehending and extracting data from chart
visualizations within documents. Despite efforts to tackle this problem using
synthetic charts, solutions are limited by the shortage of annotated real-world
data. To fill this gap, we introduce a benchmark and dataset for chart visual
QA on real-world charts, offering a systematic analysis of the task and a novel
taxonomy for template-based chart question creation. Our contribution includes
the introduction of a new answer type, 'list', with both ranked and unranked
variations. Our study is conducted on a real-world chart dataset from
scientific literature, showcasing higher visual complexity compared to other
works. Our focus is on template-based QA and how it can serve as a standard for
evaluating the first-order logic capabilities of models. The results of our
experiments, conducted on a real-world out-of-distribution dataset, provide a
robust evaluation of large-scale pre-trained models and advance the field of
chart visual QA and formal logic verification for neural networks in general.
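As a rough illustration of the 'list' answer type introduced above, the sketch below scores ranked and unranked list answers. The function name `score_list_answer` and the positional/set-based matching rules are assumptions for illustration only, not the benchmark's official metric.

```python
from typing import List

def score_list_answer(pred: List[str], gold: List[str], ranked: bool) -> float:
    """Score a predicted list answer against a gold list answer.

    Assumption: ranked lists are compared position by position, while
    unranked lists are compared as sets (Jaccard overlap). RealCQA's
    official metric may differ; this is only an illustrative sketch.
    """
    if not pred and not gold:
        return 1.0
    if not pred or not gold:
        return 0.0
    if ranked:
        # Fraction of positions where the prediction matches the gold ordering.
        matches = sum(p == g for p, g in zip(pred, gold))
        return matches / max(len(pred), len(gold))
    # Unranked: order-insensitive overlap.
    pred_set, gold_set = set(pred), set(gold)
    return len(pred_set & gold_set) / len(pred_set | gold_set)

# An unranked list answer is correct regardless of element order;
# a ranked one is penalized for positional mismatch.
print(score_list_answer(["2019", "2020"], ["2020", "2019"], ranked=False))  # 1.0
print(score_list_answer(["2019", "2020"], ["2020", "2019"], ranked=True))   # 0.0
```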
Related papers
- RealCQA-V2 : Visual Premise Proving A Manual COT Dataset for Charts [2.9201864249313383]
We introduce Visual Premise Proving, a novel task tailored to refine the process of chart question answering.
This approach represents a departure from conventional accuracy-based evaluation methods.
A model adept at reasoning is expected to demonstrate proficiency in both data retrieval and the structural understanding of charts.
arXiv Detail & Related papers (2024-10-29T19:32:53Z)
- VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning [13.011899331656018]
VProChart is a novel framework designed to address the challenges of Chart Question Answering (CQA)
It integrates a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach.
VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.
arXiv Detail & Related papers (2024-09-03T07:19:49Z)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild [28.643565008567172]
We introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma.
Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images.
Our simple approach achieves state-of-the-art results across 5 benchmarks spanning chart summarization, question answering, and fact-checking.
arXiv Detail & Related papers (2024-07-04T22:16:40Z)
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary and open-source models.
arXiv Detail & Related papers (2024-06-26T17:50:11Z)
- Enhancing Question Answering on Charts Through Effective Pre-training Tasks [26.571522748519584]
We address the limitation of current VisualQA models when applied to charts and plots.
Our findings indicate that existing models particularly underperform in answering questions related to the chart's structural and visual context.
We propose three simple pre-training tasks that strengthen the existing model's structural-visual knowledge as well as its understanding of numerical questions.
arXiv Detail & Related papers (2024-06-14T14:40:10Z)
- From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making.
Large foundation models, such as large language models, have revolutionized various natural language processing tasks.
This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z)
- StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding [58.38480335579541]
Current chart-related tasks focus on either chart perception, which refers to extracting information from visual charts, or reasoning over the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- Question-Answer Sentence Graph for Joint Modeling Answer Selection [122.29142965960138]
We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs.
Online inference is then performed to solve the answer sentence selection (AS2) task on unseen queries.
arXiv Detail & Related papers (2022-02-16T05:59:53Z)
- Classification-Regression for Chart Comprehension [16.311371103939205]
Chart question answering (CQA) is a task used for assessing chart comprehension.
We propose a new model that jointly learns classification and regression.
Our model's advantage is particularly pronounced on questions with out-of-vocabulary answers.
arXiv Detail & Related papers (2021-11-29T18:46:06Z)
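The classification-regression idea in the entry above, where categorical and numeric answers are predicted jointly, can be pictured as a shared representation feeding two output heads. The sketch below is illustrative only; the module name `JointCQAHead` and all dimensions are hypothetical and do not reproduce the cited paper's architecture.

```python
import torch
import torch.nn as nn

class JointCQAHead(nn.Module):
    """Illustrative joint classification-regression head for chart QA.

    Assumption: a fused chart+question embedding of size `hidden_dim` is
    produced by an upstream encoder; the real model in the cited paper
    is not specified here.
    """
    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_classes)  # in-vocabulary / yes-no answers
        self.regressor = nn.Linear(hidden_dim, 1)              # numeric (out-of-vocabulary) answers

    def forward(self, fused: torch.Tensor):
        return self.classifier(fused), self.regressor(fused).squeeze(-1)

# Training would combine a cross-entropy loss on the class logits with a
# regression loss (e.g., smooth L1) on the numeric head, chosen per question type.
head = JointCQAHead(hidden_dim=768, num_classes=1000)
logits, value = head(torch.randn(4, 768))
```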