RealCQA: Scientific Chart Question Answering as a Test-bed for
First-Order Logic
- URL: http://arxiv.org/abs/2308.01979v1
- Date: Thu, 3 Aug 2023 18:21:38 GMT
- Title: RealCQA: Scientific Chart Question Answering as a Test-bed for
First-Order Logic
- Authors: Saleem Ahmed, Bhavin Jawade, Shubham Pandey, Srirangaraj Setlur, Venu
Govindaraju
- Abstract summary: We introduce a benchmark and dataset for chart visual QA on real-world charts.
Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations.
Results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models.
- Score: 8.155575318208628
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive study of the chart visual question-answering (QA) task,
to address the challenges faced in comprehending and extracting data from chart
visualizations within documents. Despite efforts to tackle this problem using
synthetic charts, solutions are limited by the shortage of annotated real-world
data. To fill this gap, we introduce a benchmark and dataset for chart visual
QA on real-world charts, offering a systematic analysis of the task and a novel
taxonomy for template-based chart question creation. Our contribution includes
the introduction of a new answer type, 'list', with both ranked and unranked
variations. Our study is conducted on a real-world chart dataset from
scientific literature, showcasing higher visual complexity compared to other
works. Our focus is on template-based QA and how it can serve as a standard for
evaluating the first-order logic capabilities of models. The results of our
experiments, conducted on a real-world out-of-distribution dataset, provide a
robust evaluation of large-scale pre-trained models and advance the field of
chart visual QA and formal logic verification for neural networks in general.
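The 'list' answer type introduced above distinguishes ranked from unranked variants. As a rough illustration only, the sketch below shows one plausible way such answers could be scored; the `score_unranked` and `score_ranked` functions and their exact-match / positional-match criteria are assumptions for illustration, not the metrics defined by RealCQA.

```python
# Hypothetical scoring for the 'list' answer type (ranked vs. unranked).
# Assumptions: unranked lists are compared as sets; ranked lists are
# compared position by position. RealCQA's actual metrics may differ.

def score_unranked(pred: list[str], gold: list[str]) -> float:
    """Order-insensitive: full credit only for an exact set match."""
    return 1.0 if set(pred) == set(gold) else 0.0

def score_ranked(pred: list[str], gold: list[str]) -> float:
    """Order-sensitive: fraction of positions where prediction matches gold,
    normalized by the longer list to penalize over- or under-generation."""
    if not gold:
        return 1.0 if not pred else 0.0
    hits = sum(p == g for p, g in zip(pred, gold))
    return hits / max(len(pred), len(gold))
```

Under these assumptions, swapping two elements leaves an unranked score unchanged but drops a ranked score, which is the behavioral difference the two variants are meant to capture.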
Related papers
- Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees [50.78679002846741]
We introduce a novel approach for learning cross-task generalities in graphs.
We propose task-trees as basic learning instances to align task spaces on graphs.
Our findings indicate that when a graph neural network is pretrained on diverse task-trees, it acquires transferable knowledge.
arXiv Detail & Related papers (2024-12-21T02:07:43Z)
- Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature [33.69273440337546]
We introduce a new benchmark, Scientific Chart QA (SCI-CQA), which emphasizes flowcharts as a critical yet often overlooked category.
We curated a dataset of 202,760 image-text pairs from papers at 15 top-tier computer science conferences over the past decade.
SCI-CQA also introduces a novel evaluation framework inspired by human exams, encompassing 5,629 carefully curated questions.
arXiv Detail & Related papers (2024-12-11T05:29:54Z)
- RealCQA-V2: Visual Premise Proving: A Manual COT Dataset for Charts [2.9201864249313383]
We introduce Visual Premise Proving, a novel task tailored to refine the process of chart question answering.
This approach represents a departure from conventional accuracy-based evaluation methods.
A model adept at reasoning is expected to demonstrate proficiency in both data retrieval and the structural understanding of charts.
arXiv Detail & Related papers (2024-10-29T19:32:53Z)
- VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning [13.011899331656018]
VProChart is a novel framework designed to address the challenges of Chart Question Answering (CQA).
It integrates a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach.
VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.
arXiv Detail & Related papers (2024-09-03T07:19:49Z)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- Enhancing Question Answering on Charts Through Effective Pre-training Tasks [26.571522748519584]
We address the limitation of current VisualQA models when applied to charts and plots.
Our findings indicate that existing models particularly underperform in answering questions related to the chart's structural and visual context.
We propose three simple pre-training tasks that strengthen the existing model's structural-visual knowledge as well as its understanding of numerical questions.
arXiv Detail & Related papers (2024-06-14T14:40:10Z)
- From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making.
Large foundation models, such as large language models, have revolutionized various natural language processing tasks.
This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z)
- StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding [54.45681512355684]
Current chart-related tasks focus on either chart perception, which extracts information from visual charts, or chart reasoning over the extracted data.
We introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- Question-Answer Sentence Graph for Joint Modeling Answer Selection [122.29142965960138]
We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs.
Online inference is then performed to solve the AS2 task on unseen queries.
arXiv Detail & Related papers (2022-02-16T05:59:53Z)
- Classification-Regression for Chart Comprehension [16.311371103939205]
Chart question answering (CQA) is a task used for assessing chart comprehension.
We propose a new model that jointly learns classification and regression.
Our model's advantage is most pronounced on questions with out-of-vocabulary answers.
arXiv Detail & Related papers (2021-11-29T18:46:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.