RealCQA-V2: Visual Premise Proving - A Manual COT Dataset for Charts
- URL: http://arxiv.org/abs/2410.22492v2
- Date: Sun, 10 Nov 2024 00:54:21 GMT
- Title: RealCQA-V2: Visual Premise Proving - A Manual COT Dataset for Charts
- Authors: Saleem Ahmed, Ranga Setlur, Venu Govindaraju
- Abstract summary: We introduce Visual Premise Proving, a novel task tailored to refine the process of chart question answering.
This approach represents a departure from conventional accuracy-based evaluation methods.
A model adept at reasoning is expected to demonstrate proficiency in both data retrieval and the structural understanding of charts.
- Abstract: We introduce Visual Premise Proving (VPP), a novel task tailored to refine the process of chart question answering by deconstructing it into a series of logical premises. Each of these premises represents an essential step in comprehending a chart's content and deriving logical conclusions, thereby providing a granular look at a model's reasoning abilities. This approach represents a departure from conventional accuracy-based evaluation methods, emphasizing the model's ability to sequentially validate each premise and ideally mimic human analytical processes. A model adept at reasoning is expected to demonstrate proficiency in both data retrieval and the structural understanding of charts, suggesting a synergy between these competencies. However, in our zero-shot study using the sophisticated MATCHA model on a scientific chart question answering dataset, an intriguing pattern emerged. The model showcased superior performance in chart reasoning (27%) over chart structure (19%) and data retrieval (14%). This performance gap suggests that models might more readily generalize reasoning capabilities across datasets, benefiting from consistent mathematical and linguistic semantics, even when challenged by changes in the visual domain that complicate structure comprehension and data retrieval. Furthermore, the efficacy of using accuracy of binary QA for evaluating chart reasoning comes into question if models can deduce correct answers without parsing chart data or structure. VPP highlights the importance of integrating reasoning with visual comprehension to enhance model performance in chart analysis, pushing for a balanced approach in evaluating visual data interpretation capabilities.
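To make the evaluation idea concrete, the sketch below scores a chart QA example at the premise level rather than only at the final answer, grouping premises into the three categories mentioned in the abstract (data retrieval, structure, reasoning). The premise schema, field names, and category labels are illustrative assumptions for this sketch, not the dataset's actual annotation format.

```python
# Minimal sketch of premise-level scoring in the spirit of Visual Premise Proving.
# The Premise schema and category names are assumptions for illustration only.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Premise:
    statement: str   # natural-language premise about the chart
    category: str    # e.g. "data_retrieval", "structure", "reasoning"
    gold: bool       # whether the premise actually holds for this chart


def score_premises(premises, model_verdicts):
    """Compare a model's True/False verdict on each premise with the gold label.

    Returns per-category accuracy and whether the whole premise chain was
    validated, which is stricter than checking only the final binary answer.
    """
    correct, total = defaultdict(int), defaultdict(int)
    chain_valid = True
    for premise, verdict in zip(premises, model_verdicts):
        total[premise.category] += 1
        if verdict == premise.gold:
            correct[premise.category] += 1
        else:
            chain_valid = False
    per_category = {c: correct[c] / total[c] for c in total}
    return per_category, chain_valid


# Example: a model may get the final comparison right while failing the
# structure and retrieval premises that should support that conclusion.
premises = [
    Premise("The x-axis encodes the year.", "structure", True),
    Premise("The 2020 bar has value 42.", "data_retrieval", True),
    Premise("42 is greater than the 2019 value of 37.", "reasoning", True),
]
print(score_premises(premises, [False, False, True]))
# -> ({'structure': 0.0, 'data_retrieval': 0.0, 'reasoning': 1.0}, False)
```

A model that only answers the final binary question would be scored correct here; premise-level scoring exposes that its retrieval and structure steps did not hold up.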
Related papers
- Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables.
We learn the causal Bayesian network and its confounding latent variables directly from the observational data.
We show that this 'model completion' learning approach can be more effective than estimand approaches.
arXiv Detail & Related papers (2024-08-26T08:39:09Z) - On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z) - Enhancing Question Answering on Charts Through Effective Pre-training Tasks [26.571522748519584]
We address the limitation of current VisualQA models when applied to charts and plots.
Our findings indicate that existing models particularly underperform in answering questions related to the chart's structural and visual context.
We propose three simple pre-training tasks that reinforce the existing model's structural-visual knowledge as well as its understanding of numerical questions.
arXiv Detail & Related papers (2024-06-14T14:40:10Z) - From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making.
Large foundation models, such as large language models, have revolutionized various natural language processing tasks.
This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z) - ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization [32.19963543411396]
This study constructs a large-scale dataset of comprehensive chart-caption pairs and fine-tuning instructions on each chart.
We propose an innovative chart summarization method, ChartThinker, which synthesizes deep analysis based on chains of thought.
Built upon the curated datasets, our trained model consistently exhibits superior performance in chart summarization tasks.
arXiv Detail & Related papers (2024-03-17T14:49:09Z) - StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding [58.38480335579541]
Current chart-related tasks focus on either chart perception, which refers to extracting information from visual charts, or reasoning over the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z) - RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic [8.155575318208628]
We introduce a benchmark and dataset for chart visual QA on real-world charts.
Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations (see the scoring sketch after this list).
Results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-03T18:21:38Z) - A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information to address the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z) - Classification-Regression for Chart Comprehension [16.311371103939205]
Chart question answering (CQA) is a task used for assessing chart comprehension.
We propose a new model that jointly learns classification and regression.
Our model's advantage is particularly pronounced on questions with out-of-vocabulary answers.
arXiv Detail & Related papers (2021-11-29T18:46:06Z) - A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)
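The RealCQA entry above introduces a 'list' answer type with ranked and unranked variations. The sketch below shows one simple way such answers could be scored: order-sensitive matching for ranked lists, set matching for unranked ones. This is a hedged illustration of the idea, not the benchmark's official scorer, and the normalization step is an assumption.

```python
# Illustrative scoring for 'list' answers: ranked lists must match the gold
# order, unranked lists are compared as sets. Not the official RealCQA scorer.
def list_answer_correct(pred, gold, ranked):
    pred = [str(x).strip().lower() for x in pred]   # naive normalization (assumption)
    gold = [str(x).strip().lower() for x in gold]
    if ranked:
        return pred == gold           # order-sensitive comparison
    return set(pred) == set(gold)     # order-insensitive comparison


# Usage: the same prediction can pass the unranked check but fail the ranked one.
print(list_answer_correct(["b", "a"], ["a", "b"], ranked=False))  # True
print(list_answer_correct(["b", "a"], ["a", "b"], ranked=True))   # False
```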