Related papers: Classification-Regression for Chart Comprehension

URL: http://arxiv.org/abs/2111.14792v1
Date: Mon, 29 Nov 2021 18:46:06 GMT
Title: Classification-Regression for Chart Comprehension
Authors: Matan Levy, Rami Ben-Ari, Dani Lischinski
Abstract summary: Chart question answering (CQA) is a task used for assessing chart comprehension. We propose a new model that jointly learns classification and regression. Our model's edge is particularly emphasized on questions with out-of-vocabulary answers.
Score: 16.311371103939205
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Charts are a popular and effective form of data visualization. Chart question answering (CQA) is a task used for assessing chart comprehension, which is fundamentally different from understanding natural images. CQA requires analyzing the relationships between the textual and the visual components of a chart, in order to answer general questions or infer numerical values. Most existing CQA datasets and models are based on simplifying assumptions that often enable surpassing human performance. In this work, we further explore the reasons behind this outcome and propose a new model that jointly learns classification and regression. Our language-vision setup with co-attention transformers captures the complex interactions between the question and the textual elements, which commonly exist in real-world charts. We validate these conclusions with extensive experiments and breakdowns on the realistic PlotQA dataset, outperforming previous approaches by a large margin, while showing competitive performance on FigureQA. Our model's edge is particularly emphasized on questions with out-of-vocabulary answers, many of which require regression. We hope that this work will stimulate further research towards solving the challenging and highly practical task of chart comprehension.
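Illustration (not from the paper): the abstract's idea of jointly learning classification and regression can be pictured as a single training objective that sums a classification loss and a regression loss. The sketch below is only a minimal, assumed formulation. The function names, the L1 regression term, and the `reg_weight` parameter are hypothetical choices for illustration and are not taken from the authors' model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(class_logits, target_class,
               predicted_value=None, target_value=None, reg_weight=1.0):
    """Cross-entropy on the answer class, plus a weighted L1 term
    when the question has a numeric target (assumed formulation)."""
    probs = softmax(class_logits)
    cls_loss = -math.log(probs[target_class])

    reg_loss = 0.0
    if target_value is not None:
        if predicted_value is None:
            raise ValueError("a numeric target needs a predicted value")
        reg_loss = abs(predicted_value - target_value)

    return cls_loss + reg_weight * reg_loss


# Hypothetical usage: one vocabulary-answer question, one numeric question.
vocab_only = joint_loss([2.0, 0.5, -1.0], target_class=0)
numeric = joint_loss([0.0, 0.0], target_class=1,
                     predicted_value=3.0, target_value=5.0, reg_weight=0.5)
print(vocab_only, numeric)
```

In this sketch the regression term is skipped for questions whose answer is in the vocabulary, so only numeric questions contribute to it. How the actual model combines the two objectives is not specified in the abstract.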

Related papers

RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z)
Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts [62.45232157149698]
We introduce a novel Chart Hypothetical Question Answering (HQA) task, which imposes assumptions on the same question to compel models to engage in counterfactual reasoning based on the chart content. Furthermore, we introduce HAI, a human-AI interactive data synthesis approach that leverages the efficient text-editing capabilities of MLLMs alongside human expert knowledge to generate diverse and high-quality HQA data at a low cost.
arXiv Detail & Related papers (2025-03-06T05:08:40Z)
RealCQA-V2 : Visual Premise Proving A Manual COT Dataset for Charts [2.9201864249313383]
We introduce Visual Premise Proving, a novel task tailored to refine the process of chart question answering. This approach represents a departure from conventional accuracy-based evaluation methods. A model adept at reasoning is expected to demonstrate proficiency in both data retrieval and the structural understanding of charts.
arXiv Detail & Related papers (2024-10-29T19:32:53Z)
GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering [12.485921065840294]
Chart Question Answering (CQA) aims at answering questions based on the visual chart content. We propose a novel Graph-of-Thought (GoT) guided compositional reasoning model called GoT-CQA. GoT-CQA achieves outstanding performance, especially in complex human-written and reasoning questions.
arXiv Detail & Related papers (2024-09-04T10:56:05Z)
On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts. We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
Enhancing Question Answering on Charts Through Effective Pre-training Tasks [26.571522748519584]
We address the limitation of current VisualQA models when applied to charts and plots. Our findings indicate that existing models particularly underperform in answering questions related to the chart's structural and visual context. We propose three simple pre-training tasks that reinforce the existing model in terms of both its structural-visual knowledge and its understanding of numerical questions.
arXiv Detail & Related papers (2024-06-14T14:40:10Z)
QAGCF: Graph Collaborative Filtering for Q&A Recommendation [58.21387109664593]
Question and answer (Q&A) platforms usually recommend question-answer pairs to meet users' knowledge acquisition needs. This makes user behaviors more complex, and presents two challenges for Q&A recommendation. We introduce Question & Answer Graph Collaborative Filtering (QAGCF), a graph neural network model that creates separate graphs for collaborative and semantic views.
arXiv Detail & Related papers (2024-06-07T10:52:37Z)
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic [8.155575318208628]
We introduce a benchmark and dataset for chart visual QA on real-world charts. Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations. Results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-03T18:21:38Z)
Learning Situation Hyper-Graphs for Video Question Answering [95.18071873415556]
We propose an architecture for Video Question Answering (VQA) that enables answering questions related to video content by predicting situation hyper-graphs. We train a situation hyper-graph decoder to implicitly identify graph representations with actions and object/human-object relationships from the input video clip. Our results show that learning the underlying situation hyper-graphs helps the system to significantly improve its performance for novel challenges of video question-answering tasks.
arXiv Detail & Related papers (2023-04-18T01:23:11Z)
OpenCQA: Open-ended Question Answering with Charts [6.7038829115674945]
We introduce a new task called OpenCQA, where the goal is to answer an open-ended question about a chart with texts. We implement and evaluate a set of baselines under three practical settings. Our analysis of the results shows that the top performing models generally produce fluent and coherent text.
arXiv Detail & Related papers (2022-10-12T23:37:30Z)
Question-Answer Sentence Graph for Joint Modeling Answer Selection [122.29142965960138]
We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs. Online inference is then performed to solve the AS2 task on unseen queries.
arXiv Detail & Related papers (2022-02-16T05:59:53Z)
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering [71.6781118080461]
We propose a Graph Matching Attention (GMA) network for the Visual Question Answering (VQA) task. First, it builds a graph not only for the image but also for the question, using both syntactic and embedding information. Next, we explore the intra-modality relationships by a dual-stage graph encoder and then present a bilateral cross-modality graph matching attention to infer the relationships between the image and the question. Experiments demonstrate that our network achieves state-of-the-art performance on the GQA dataset and the VQA 2.0 dataset.
arXiv Detail & Related papers (2021-12-14T10:01:26Z)
Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.