Related papers: ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering

ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering

URL: http://arxiv.org/abs/2510.04514v1
Date: Mon, 06 Oct 2025 06:05:36 GMT
Title: ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering
Authors: Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Sumitra Ganesh, Manuela Veloso,
Abstract summary: We introduce ChartAgent, a novel agentic framework that explicitly performs visual reasoning directly within the chart's spatial domain.<n>Our work is among the first to demonstrate visually grounded reasoning for chart understanding using tool-augmented multimodal agents.
Score: 23.455587605758396
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent multimodal LLMs have shown promise in chart-based visual question answering, but their performance declines sharply on unannotated charts, those requiring precise visual interpretation rather than relying on textual shortcuts. To address this, we introduce ChartAgent, a novel agentic framework that explicitly performs visual reasoning directly within the chart's spatial domain. Unlike textual chain-of-thought reasoning, ChartAgent iteratively decomposes queries into visual subtasks and actively manipulates and interacts with chart images through specialized actions such as drawing annotations, cropping regions (e.g., segmenting pie slices, isolating bars), and localizing axes, using a library of chart-specific vision tools to fulfill each subtask. This iterative reasoning process closely mirrors human cognitive strategies for chart comprehension. ChartAgent achieves state-of-the-art accuracy on the ChartBench and ChartX benchmarks, surpassing prior methods by up to 16.07% absolute gain overall and 17.31% on unannotated, numerically intensive queries. Furthermore, our analyses show that ChartAgent is (a) effective across diverse chart types, (b) achieve the highest scores across varying visual and reasoning complexity levels, and (c) serves as a plug-and-play framework that boosts performance across diverse underlying LLMs. Our work is among the first to demonstrate visually grounded reasoning for chart understanding using tool-augmented multimodal agents.

Related papers

ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning [26.725654222717335]
We introduce ChartAgent, a chart understanding framework grounded in Tool-Integrated Reasoning.<n>Inspired by human cognition, ChartAgent decomposes complex chart analysis into a sequence of observable, replayable steps.<n>We show that ChartAgent substantially improves under sparse annotation settings.
arXiv Detail & Related papers (2025-12-16T03:17:04Z)
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning [54.86473583610112]
We propose PointCoT, which integrates reflective interaction into chain-of-thought reasoning in charts.<n>By prompting MLLMs to generate bounding boxes and re-render charts based on location annotations, we establish connections between textual reasoning steps and visual grounding regions.<n>We develop two instruction-tuned models, ChartPointQ2 and ChartPointQ2.5, which outperform state-of-the-art across several chart benchmarks.
arXiv Detail & Related papers (2025-11-29T04:01:55Z)
ChartLens: Fine-grained Visual Attribution in Charts [106.44872805609673]
Post-Hoc Visual Attribution for Charts identifies fine-grained chart elements that validate a given chart-associated response.<n>We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects.<n>Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
arXiv Detail & Related papers (2025-05-25T23:17:32Z)
Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding [14.75820681491341]
Existing benchmarks reveal reliance on text-based shortcuts and probabilistic pattern-matching rather than genuine visual reasoning.<n>We propose Socratic Chart, a new framework that transforms chart images into Scalable Vector Graphics representations.<n>Our framework surpasses state-of-the-art models in accurately capturing chart primitives and improving reasoning performance.
arXiv Detail & Related papers (2025-04-14T00:07:39Z)
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding.<n>Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z)
ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution [47.79080056618323]
We present ChartCitor, a multi-agent framework that provides fine-grained bounding box citations by identifying supporting evidence within chart images.<n>The system orchestrates LLM agents to perform chart-to-table extraction, answer reformulation, table augmentation, evidence retrieval through pre-filtering and re-ranking, and table-to-chart mapping.
arXiv Detail & Related papers (2025-02-03T02:00:51Z)
VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning [13.011899331656018]
VProChart is a novel framework designed to address the challenges of Chart Question Answering (CQA)<n>It integrates a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach.<n>VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.
arXiv Detail & Related papers (2024-09-03T07:19:49Z)
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
ChartAssistant is a vision-language model for universal chart comprehension and reasoning. It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text. Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and Chartllama method.
arXiv Detail & Related papers (2024-01-04T17:51:48Z)
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding [54.45681512355684]
Current chart-related tasks focus on either chart perception that extracts information from the visual charts, or chart reasoning given the extracted data.<n>We introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.