AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks
- URL: http://arxiv.org/abs/2405.13580v1
- Date: Wed, 22 May 2024 12:18:52 GMT
- Title: AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks
- Authors: Omar Moured, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen
- Abstract summary: We introduce the AltChart dataset, comprising 10,000 real chart images, each paired with a comprehensive summary.
We propose a new method for pretraining Vision-Language Models (VLMs) to learn fine-grained chart representations.
We conduct extensive evaluations of four leading chart summarization models, analyzing how accessible their descriptions are.
- Score: 31.414783623207477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chart summarization is a crucial task for blind and visually impaired individuals, as it is their primary means of accessing and interpreting graphical data. Crafting high-quality descriptions is challenging because it requires precise communication of the essential details within the chart without visual perception. Many chart analysis methods, however, produce brief, unstructured responses that may contain significant hallucinations, affecting their reliability for blind people. To address these challenges, this work presents three key contributions: (1) We introduce the AltChart dataset, comprising 10,000 real chart images, each paired with a comprehensive summary featuring long-context, semantically rich annotations. (2) We propose a new method for pretraining Vision-Language Models (VLMs) to learn fine-grained chart representations through training with multiple pretext tasks, yielding a performance gain of ${\sim}2.5\%$. (3) We conduct extensive evaluations of four leading chart summarization models, analyzing how accessible their descriptions are. Our dataset and code are publicly available on our project page: https://github.com/moured/AltChart.
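The listing does not spell out AltChart's individual pretext tasks, but the recipe the abstract points to, a primary summarization loss combined with weighted auxiliary pretext losses, can be sketched as follows. The task names and weights are illustrative placeholders, not the authors' actual configuration.

```python
# Minimal sketch of multi-pretext-task pretraining for a chart VLM.
# The task names and loss weights are hypothetical placeholders; the
# actual AltChart pretext tasks are defined in the paper itself.
import torch

def multi_pretext_loss(lm_loss: torch.Tensor,
                       pretext_losses: dict[str, torch.Tensor],
                       weights: dict[str, float]) -> torch.Tensor:
    """Combine the primary summarization loss with weighted pretext losses."""
    total = lm_loss
    for name, loss in pretext_losses.items():
        total = total + weights.get(name, 1.0) * loss
    return total

# Toy usage with stand-in scalar losses.
total = multi_pretext_loss(
    lm_loss=torch.tensor(2.31),                        # summary cross-entropy
    pretext_losses={"chart_type": torch.tensor(0.85),  # e.g. chart-type classification
                    "text_role": torch.tensor(1.10)},  # e.g. label-role tagging
    weights={"chart_type": 0.5, "text_role": 0.5},
)
print(total)  # tensor(3.2850)
```

In practice each auxiliary loss would come from its own head over the shared vision encoder, so the gradient of every pretext task shapes the same chart representation.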
Related papers
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart overcomes two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) it shortens the lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Visual Token Merging module (a toy sketch of token merging follows this entry).
arXiv Detail & Related papers (2024-04-25T14:23:24Z)
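The Visual Token Merging idea from the TinyChart entry above can be pictured with a toy routine: repeatedly average the most similar pair of vision tokens, measured by cosine similarity, so the sequence the language model attends over shrinks. This greedy sketch is a generic illustration, not TinyChart's actual merging module.

```python
# Toy illustration of vision token merging: average the most similar
# token pair r times to shorten the sequence. A generic greedy sketch,
# not TinyChart's actual Visual Token Merging module.
import torch

def merge_tokens(tokens: torch.Tensor, r: int) -> torch.Tensor:
    """tokens: (N, D) vision tokens; returns (N - r, D)."""
    for _ in range(r):
        x = torch.nn.functional.normalize(tokens, dim=-1)
        sim = x @ x.T                      # pairwise cosine similarity
        sim.fill_diagonal_(-float("inf"))  # ignore self-similarity
        i, j = divmod(int(sim.argmax()), sim.size(1))
        merged = (tokens[i] + tokens[j]) / 2
        keep = [k for k in range(tokens.size(0)) if k not in (i, j)]
        tokens = torch.cat([tokens[keep], merged.unsqueeze(0)], dim=0)
    return tokens

# Example: 576 patch tokens reduced by 64 merges.
print(merge_tokens(torch.randn(576, 768), r=64).shape)  # torch.Size([512, 768])
```

Fewer vision tokens means a shorter prefix for the language model, which is where most of the inference cost of high-resolution chart images goes.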
- ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning [28.204261069650897]
We introduce ChartInstruct: a novel chart-specific vision-language instruction-following dataset comprising 191K instructions generated with 71K charts.
In experiments on four downstream tasks, we first show the effectiveness of our model, achieving a new set of state-of-the-art results.
arXiv Detail & Related papers (2024-03-14T01:40:23Z)
- ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
ChartAssistant is a vision-language model for universal chart comprehension and reasoning.
It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text (a minimal table-linearization example follows this entry).
Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and ChartLlama methods.
arXiv Detail & Related papers (2024-01-04T17:51:48Z)
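Chart-to-table pre-training, the first stage in the ChartAssistant entry above, is commonly framed as sequence generation: the model reads a chart image and emits its underlying data table as linearized text. Below is a minimal sketch of such a target format; the separator tokens are invented for illustration and are not ChartAssistant's actual scheme.

```python
# Minimal sketch of linearizing a data table into a text target for
# chart-to-table pretraining. The separator scheme is invented for
# illustration; ChartAssistant defines its own output format.
def linearize_table(header: list[str], rows: list[list[str]]) -> str:
    lines = [" | ".join(header)]
    lines += [" | ".join(row) for row in rows]
    return " <0x0A> ".join(lines)  # hypothetical newline token

target = linearize_table(
    header=["Year", "Revenue"],
    rows=[["2021", "3.4"], ["2022", "4.1"]],
)
print(target)
# Year | Revenue <0x0A> 2021 | 3.4 <0x0A> 2022 | 4.1
```

Training the decoder to reproduce this string from the image alone forces it to ground every value in the rendered chart, which is the alignment the entry describes.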
- StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding [58.38480335579541]
Current chart-related tasks focus on either chart perception, which refers to extracting information from visual charts, or on reasoning over the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
- Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs [71.55796212450055]
We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs.
Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP); a rough value-masking sketch follows this entry.
arXiv Detail & Related papers (2023-05-29T22:29:03Z)
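The Masked Value Prediction objective from the ChartT5 entry above can be pictured as a fill-in task over the flattened table: numeric cells are replaced by a mask token and become prediction targets, conditioned on the chart image. The corruption step below is a rough sketch with made-up token names and masking rate, not ChartT5's code.

```python
# Rough sketch of corrupting table values for a Masked Value Prediction
# objective: numeric cells may become <mask> and then serve as targets.
# Token name and masking rate are illustrative, not ChartT5's code.
import random

def mask_values(cells: list[str], rate: float = 0.3, seed: int = 0):
    rng = random.Random(seed)
    corrupted, targets = [], []
    for cell in cells:
        is_numeric = cell.replace(".", "", 1).isdigit()
        if is_numeric and rng.random() < rate:
            corrupted.append("<mask>")  # hide the value from the encoder
            targets.append(cell)        # the model must recover it
        else:
            corrupted.append(cell)
    return corrupted, targets

inp, tgt = mask_values(["Year", "2021", "3.4", "2022", "4.1"])
print(inp, tgt)  # with this seed: [..., '2022', '<mask>'] ['4.1']
```

Masked Header Prediction works the same way but hides header strings instead of values, pushing the model to read axis and legend text from the plot.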
- UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning [29.947053208614246]
We present UniChart, a pretrained model for chart comprehension and reasoning.
UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language.
We propose several chart-specific pretraining tasks, including: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills (a small task-token example follows this entry).
arXiv Detail & Related papers (2023-05-24T06:11:17Z)
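A common way to train one chart-grounded decoder on mixed low-level and high-level tasks, as the UniChart entry above describes, is to prefix every sample with a task token that tells the decoder which output to produce. The token strings and samples below are placeholders, not UniChart's actual vocabulary.

```python
# Sketch of mixing low-level (data extraction) and high-level (summary,
# QA) pretraining samples behind task-prefix tokens for a single decoder.
# All token strings and data are placeholders, not UniChart's vocabulary.
import random

CORPUS = [
    ("<extract_data>", "chart_001.png", "Year | Sales <0x0A> 2020 | 12 <0x0A> 2021 | 18"),
    ("<summarize>",    "chart_001.png", "Sales rose 50% between 2020 and 2021."),
    ("<open_qa>",      "chart_002.png", "The highest bar corresponds to 2021."),
]

def sample_batch(corpus, batch_size: int, seed: int = 0):
    """Draw a mixed-task batch; the prefix token conditions the decoder."""
    rng = random.Random(seed)
    batch = rng.sample(corpus, k=batch_size)
    return [(img, f"{task} {target}") for task, img, target in batch]

for img, text in sample_batch(CORPUS, batch_size=2):
    print(img, "->", text)
```

Because the low-level extraction targets and the high-level summaries share one decoder, improvements in reading values transfer directly to the reasoning tasks.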
- ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
- Chart-to-Text: A Large-Scale Benchmark for Chart Summarization [9.647079534077472]
We present Chart-to-text, a large-scale benchmark with two datasets and a total of 44,096 charts.
We explain the dataset construction process and analyze the datasets.
arXiv Detail & Related papers (2022-03-12T17:01:38Z)