VisText: A Benchmark for Semantically Rich Chart Captioning
- URL: http://arxiv.org/abs/2307.05356v1
- Date: Wed, 28 Jun 2023 15:16:24 GMT
- Title: VisText: A Benchmark for Semantically Rich Chart Captioning
- Authors: Benny J. Tang, Angie Boggust and Arvind Satyanarayan
- Abstract summary: VisText is a dataset of 12,441 pairs of charts and captions that describe the charts' construction.
Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models.
- Score: 12.117737635879037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Captions that describe or explain charts help improve recall and
comprehension of the depicted data and provide a more accessible medium for
people with visual disabilities. However, current approaches for automatically
generating such captions struggle to articulate the perceptual or cognitive
features that are the hallmark of charts (e.g., complex trends and patterns).
In response, we introduce VisText: a dataset of 12,441 pairs of charts and
captions that describe the charts' construction, report key statistics, and
identify perceptual and cognitive phenomena. In VisText, a chart is available
as three representations: a rasterized image, a backing data table, and a scene
graph -- a hierarchical representation of a chart's visual elements akin to a
web page's Document Object Model (DOM). To evaluate the impact of VisText, we
fine-tune state-of-the-art language models on our chart captioning task and
apply prefix-tuning to produce captions that vary the semantic content they
convey. Our models generate coherent, semantically rich captions and perform on
par with state-of-the-art chart captioning models across machine translation
and text generation metrics. Through qualitative analysis, we identify six
broad categories of errors that our models make that can inform future work.
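To make the scene-graph representation and the semantic-level prefixes concrete, here is a minimal Python sketch of a DOM-like scene graph for a simple bar chart, linearized into model input with a semantic-level prefix prepended. The field names, the linearization, and the prefix strings are illustrative assumptions for this sketch, not VisText's actual schema or the authors' exact prefix-tuning setup.

```python
# Illustrative sketch only: hypothetical scene-graph schema and prefix format,
# not the actual VisText data files or the authors' prefix-tuning code.

# A DOM-like scene graph for a simple bar chart: nested nodes describing
# the chart's visual elements (axes, marks) and their attributes.
scene_graph = {
    "type": "chart",
    "children": [
        {"type": "axis", "role": "x", "title": "year",
         "ticks": ["2019", "2020", "2021"]},
        {"type": "axis", "role": "y", "title": "sales (millions)"},
        {"type": "mark-group", "mark": "bar", "children": [
            {"type": "bar", "x": "2019", "y": 4.2},
            {"type": "bar", "x": "2020", "y": 5.1},
            {"type": "bar", "x": "2021", "y": 7.8},
        ]},
    ],
}


def linearize(node, depth=0):
    """Flatten the scene graph into indented text a language model can consume."""
    attrs = {k: v for k, v in node.items() if k not in ("type", "children")}
    line = "  " * depth + node["type"] + " " + " ".join(f"{k}={v}" for k, v in attrs.items())
    lines = [line.rstrip()]
    for child in node.get("children", []):
        lines.extend(linearize(child, depth + 1))
    return lines


def build_input(scene_graph, semantic_level):
    """Prepend a semantic-level prefix so one fine-tuned model can be steered
    toward construction/statistics captions or trend/perceptual captions."""
    prefixes = {
        "construction": "translate chart to construction caption:",  # hypothetical prefix text
        "perceptual": "translate chart to perceptual caption:",      # hypothetical prefix text
    }
    return prefixes[semantic_level] + "\n" + "\n".join(linearize(scene_graph))


print(build_input(scene_graph, "perceptual"))
```

In this framing, the prefix acts as a learned control signal: the same chart representation yields different caption types depending on which prefix is supplied at generation time.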
Related papers
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks [31.414783623207477]
We introduce the AltChart dataset, comprising 10,000 real chart images, each paired with a comprehensive summary.
We propose a new method for pretraining Vision-Language Models (VLMs) to learn fine-grained chart representations.
We conduct extensive evaluations of four leading chart summarization models, analyzing how accessible their descriptions are.
arXiv Detail & Related papers (2024-05-22T12:18:52Z)
- StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding [58.38480335579541]
Current chart-related tasks focus on either chart perception, which extracts information from visual charts, or on reasoning over the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
- FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing [66.70054075041487]
Existing scene graph parsers that convert image captions into scene graphs often suffer from two types of errors.
First, the generated scene graphs fail to capture the true semantics of the captions or the corresponding images, resulting in a lack of faithfulness.
Second, the generated scene graphs are highly inconsistent, with the same semantics represented by different annotations.
arXiv Detail & Related papers (2023-05-27T15:38:31Z)
- UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning [29.947053208614246]
We present UniChart, a pretrained model for chart comprehension and reasoning.
UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language.
We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills.
arXiv Detail & Related papers (2023-05-24T06:11:17Z)
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- Chart-to-Text: A Large-Scale Benchmark for Chart Summarization [9.647079534077472]
We present Chart-to-text, a large-scale benchmark with two datasets and a total of 44,096 charts.
We explain the dataset construction process and analyze the datasets.
arXiv Detail & Related papers (2022-03-12T17:01:38Z)
- Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework.
To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network.
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z)
- Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model [6.320141734801679]
We introduce a new dataset and present a neural model for automatically generating natural language summaries for charts.
The generated summaries provide an interpretation of the chart and convey the key insights found within that chart.
arXiv Detail & Related papers (2020-10-18T23:57:33Z)
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.