LineCap: Line Charts for Data Visualization Captioning Models
- URL: http://arxiv.org/abs/2207.07243v1
- Date: Fri, 15 Jul 2022 00:35:59 GMT
- Title: LineCap: Line Charts for Data Visualization Captioning Models
- Authors: Anita Mahinpei, Zona Kostic, Chris Tanner
- Abstract summary: We introduce LineCap, a novel figure captioning dataset of 3,528 figures.
We provide insights from curating this dataset and using end-to-end deep learning models for automated figure captioning.
- Score: 6.3596637237946725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data visualization captions help readers understand the purpose of a
visualization and are crucial for individuals with visual impairments. The
prevalence of poor figure captions and the successful application of deep
learning approaches to image captioning motivate the use of similar techniques
for automated figure captioning. However, research in this field has been
stunted by the lack of suitable datasets. We introduce LineCap, a novel figure
captioning dataset of 3,528 figures, and we provide insights from curating this
dataset and using end-to-end deep learning models for automated figure
captioning.
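The abstract refers to end-to-end deep learning models for automated figure captioning without specifying an architecture. The following is a minimal sketch of one generic setup (a small convolutional image encoder feeding an LSTM caption decoder); every layer size, name, and the toy data below are illustrative assumptions, not the models or data pipeline used in the LineCap paper.

```python
# Hedged sketch of a generic end-to-end figure-captioning model.
# All sizes and the random "chart/caption" data are illustrative assumptions.
import torch
import torch.nn as nn

class FigureCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Image encoder: maps a chart image to one feature vector that
        # initializes the decoder's hidden state.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden_dim),
        )
        # Text decoder: embeds caption tokens and predicts the next token.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        h0 = self.encoder(images).unsqueeze(0)   # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(captions), (h0, c0))
        return self.head(out)                    # (B, T, vocab)

# Toy usage with random tensors standing in for figure/caption pairs.
model = FigureCaptioner(vocab_size=1000)
images = torch.randn(4, 3, 224, 224)
captions = torch.randint(0, 1000, (4, 20))
logits = model(images, captions[:, :-1])
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), captions[:, 1:].reshape(-1))
```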
Related papers
- What Makes for Good Image Captions? [50.48589893443939]
Our framework posits that good image captions should balance three key properties: they should be informationally sufficient, minimally redundant, and readily comprehensible by humans.
We introduce the Pyramid of Captions (PoCa) method, which generates enriched captions by integrating local and global visual information.
arXiv Detail & Related papers (2024-05-01T12:49:57Z)
- Learning text-to-video retrieval from image captioning [59.81537951811595]
We describe a protocol to study text-to-video retrieval training with unlabeled videos.
We assume (i) no access to labels for any videos, and (ii) access to labeled images in the form of text.
We show that automatically labeling video frames with image captioning allows text-to-video retrieval training.
arXiv Detail & Related papers (2024-04-26T15:56:08Z)
- View Selection for 3D Captioning via Diffusion Ranking [54.78058803763221]
The Cap3D method renders 3D objects into 2D views for captioning with pre-trained models.
Some rendered views of 3D objects are atypical, deviating from the training data of standard image captioning models and causing hallucinations.
We present DiffuRank, a method that leverages a pre-trained text-to-3D model to assess the alignment between 3D objects and their 2D rendered views.
arXiv Detail & Related papers (2024-04-11T17:58:11Z)
- Inserting Faces inside Captions: Image Captioning with Attention Guided Merging [0.0]
We introduce AstroCaptions, a dataset for the image captioning task.
We propose a novel post-processing method to insert identified people's names inside the caption.
arXiv Detail & Related papers (2024-03-20T08:38:25Z)
- VisText: A Benchmark for Semantically Rich Chart Captioning [12.117737635879037]
VisText is a dataset of 12,441 pairs of charts and captions that describe the charts' construction.
Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models.
arXiv Detail & Related papers (2023-06-28T15:16:24Z)
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions [11.274127953112574]
We propose an automated approach to augmenting existing captions with visual details using "frozen" vision experts.
Our proposed method, FuseCap, fuses the outputs of such vision experts with the original captions using a large language model.
We release this large-scale dataset of enriched image-caption pairs for the community.
arXiv Detail & Related papers (2023-05-28T13:16:03Z)
- Cross-Domain Image Captioning with Discriminative Finetuning [20.585138136033905]
Fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language.
We show that discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task.
arXiv Detail & Related papers (2023-04-04T09:33:16Z)
- Iconographic Image Captioning for Artworks [2.3859169601259342]
This work utilizes a novel large-scale dataset of artwork images annotated with concepts from the Iconclass classification system designed for art and iconography.
The annotations are processed into clean textual descriptions to create a dataset suitable for training a deep neural network model on the image captioning task.
A transformer-based vision-language pre-trained model is fine-tuned using the artwork image dataset.
The quality of the generated captions and the model's capacity to generalize to new data are explored by applying the model to a new collection of paintings and analyzing the relation between commonly generated captions and artistic genre.
arXiv Detail & Related papers (2021-02-07T23:11:33Z)
- VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning [128.6138588412508]
This paper presents VIsual VOcabulary pretraining (VIVO) that performs pre-training in the absence of caption annotations.
Our model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects.
arXiv Detail & Related papers (2020-09-28T23:20:02Z)
- Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models [63.11766263832545]
We present a new image captioning dataset, Egoshots, consisting of 978 real-life images with no captions.
To evaluate the quality of the generated captions, we propose a new image captioning metric, object-based Semantic Fidelity (SF); see the illustrative sketch after this list.
arXiv Detail & Related papers (2020-03-26T04:43:30Z)
- TextCaps: a Dataset for Image Captioning with Reading Comprehension [56.89608505010651]
Text is omnipresent in human environments and frequently critical to understand our surroundings.
To study how to comprehend text in the context of an image we collect a novel dataset, TextCaps, with 145k captions for 28k images.
Our dataset challenges a model to recognize text, relate it to its visual context, and decide what part of the text to copy or paraphrase.
arXiv Detail & Related papers (2020-03-24T02:38:35Z)
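The Egoshots entry above proposes an object-based Semantic Fidelity (SF) metric for judging captions without reference annotations. The exact SF formulation is defined in that paper; the sketch below only assumes the general idea of comparing detector-provided object labels against the words of a generated caption, using a simple overlap ratio as a stand-in.

```python
# Hedged sketch of an object-based caption-fidelity score in the spirit of the
# Egoshots "Semantic Fidelity" idea above. The actual SF metric is defined in
# that paper; this overlap ratio is only an assumed stand-in.
def object_fidelity(detected_objects: list[str], caption: str) -> float:
    """Fraction of detected object labels that the caption mentions."""
    if not detected_objects:
        return 0.0
    caption_tokens = set(caption.lower().split())
    mentioned = sum(1 for obj in detected_objects if obj.lower() in caption_tokens)
    return mentioned / len(detected_objects)

# Toy usage: an off-the-shelf object detector would normally supply the labels.
print(object_fidelity(["person", "bicycle", "dog"],
                      "A person rides a bicycle down the street"))  # -> 0.666...
```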