Summaries as Captions: Generating Figure Captions for Scientific
Documents with Automated Text Summarization
- URL: http://arxiv.org/abs/2302.12324v3
- Date: Sat, 12 Aug 2023 03:00:55 GMT
- Title: Summaries as Captions: Generating Figure Captions for Scientific
Documents with Automated Text Summarization
- Authors: Chieh-Yang Huang, Ting-Yao Hsu, Ryan Rossi, Ani Nenkova, Sungchul Kim,
Gromit Yeuk-Yin Chan, Eunyee Koh, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang
- Abstract summary: Figure caption generation can be more effectively tackled as a text summarization task in scientific documents.
We fine-tuned PEGASUS, a pre-trained abstractive summarization model, to specifically summarize figure-referencing paragraphs.
Experiments on large-scale arXiv figures show that our method outperforms prior vision methods in both automatic and human evaluations.
- Score: 31.619379039184263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Good figure captions help paper readers understand complex scientific
figures. Unfortunately, even published papers often have poorly written
captions. Automatic caption generation could aid paper writers by providing
good starting captions that can be refined for better quality. Prior work often
treated figure caption generation as a vision-to-language task. In this paper,
we show that it can be more effectively tackled as a text summarization task in
scientific documents. We fine-tuned PEGASUS, a pre-trained abstractive
summarization model, to specifically summarize figure-referencing paragraphs
(e.g., "Figure 3 shows...") into figure captions. Experiments on large-scale
arXiv figures show that our method outperforms prior vision methods in both
automatic and human evaluations. We further conducted an in-depth investigation
focused on two key challenges: (i) the common presence of low-quality
author-written captions and (ii) the lack of clear standards for good captions.
Our code and data are available at:
https://github.com/Crowd-AI-Lab/Generating-Figure-Captions-as-a-Text-Summarization-Task.
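Because the method treats caption generation as plain text summarization, it can be sketched with an off-the-shelf sequence-to-sequence pipeline. The snippet below is a minimal inference sketch using Hugging Face Transformers; the checkpoint name and the example paragraph are illustrative assumptions, not the authors' exact configuration (their fine-tuned models and data are in the repository above).
```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Assumption: any PEGASUS checkpoint works for illustration; the paper
# fine-tunes PEGASUS on figure-referencing paragraphs paired with
# author-written captions.
model_name = "google/pegasus-arxiv"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# A figure-referencing paragraph serves as the summarization input.
paragraph = (
    "Figure 3 shows the BLEU scores of all systems on the test set. "
    "The summarization-based model outperforms the vision-to-language "
    "baselines across all figure types."
)

inputs = tokenizer(paragraph, truncation=True, max_length=512,
                   return_tensors="pt")
# The generated "summary" is used directly as the figure caption.
caption_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(caption_ids[0], skip_special_tokens=True))
```
In the paper's setup, fine-tuning teaches the model to produce caption-style summaries of the paragraphs that mention a figure, rather than generic paragraph summaries.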
Related papers
- Learning text-to-video retrieval from image captioning [59.81537951811595]
We describe a protocol to study text-to-video retrieval training with unlabeled videos.
We assume (i) no access to labels for any videos, and (ii) access to labeled images in the form of text.
We show that automatically labeling video frames with image captioning allows text-to-video retrieval training.
arXiv Detail & Related papers (2024-04-26T15:56:08Z)
- SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings [28.973082312034343]
This paper introduces SciCapenter, an interactive system that integrates cutting-edge AI technologies for scientific figure captioning.
SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality.
A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing.
arXiv Detail & Related papers (2024-03-26T15:16:14Z)
- Improving Multimodal Datasets with Image Captioning [65.74736570293622]
We study how generated captions can increase the utility of web-scraped datapoints with nondescript text.
Our experiments with using generated captions at DataComp's large scale (1.28B image-text pairs) offer insights into the limitations of synthetic text.
arXiv Detail & Related papers (2023-07-19T17:47:12Z)
- Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion [17.99150939602917]
State-of-the-art (SoTA) image captioning models often rely on the Microsoft COCO (MS-COCO) dataset for training.
We present a novel approach to address previous challenges by showcasing how captions generated from different SoTA models can be effectively fused.
arXiv Detail & Related papers (2023-06-20T15:13:02Z)
- SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning [18.94446071846939]
Figure caption generation helps move models' understanding of scientific documents beyond text.
We extend the large-scale SciCap dataset to include mention-paragraphs (paragraphs mentioning figures) and OCR tokens.
Our results indicate that mention-paragraphs serve as additional context knowledge, significantly boosting standard automatic image-caption evaluation scores.
arXiv Detail & Related papers (2023-06-06T08:16:16Z)
- SciCap: Generating Captions for Scientific Figures [20.696070723932866]
We introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020.
After pre-processing, SCICAP contained more than two million figures extracted from over 290,000 papers.
We established baseline models that caption graph plots, the dominant (19.2%) figure type.
arXiv Detail & Related papers (2021-10-22T07:10:41Z)
- Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder [52.42057181754076]
Motivated by the auto-encoder mechanism and advances in contrastive representation learning, we propose a learning-based metric for image captioning.
We develop three progressive model structures to learn the sentence level representations.
Experimental results show that our proposed method aligns well with the scores produced by other contemporary metrics.
arXiv Detail & Related papers (2021-06-29T12:27:05Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experimental results show that our method maintains robust performance and gives more flexible scores to candidate captions when encountering semantically similar expressions or less-aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
- Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models [63.11766263832545]
We present a new image captioning dataset, Egoshots, consisting of 978 real-life images with no captions.
To evaluate the quality of the generated captions, we propose a new image captioning metric, object-based Semantic Fidelity (SF).
arXiv Detail & Related papers (2020-03-26T04:43:30Z)