Related papers: Five Years of SciCap: What We Learned and Future Directions for Scientific Figure Captioning

URL: http://arxiv.org/abs/2512.21789v1
Date: Thu, 25 Dec 2025 21:39:10 GMT
Title: Five Years of SciCap: What We Learned and Future Directions for Scientific Figure Captioning
Authors: Ting-Hao K. Huang, Ryan A. Rossi, Sungchul Kim, Tong Yu, Ting-Yao E. Hsu, Ho Yin Ng, C. Lee Giles
Abstract summary: The SciCap project grew from a small seed-funded idea at Penn State into one of the central efforts shaping the scientific figure-captioning landscape. Over these five years, we curated, released, and continually updated a large collection of figure-caption pairs from arXiv papers. We look back at the first five years of SciCap and summarize the key technical and methodological lessons we learned.
Score: 47.682237295499306
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Between 2021 and 2025, the SciCap project grew from a small seed-funded idea at The Pennsylvania State University (Penn State) into one of the central efforts shaping the scientific figure-captioning landscape. Supported by a Penn State seed grant, Adobe, and the Alfred P. Sloan Foundation, what began as our attempt to test whether domain-specific training, which was successful in text models like SciBERT, could also work for figure captions expanded into a multi-institution collaboration. Over these five years, we curated, released, and continually updated a large collection of figure-caption pairs from arXiv papers, conducted extensive automatic and human evaluations on both generated and author-written captions, navigated the rapid rise of large language models (LLMs), launched annual challenges, and built interactive systems that help scientists write better captions. In this piece, we look back at the first five years of SciCap and summarize the key technical and methodological lessons we learned. We then outline five major unsolved challenges and propose directions for the next phase of research in scientific figure captioning.

Related papers

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines [112.78540935201558]
We present a scientific reasoning foundation model that aligns natural language with heterogeneous scientific representations. The model is pretrained on a 206B-token corpus spanning scientific text, pure sequences, and sequence-text pairs, then aligned via SFT on 40M instructions. It supports four capability families, covering up to 103 tasks across: (i) faithful translation between text and scientific formats, (ii) text/knowledge extraction, (iii) property prediction, (iv) property classification, (v) unconditional and conditional sequence generation and design.
arXiv Detail & Related papers (2025-09-25T17:52:06Z)
Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SciCap Challenge 2023 [33.089795292870186]
In 2023, the first SciCap Challenge took place, inviting global teams to use an expanded SciCap dataset to develop models for captioning diverse figure types across various academic fields. This paper presents an overview of the first SciCap Challenge and details the performance of various models on its data, capturing a snapshot of the field's state. We found that professional editors overwhelmingly preferred figure captions generated by GPT-4V over those from all other models and even the original captions written by authors.
arXiv Detail & Related papers (2025-01-31T18:02:19Z)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [97.31347312130119]
SciRIFF (Scientific Resource for Instruction-Following and Finetuning) is a dataset of 137K instruction-following instances for training and evaluation, covering 54 tasks. These tasks span five core scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF is unique in being an entirely expert-written, high-quality instruction-following dataset for extracting and synthesizing information from research literature across diverse scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings [28.973082312034343]
This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific figure captions. SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality. A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing.
arXiv Detail & Related papers (2024-03-26T15:16:14Z)
SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning [18.94446071846939]
Figure caption generation helps move model understanding of scientific documents beyond text. We extend the large-scale SciCap dataset to include mention-paragraphs (paragraphs mentioning figures) and OCR tokens. Our results indicate that mention-paragraphs serve as additional context knowledge, which significantly boosts the standard automatic image-caption evaluation scores.
arXiv Detail & Related papers (2023-06-06T08:16:16Z)
Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization [31.619379039184263]
Figure caption generation can be more effectively tackled as a text summarization task in scientific documents. We fine-tuned PEGASUS, a pre-trained abstractive summarization model, to specifically summarize figure-referencing paragraphs. Experiments on large-scale arXiv figures show that our method outperforms prior vision methods in both automatic and human evaluations.
arXiv Detail & Related papers (2023-02-23T20:39:06Z)
The Semantic Scholar Open Data Platform [92.2948743167744]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
A Computational Inflection for Scientific Discovery [48.176406062568674]
We stand at the foot of a significant inflection in the trajectory of scientific discovery. As society continues on its fast-paced digital transformation, so does humankind's collective scientific knowledge. Computer science is poised to ignite a revolution in the scientific process itself.
arXiv Detail & Related papers (2022-05-04T11:36:54Z)
SciCap: Generating Captions for Scientific Figures [20.696070723932866]
We introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020. After pre-processing, SCICAP contained more than two million figures extracted from over 290,000 papers. We established baseline models that caption graph plots, the dominant (19.2%) figure type.
arXiv Detail & Related papers (2021-10-22T07:10:41Z)
From Show to Tell: A Survey on Image Captioning [48.98681267347662]
Connecting Vision and Language plays an essential role in Generative Intelligence. Research in image captioning has not reached a conclusive answer yet. This work aims at providing a comprehensive overview and categorization of image captioning approaches.
arXiv Detail & Related papers (2021-07-14T18:00:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.