SciCap: Generating Captions for Scientific Figures
- URL: http://arxiv.org/abs/2110.11624v2
- Date: Mon, 25 Oct 2021 04:37:30 GMT
- Title: SciCap: Generating Captions for Scientific Figures
- Authors: Ting-Yao Hsu, C. Lee Giles, Ting-Hao 'Kenneth' Huang
- Abstract summary: We introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020.
After pre-processing, SCICAP contained more than two million figures extracted from over 290,000 papers.
We established baseline models that caption graph plots, the dominant (19.2%) figure type.
- Score: 20.696070723932866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers use figures to communicate rich, complex information in
scientific papers. The captions of these figures are critical to conveying
effective messages. However, low-quality figure captions commonly occur in
scientific articles and may decrease understanding. In this paper, we propose
an end-to-end neural framework to automatically generate informative,
high-quality captions for scientific figures. To this end, we introduce SCICAP,
a large-scale figure-caption dataset based on computer science arXiv papers
published between 2010 and 2020. After pre-processing - including figure-type
classification, sub-figure identification, text normalization, and caption text
selection - SCICAP contained more than two million figures extracted from over
290,000 papers. We then established baseline models that caption graph plots,
the dominant (19.2%) figure type. The experimental results showed both
opportunities and steep challenges of generating captions for scientific
figures.
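To make the pre-processing concrete, the sketch below shows what caption text normalization and caption text selection could look like in Python. The specific rules (a [NUM] placeholder, keeping only the first sentence) are illustrative assumptions, not the exact SCICAP recipe.

```python
import re

def normalize_caption(text: str) -> str:
    """Illustrative caption text normalization; the exact SCICAP rules may differ."""
    text = text.lower()
    # Replace numeric values (e.g. "19.2") with a placeholder token.
    text = re.sub(r"\d+(\.\d+)?", "[NUM]", text)
    # Collapse whitespace artifacts left over from PDF extraction.
    return re.sub(r"\s+", " ", text).strip()

def select_caption_text(caption: str) -> str:
    """Illustrative caption text selection: keep only the first sentence."""
    first_sentence = re.split(r"(?<=[.!?])\s+", caption)[0]
    return normalize_caption(first_sentence)

print(select_caption_text(
    "Test accuracy over training epochs for 3 baselines. Results are averaged over 5 runs."
))
# -> "test accuracy over training epochs for [NUM] baselines."
```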
Related papers
- Figuring out Figures: Using Textual References to Caption Scientific Figures [3.358364892753541]
Previous work in automatically generating figure captions has been largely unsuccessful and has defaulted to using single-layer LSTMs.
In our work, we use the SciCap dataset curated by Hsu et al. and a variant of a CLIP+GPT-2 encoder-decoder model with cross-attention to generate captions conditioned on the image.
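As a rough sketch of how such a CLIP+GPT-2 encoder-decoder with cross-attention might be wired up (assuming the HuggingFace transformers library and the public openai/clip-vit-base-patch32 and gpt2 checkpoints; this is not the authors' implementation):

```python
from transformers import (AutoConfig, CLIPVisionModel, GPT2LMHeadModel,
                          GPT2TokenizerFast, VisionEncoderDecoderModel)

# Vision encoder: the CLIP ViT image tower (the CLIP text tower is not needed here).
encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")

# Text decoder: GPT-2 configured as a decoder with cross-attention layers.
dec_config = AutoConfig.from_pretrained("gpt2")
dec_config.is_decoder = True
dec_config.add_cross_attention = True
decoder = GPT2LMHeadModel.from_pretrained("gpt2", config=dec_config)

model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id

# Training would minimize cross-entropy over caption tokens given pixel_values;
# the GPT-2 decoder attends to CLIP patch embeddings via cross-attention.
```

At inference time, pixel values from a CLIPImageProcessor would be passed to model.generate to decode a caption; fine-tuning on SciCap figure-caption pairs would follow the standard sequence-to-sequence training loop.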
arXiv Detail & Related papers (2024-06-25T21:49:21Z)
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
- SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings [28.973082312034343]
This paper introduces SciCapenter, an interactive system that integrates cutting-edge AI technologies for scientific figure captioning.
SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality.
A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing.
arXiv Detail & Related papers (2024-03-26T15:16:14Z)
- SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks for evaluating multi-modal information retrieval (MMIR) performance on image-text pairing in the scientific domain show a notable gap.
We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections.
This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
arXiv Detail & Related papers (2024-01-24T14:23:12Z)
- Improving Multimodal Datasets with Image Captioning [65.74736570293622]
We study how generated captions can increase the utility of web-scraped datapoints with nondescript text.
Our experiments with using generated captions at DataComp's large scale (1.28B image-text pairs) offer insights into the limitations of synthetic text.
arXiv Detail & Related papers (2023-07-19T17:47:12Z)
- SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning [18.94446071846939]
Figure caption generation helps move models' understanding of scientific documents beyond text.
We extend the large-scale SciCap dataset to include mention-paragraphs (paragraphs mentioning figures) and OCR tokens.
Our results indicate that mention-paragraphs serve as additional context knowledge, which significantly boosts the automatic standard image caption evaluation scores.
arXiv Detail & Related papers (2023-06-06T08:16:16Z)
- Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization [31.619379039184263]
Figure caption generation for scientific documents can be more effectively tackled as a text summarization task.
We fine-tuned PEGASUS, a pre-trained abstractive summarization model, to specifically summarize figure-referencing paragraphs.
Experiments on large-scale arXiv figures show that our method outperforms prior vision methods in both automatic and human evaluations.
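A minimal sketch of this summarize-the-mentioning-paragraphs idea, assuming the HuggingFace transformers library and the off-the-shelf google/pegasus-arxiv checkpoint rather than the authors' fine-tuned model (the example paragraphs are made up):

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Off-the-shelf PEGASUS checkpoint; the paper instead fine-tunes on figure-referencing paragraphs.
name = "google/pegasus-arxiv"
tokenizer = PegasusTokenizer.from_pretrained(name)
model = PegasusForConditionalGeneration.from_pretrained(name)

# Paragraphs in the paper body that reference the target figure (hypothetical text).
mention_paragraphs = (
    "Figure 2 compares the convergence of the three optimizers. "
    "As shown in Figure 2, the proposed method reaches the same loss in half the steps."
)

inputs = tokenizer(mention_paragraphs, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
candidate_caption = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(candidate_caption)  # the generated summary serves as a candidate caption
```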
arXiv Detail & Related papers (2023-02-23T20:39:06Z)
- ACL-Fig: A Dataset for Scientific Figure Classification [15.241086410108512]
We develop a pipeline that extracts figures and tables from the scientific literature and a deep-learning-based framework that classifies scientific figures using visual features.
We build the first large-scale automatically annotated corpus, ACL-Fig, consisting of 112,052 scientific figures extracted from 56K research papers in the ACL Anthology.
The ACL-Fig-Pilot dataset contains 1,671 manually labeled scientific figures belonging to 19 categories.
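As a generic illustration of such a deep-learning figure classifier (a sketch with torchvision and a ResNet-18 backbone, not the ACL-Fig authors' framework; only the 19-class output size comes from the ACL-Fig-Pilot description above):

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_FIGURE_TYPES = 19  # matches the 19 ACL-Fig-Pilot categories

# Generic pre-trained ResNet-18 backbone with a new classification head for figure types.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_FIGURE_TYPES)

# One training step on a hypothetical batch of rendered figure images.
images = torch.randn(8, 3, 224, 224)                  # batch of figure crops
labels = torch.randint(0, NUM_FIGURE_TYPES, (8,))     # figure-type labels
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(backbone.parameters(), lr=3e-4)

optimizer.zero_grad()
logits = backbone(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```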
arXiv Detail & Related papers (2023-01-28T20:27:35Z)
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
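As a small illustrative example of pulling data from the platform programmatically, assuming the public Semantic Scholar Graph API paper-search endpoint (query and fields here are illustrative):

```python
import requests

# Search the Semantic Scholar Graph API for papers on figure captioning.
resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={"query": "scientific figure captioning",
            "fields": "title,abstract,year", "limit": 5},
    timeout=30,
)
resp.raise_for_status()
for paper in resp.json().get("data", []):
    print(paper.get("year"), "-", paper.get("title"))
```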
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific paper summarization by utilizing the citation graph.
We construct a novel scientific paper summarization dataset, the Semantic Scholar Network (SSN), which contains 141K research papers in different domains.
Our model achieves competitive performance compared with pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- MedICaT: A Dataset of Medical Images, Captions, and Textual References [71.3960667004975]
Previous work focused on classifying figure content rather than understanding how images relate to the text.
MedICaT consists of 217K images from 131K open access biomedical papers.
Using MedICaT, we introduce the task of subfigure to subcaption alignment in compound figures.
arXiv Detail & Related papers (2020-10-12T19:56:08Z)