Related papers: SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters

URL: http://arxiv.org/abs/2407.19787v1
Date: Mon, 29 Jul 2024 08:32:27 GMT
Title: SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
Authors: Shohei Tanaka, Hao Wang, Yoshitaka Ushiku
Abstract summary: A system that can automatically generate well-designed posters from scientific papers would reduce the workload of authors and help readers understand the outline of the paper visually. We built the SciPostLayout dataset, which consists of 7,855 scientific posters and manual layout annotations for layout analysis and generation. All of the posters and papers in our dataset are under the CC-BY license and are publicly available.
Score: 13.149391061482895
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scientific posters are used to present the contributions of scientific papers effectively in a graphical format. However, creating a well-designed poster that efficiently summarizes the core of a paper is both labor-intensive and time-consuming. A system that can automatically generate well-designed posters from scientific papers would reduce the workload of authors and help readers understand the outline of the paper visually. Despite the demand for poster generation systems, only limited research has been conducted due to the lack of publicly available datasets. Thus, in this study, we built the SciPostLayout dataset, which consists of 7,855 scientific posters and manual layout annotations for layout analysis and generation. SciPostLayout also contains 100 scientific papers paired with the posters. All of the posters and papers in our dataset are under the CC-BY license and are publicly available. As benchmark tests for the collected dataset, we conducted experiments for layout analysis and generation utilizing existing computer vision models and found that both layout analysis and generation of posters using SciPostLayout are more challenging than with scientific papers. We also conducted experiments on generating layouts from scientific papers to demonstrate the potential of utilizing LLMs as a scientific poster generation system. The dataset is publicly available at https://huggingface.co/datasets/omron-sinicx/scipostlayout_v2. The code is also publicly available at https://github.com/omron-sinicx/scipostlayout.

Related papers

From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature [86.7745150269054]
We introduce Panel2Patch, a novel data pipeline that mines hierarchical structure from existing biomedical scientific literature. Given scientific figures and captions, Panel2Patch parses layouts, panels, and visual markers, then constructs hierarchical aligned vision-language pairs at the figure, panel, and patch levels. We develop a granularity-aware pretraining strategy that unifies heterogeneous objectives from coarse didactic descriptions to fine region-focused phrases.
arXiv Detail & Related papers (2025-12-02T09:37:51Z)
SciPostGen: Bridging the Gap between Scientific Papers and Poster Layouts [17.49687801784463]
Poster layouts determine how effectively research is communicated and understood, highlighting their growing importance. To bridge this gap, we introduce SciPostGen, a large-scale dataset for understanding and generating poster layouts from scientific papers. We explore a framework, Retrieval-Augmented Poster Layout Generation, which retrieves layouts consistent with a given paper and uses them as guidance for layout generation.
arXiv Detail & Related papers (2025-11-27T14:27:33Z)
SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters [13.142607539907745]
Analyzing reading order and parent-child relations of posters is essential for building structure-aware summaries. Despite their prevalence in academic communication, posters remain underexplored in structural analysis research. We constructed SciPostLayoutTree, a dataset of approximately 8,000 posters with reading order and parent-child relations.
arXiv Detail & Related papers (2025-11-23T07:50:59Z)
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers [11.186078920251754]
Poster generation is a crucial yet challenging task in scientific communication. We introduce the first benchmark and metric suite for poster generation. PosterAgent is a top-down, visual-in-the-loop multi-agent pipeline.
arXiv Detail & Related papers (2025-05-27T17:58:49Z)
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation [38.53781264480452]
PosterO is a layout-centric approach to create posters for omnifarious purposes. It structures layouts from datasets as trees in SVG language by universal shape, design intent vectorization, and hierarchical node representation. It can generate visually appealing layouts for given images, achieving new state-of-the-art performance across various benchmarks.
arXiv Detail & Related papers (2025-05-06T18:42:24Z)
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization [19.416714365519713]
PosterSum is a novel benchmark to advance the development of vision-language models. We benchmark state-of-the-art Multimodal Large Language Models (MLLMs) on PosterSum. We propose Segment & Summarize, a hierarchical method that outperforms current MLLMs on automated metrics.
arXiv Detail & Related papers (2025-02-24T18:35:39Z)
Map2Text: New Content Generation from Low-Dimensional Visualizations [60.02149343347818]
We introduce Map2Text, a novel task that translates spatial coordinates within low-dimensional visualizations into new, coherent, and accurately aligned textual content. This allows users to explore and navigate undiscovered information embedded in these spatial layouts interactively and intuitively.
arXiv Detail & Related papers (2024-12-24T20:16:13Z)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model [73.38800189095173]
This work focuses on strengthening the multi-modal diagram analysis ability of Multimodal LLMs. By parsing LaTeX source files of high-quality papers, we carefully build a multi-modal diagram understanding dataset, M-Paper. M-Paper is the first dataset to support joint comprehension of multiple scientific diagrams, including figures and tables in the format of images or LaTeX code.
arXiv Detail & Related papers (2023-11-30T04:43:26Z)
Interactive Distillation of Large Single-Topic Corpora of Scientific Papers [1.2954493726326113]
A more robust but time-consuming approach is to build the dataset constructively in which a subject matter expert handpicks documents. Here we showcase a new tool, based on machine learning, for constructively generating targeted datasets of scientific literature.
arXiv Detail & Related papers (2023-09-19T17:18:36Z)
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents [54.744701806413204]
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. We test whether layout-infused LMs are robust to layout distribution shifts.
arXiv Detail & Related papers (2023-06-01T18:01:33Z)
Contrastive Hierarchical Discourse Graph for Scientific Document Summarization [14.930704950433324]
CHANGES is a contrastive hierarchical graph neural network for extractive scientific paper summarization. We also propose a graph contrastive learning module to learn global theme-aware sentence representations.
arXiv Detail & Related papers (2023-05-31T20:54:43Z)
The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
Quantifying hierarchy in scientific teams [42.444263246116485]
This paper provides a detailed description of the data collection and machine learning model used in our recent PNAS paper "Flat Teams Drive Scientific Innovation". We discuss how the features of scientific publication can be used to estimate the implicit hierarchy in the corresponding author teams.
arXiv Detail & Related papers (2022-10-12T01:28:25Z)
Neural Content Extraction for Poster Generation of Scientific Papers [84.30128728027375]
The problem of poster generation for scientific papers is under-investigated. Previous studies focus mainly on poster layout and panel composition, while neglecting the importance of content extraction. To get both textual and visual elements of a poster panel, a neural extractive model is proposed to extract text, figures and tables of a paper section simultaneously.
arXiv Detail & Related papers (2021-12-16T01:19:37Z)
SciCap: Generating Captions for Scientific Figures [20.696070723932866]
We introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020. After pre-processing, SCICAP contained more than two million figures extracted from over 290,000 papers. We established baseline models that caption graph plots, the dominant (19.2%) figure type.
arXiv Detail & Related papers (2021-10-22T07:10:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.