SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters
- URL: http://arxiv.org/abs/2511.18329v1
- Date: Sun, 23 Nov 2025 07:50:59 GMT
- Title: SciPostLayoutTree: A Dataset for Structural Analysis of Scientific Posters
- Authors: Shohei Tanaka, Atsushi Hashimoto, Yoshitaka Ushiku
- Abstract summary: Analyzing reading order and parent-child relations of posters is essential for building structure-aware summaries. Despite their prevalence in academic communication, posters remain underexplored in structural analysis research. We constructed SciPostLayoutTree, a dataset of approximately 8,000 posters annotated with reading order and parent-child relations.
- Score: 13.142607539907745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific posters play a vital role in academic communication by presenting ideas through visual summaries. Analyzing reading order and parent-child relations of posters is essential for building structure-aware interfaces that facilitate clear and accurate understanding of research content. Despite their prevalence in academic communication, posters remain underexplored in structural analysis research, which has primarily focused on papers. To address this gap, we constructed SciPostLayoutTree, a dataset of approximately 8,000 posters annotated with reading order and parent-child relations. Compared to an existing structural analysis dataset, SciPostLayoutTree contains more instances of spatially challenging relations, including upward, horizontal, and long-distance relations. As a solution to these challenges, we develop Layout Tree Decoder, which incorporates visual features as well as bounding box features including position and category information. The model also uses beam search to predict relations while capturing sequence-level plausibility. Experimental results demonstrate that our model improves the prediction accuracy for spatially challenging relations and establishes a solid baseline for poster structure analysis. The dataset is publicly available at https://huggingface.co/datasets/omron-sinicx/scipostlayouttree. The code is also publicly available at https://github.com/omron-sinicx/scipostlayouttree.
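The abstract describes predicting parent-child relations with beam search over bounding-box features. A minimal, hypothetical sketch of that idea is shown below; the `Box` representation and the hand-written `score` function are illustrative assumptions standing in for the paper's learned visual and positional features, not the actual Layout Tree Decoder.

```python
# Hypothetical sketch: beam search over parent assignments for layout elements.
# The scoring function is a toy stand-in for a learned model, not the paper's method.
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left
    y: float  # top
    w: float  # width
    h: float  # height


def score(parent: Box, child: Box) -> float:
    """Toy plausibility score: prefer parents that sit above the child and
    are horizontally close to it (a stand-in for learned features)."""
    dx = abs((parent.x + parent.w / 2) - (child.x + child.w / 2))
    dy = child.y - (parent.y + parent.h)  # positive if parent is above child
    return -(dx + abs(dy))


def beam_search_parents(boxes: list, beam_width: int = 3):
    """Assign each element (after the first, treated as the root) a parent
    among earlier elements, keeping the top `beam_width` partial trees by
    cumulative score rather than greedily committing to each relation."""
    beams = [([], 0.0)]  # each beam: (parent index per element 1..n-1, total score)
    for child_idx in range(1, len(boxes)):
        candidates = []
        for parents, total in beams:
            for parent_idx in range(child_idx):
                s = score(boxes[parent_idx], boxes[child_idx])
                candidates.append((parents + [parent_idx], total + s))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]  # best (parents, score)
```

For example, with a title box spanning the top and two section boxes below it, both sections are attached to the title rather than to each other, because keeping whole partial trees in the beam scores the sequence of relations jointly.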
Related papers
- <SOG_k>: One LLM Token for Explicit Graph Structural Understanding [57.017902343605364]
We propose to incorporate one special token <SOG_k> to fully represent the Structure Of Graph within a unified token space. <SOG_k> empowers LLMs to understand, generate, and reason in a concise and accurate manner.
arXiv Detail & Related papers (2026-02-02T07:55:09Z) - SciPostGen: Bridging the Gap between Scientific Papers and Poster Layouts [17.49687801784463]
Poster layouts determine how effectively research is communicated and understood, highlighting their growing importance. To bridge this gap, we introduce SciPostGen, a large-scale dataset for understanding and generating poster layouts from scientific papers. We explore a framework, Retrieval-Augmented Poster Layout Generation, which retrieves layouts consistent with a given paper and uses them as guidance for layout generation.
arXiv Detail & Related papers (2025-11-27T14:27:33Z) - Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z) - Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA [45.98167752508643]
We propose a sparse spatial graph network (SSGN) that introduces a spatially aware relation pruning technique to this task.
Experiment results on TextVQA and ST-VQA datasets demonstrate that SSGN achieves promising performances.
arXiv Detail & Related papers (2023-10-13T14:39:34Z) - Generating Topological Structure of Floorplans from Room Attributes [4.1715767752637145]
We propose to extract topological information from room attributes using Iterative and adaptive graph Topology Learning (ITL)
ITL progressively predicts multiple relations between rooms; at each iteration, it improves node embeddings, which in turn facilitates generation of a better topological graph structure.
arXiv Detail & Related papers (2022-04-26T14:24:58Z) - Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual stories.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z) - CORAL: COde RepresentAtion Learning with Weakly-Supervised Transformers for Analyzing Data Analysis [33.190021245507445]
Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process.
We propose a novel weakly supervised transformer-based architecture for computing joint representations of code from both abstract syntax trees and surrounding natural language comments.
We show that our model, leveraging only easily-available weak supervision, achieves a 38% increase in accuracy over expert-supplied heuristics and outperforms a suite of baselines.
arXiv Detail & Related papers (2020-08-28T19:57:49Z) - HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation [20.148175528691905]
This paper presents a novel structure-aware embedding-to-classifier (SEC) module to incorporate both local and global structural information of relationships into the output space.
We also propose a hierarchical semantic aggregation (HSA) module to reduce the number of subspaces by introducing higher-order structural information.
The proposed HOSE-Net achieves state-of-the-art performance on two popular benchmarks, Visual Genome and VRD.
arXiv Detail & Related papers (2020-08-12T07:58:13Z) - Linguistically Driven Graph Capsule Network for Visual Question Reasoning [153.76012414126643]
We propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network".
The compositional process is guided by the linguistic parse tree. Specifically, we bind each capsule in the lowest layer to bridge the linguistic embedding of a single word in the original question with visual evidence.
Experiments on the CLEVR dataset, CLEVR compositional generation test, and FigureQA dataset demonstrate the effectiveness and composition generalization ability of our end-to-end model.
arXiv Detail & Related papers (2020-03-23T03:34:25Z) - Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z) - Relational Message Passing for Knowledge Graph Completion [78.47976646383222]
We propose a relational message passing method for knowledge graph completion.
It passes relational messages among edges iteratively to aggregate neighborhood information.
Results show our method outperforms state-of-the-art knowledge completion methods by a large margin.
arXiv Detail & Related papers (2020-02-17T03:33:41Z)
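The last entry describes passing relational messages among edges iteratively to aggregate neighborhood information. The following is a deliberately simplified sketch of that general idea, where each edge's vector is repeatedly averaged with the vectors of edges sharing an endpoint; the actual update and aggregation functions in the paper differ.

```python
# Generic, simplified sketch of edge-wise relational message passing on a
# knowledge graph. Illustrative only: the averaging updates here are a toy
# stand-in for the paper's learned aggregation functions.
from collections import defaultdict


def relational_message_passing(edges, embeddings, iterations=2):
    """edges: list of (head, relation_id, tail) triples.
    embeddings: dict mapping relation_id -> initial vector (list of floats).
    Returns one vector per edge after aggregating neighborhood information."""
    # Index edges by the entities they touch, so each edge can find the
    # other edges it shares an endpoint with.
    incident = defaultdict(list)
    for i, (h, _, t) in enumerate(edges):
        incident[h].append(i)
        incident[t].append(i)

    # Initialize each edge's state from its relation embedding.
    state = [list(embeddings[r]) for _, r, _ in edges]
    for _ in range(iterations):
        new_state = []
        for i, (h, _, t) in enumerate(edges):
            neighbors = [j for e in (h, t) for j in incident[e] if j != i]
            if not neighbors:
                new_state.append(state[i])
                continue
            # Message: average of neighboring edges' current vectors.
            msg = [sum(state[j][k] for j in neighbors) / len(neighbors)
                   for k in range(len(state[i]))]
            # Mix the aggregated message into the edge's own state.
            new_state.append([(a + b) / 2 for a, b in zip(state[i], msg)])
        state = new_state
    return state
```

Passing messages between edges (rather than nodes) lets relation information itself propagate, which is the distinguishing idea the entry highlights.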
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.