ComicScene154: A Scene Dataset for Comic Analysis

URL: http://arxiv.org/abs/2508.16190v1
Date: Fri, 22 Aug 2025 08:11:58 GMT
Title: ComicScene154: A Scene Dataset for Comic Analysis
Authors: Sandro Paval, Ivan P. Yamshchikov, Pascal Meißner
Abstract summary: Comics offer a compelling yet under-explored domain for computational narrative analysis. ComicScene154 is a dataset of scene-level narrative arcs derived from public-domain comic books spanning diverse genres.
Score: 5.052646224667598
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Comics offer a compelling yet under-explored domain for computational narrative analysis, combining text and imagery in ways distinct from purely textual or audiovisual media. We introduce ComicScene154, a manually annotated dataset of scene-level narrative arcs derived from public-domain comic books spanning diverse genres. By conceptualizing comics as an abstraction for narrative-driven, multimodal data, we highlight their potential to inform broader research on multi-modal storytelling. To demonstrate the utility of ComicScene154, we present a baseline scene segmentation pipeline, providing an initial benchmark that future studies can build upon. Our results indicate that ComicScene154 constitutes a valuable resource for advancing computational methods in multimodal narrative understanding and expanding the scope of comic analysis within the Natural Language Processing community.
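The abstract mentions a baseline scene segmentation pipeline but does not say how segmentations are scored. The sketch below is illustrative only. It assumes that a scene annotation can be represented as one scene id per panel, in reading order, and that a predicted segmentation is compared with the gold one by exact-match scene-boundary precision, recall, and F1. Both the representation and the metric are assumptions, not the paper's stated protocol, and the function names scene_boundaries and boundary_prf are hypothetical.

```python
from typing import Sequence, Set, Tuple


def scene_boundaries(scene_ids: Sequence[int]) -> Set[int]:
    """Return the panel indices where a new scene starts.

    Assumption (hypothetical format): scene_ids[i] is the scene id of
    panel i in reading order. Panel 0 is never counted as a boundary.
    """
    return {i for i in range(1, len(scene_ids)) if scene_ids[i] != scene_ids[i - 1]}


def boundary_prf(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Exact-match boundary precision, recall, and F1 (an assumed metric)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same panels")
    g, p = scene_boundaries(gold), scene_boundaries(pred)
    tp = len(g & p)  # boundaries placed at exactly the same panel
    # Empty-set conventions: predicting no boundaries is perfect only
    # when the gold segmentation also has none.
    precision = tp / len(p) if p else (1.0 if not g else 0.0)
    recall = tp / len(g) if g else (1.0 if not p else 0.0)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 1, 2, 2]  # gold boundaries at panels 2 and 5
    pred = [0, 0, 0, 1, 1, 2, 2]  # predicted boundaries at panels 3 and 5
    print(boundary_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Exact matching is the strictest choice. A real benchmark might instead allow a tolerance window of a panel or two, or use segmentation metrics such as Pk or WindowDiff.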

Related papers

Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics [1.320904960556043]
This paper presents a hierarchical knowledge graph framework for the structured understanding of visual narratives, focusing on comics. It represents them through integrated knowledge graphs that capture semantic, spatial, and temporal relationships. At the panel level, we construct multimodal graphs that link visual elements such as characters, objects, and actions with corresponding textual components, including dialogue and captions.
Date: 2025-04-14T14:42:19Z
From Panels to Prose: Generating Literary Narratives from Comics [55.544015596503726]
We develop an automated system that generates text-based literary narratives from manga comics. Our approach aims to create evocative and immersive prose that not only conveys the original narrative but also captures the depth and complexity of characters.
Date: 2025-03-30T07:18:10Z
One missing piece in Vision and Language: A Survey on Comics Understanding [13.766672321462435]
This survey is the first to propose a task-oriented framework for comics intelligence. It aims to guide future research by addressing critical gaps in data availability and task definition.
Date: 2024-09-14T18:26:26Z
Panel Transitions for Genre Analysis in Visual Narratives [1.320904960556043]
We present a novel approach to the multimodal analysis of genre in comics and manga-style visual narratives. We highlight some of the limitations and challenges of our existing computational approaches in modeling subjective labels.
Date: 2023-12-14T08:05:09Z
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips [0.0]
We create natural language descriptions of comic strips that are accessible to the visually impaired community. Our method consists of two steps: first, we use computer vision techniques to extract information about the panels, characters, and text of the comic images. We test our method on a collection of comics that have been annotated by human experts and measure its performance using both quantitative and qualitative metrics.
Date: 2023-10-01T15:13:48Z
Text-Only Training for Visual Storytelling [107.19873669536523]
We formulate visual storytelling as a visual-conditioned story generation problem. We propose a text-only training method that separates the learning of cross-modality alignment and story generation.
Date: 2023-08-17T09:32:17Z
Dense Multitask Learning to Reconfigure Comics [63.367664789203936]
We develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels. Our method can successfully identify the semantic units as well as the notion of 3D in comic panels.
Date: 2023-07-16T15:10:34Z
Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection [37.083051419659135]
Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs. Unlike existing methods mainly based on distances, we propose a deep learning-based method using scene graph generation models. Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.
Date: 2023-06-30T08:34:08Z
NewsStories: Illustrating articles with visual summaries [49.924916589209374]
We introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos. We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images. We introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.
Date: 2022-07-26T17:34:11Z
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input. Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation. Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
Date: 2021-10-21T00:16:02Z
From Show to Tell: A Survey on Image Captioning [48.98681267347662]
Connecting Vision and Language plays an essential role in Generative Intelligence. Research in image captioning has not reached a conclusive answer yet. This work aims at providing a comprehensive overview and categorization of image captioning approaches.
Date: 2021-07-14T18:00:54Z

This list is automatically generated from the titles and abstracts of the papers on this site.