From Panels to Prose: Generating Literary Narratives from Comics
- URL: http://arxiv.org/abs/2503.23344v1
- Date: Sun, 30 Mar 2025 07:18:10 GMT
- Title: From Panels to Prose: Generating Literary Narratives from Comics
- Authors: Ragav Sachdeva, Andrew Zisserman
- Abstract summary: We develop an automated system that generates text-based literary narratives from manga comics. Our approach aims to create an evocative and immersive prose that not only conveys the original narrative but also captures the depth and complexity of characters.
- Score: 55.544015596503726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comics have long been a popular form of storytelling, offering visually engaging narratives that captivate audiences worldwide. However, the visual nature of comics presents a significant barrier for visually impaired readers, limiting their access to these engaging stories. In this work, we provide a pragmatic solution to this accessibility challenge by developing an automated system that generates text-based literary narratives from manga comics. Our approach aims to create an evocative and immersive prose that not only conveys the original narrative but also captures the depth and complexity of characters, their interactions, and the vivid settings in which they reside. To this end we make the following contributions: (1) We present a unified model, Magiv3, that excels at various functional tasks pertaining to comic understanding, such as localising panels, characters, texts, and speech-bubble tails, performing OCR, grounding characters, etc. (2) We release human-annotated captions for over 3300 Japanese comic panels, along with character grounding annotations, and benchmark large vision-language models in their ability to understand comic images. (3) Finally, we demonstrate how integrating large vision-language models with Magiv3 can generate seamless literary narratives that allow visually impaired audiences to engage with the depth and richness of comic storytelling.
Related papers
- One missing piece in Vision and Language: A Survey on Comics Understanding [13.766672321462435]
This survey is the first to propose a task-oriented framework for comics intelligence. It aims to guide future research by addressing critical gaps in data availability and task definition.
arXiv Detail & Related papers (2024-09-14T18:26:26Z)
- Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models [57.30913211264333]
We present Story3D-Agent, a pioneering approach that transforms provided narratives into 3D-rendered visualizations.
By integrating procedural modeling, our approach enables precise control over multi-character actions and motions, as well as diverse decorative elements.
We have thoroughly evaluated our Story3D-Agent to validate its effectiveness, offering a basic framework to advance 3D story representation.
arXiv Detail & Related papers (2024-08-21T17:43:15Z)
- Toward accessible comics for blind and low vision readers [0.059584784039407875]
We propose to use existing computer vision and optical character recognition techniques to build a grounded context from the comic strip image content.
We generate a comic book script with context-aware panel descriptions, including each character's appearance, posture, mood, dialogue, etc.
arXiv Detail & Related papers (2024-07-11T07:50:25Z)
- Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion [35.25298023240529]
We propose a novel zero-shot approach to identify characters and predict speaker names based solely on unannotated comic images.
Our method requires no training data or annotations, so it can be used as-is on any comic series.
arXiv Detail & Related papers (2024-04-22T08:59:35Z)
- The Manga Whisperer: Automatically Generating Transcriptions for Comics [55.544015596503726]
We present a unified model, Magi, that is able to detect panels, text boxes and character boxes.
We propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript.
arXiv Detail & Related papers (2024-01-18T18:59:09Z)
- Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology [7.962160810367763]
We present five themes that characterize the variations found in this creative visual storytelling process.
We envision narrative intelligence criteria for computational visual storytelling as: creative, reliable, expressive, grounded, and responsible.
arXiv Detail & Related papers (2023-10-06T18:47:20Z)
- Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips [0.0]
We create natural language descriptions of comic strips that are accessible to the visually impaired community.
Our method consists of two steps: first, we use computer vision techniques to extract information about the panels, characters, and text of the comic images.
We test our method on a collection of comics that have been annotated by human experts and measure its performance using both quantitative and qualitative metrics.
arXiv Detail & Related papers (2023-10-01T15:13:48Z)
- Dense Multitask Learning to Reconfigure Comics [63.367664789203936]
We develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels.
Our method can successfully identify the semantic units as well as the notion of 3D in comic panels.
arXiv Detail & Related papers (2023-07-16T15:10:34Z)
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models [70.86603627188519]
We focus on a novel, yet challenging task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.
We propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module.
We show that StoryGen can generalize to unseen characters without any optimization, and generate image sequences with coherent content and consistent characters.
arXiv Detail & Related papers (2023-06-01T17:58:50Z)
- Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation [57.10363557465713]
We propose a fully automatic system for generating comic books from videos without any human intervention.
Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles.
Then, we propose a novel automatic multi-page layout framework, which can allocate the images across multiple pages.
arXiv Detail & Related papers (2021-01-26T22:15:15Z)