Related papers: A Customizable Generator for Comic-Style Visual Narrative

URL: http://arxiv.org/abs/2401.02863v1
Date: Thu, 14 Dec 2023 03:46:30 GMT
Title: A Customizable Generator for Comic-Style Visual Narrative
Authors: Yi-Chun Chen, Arnav Jhala
Abstract summary: We present a theory-inspired visual narrative generator that incorporates comic-authoring idioms. The generator creates comics through sequential decision-making across layers from panel composition, object positions, panel transitions, and narrative elements.
Score: 1.320904960556043
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a theory-inspired visual narrative generator that incorporates comic-authoring idioms, which transfers the conceptual principles of comics into system layers that integrate the theories to create comic content. The generator creates comics through sequential decision-making across layers from panel composition, object positions, panel transitions, and narrative elements. Each layer's decisions are based on narrative goals and follow the respective layer idioms of the medium. Cohn's narrative grammar provides the overall story arc. Photographic compositions inspired by the rule of thirds are used to provide panel compositions. McCloud's proposed panel transitions based on focus shifts between scene, character, and temporal changes are encoded in the transition layer. Finally, common overlay symbols (such as the exclamation) are added based on analyzing action verbs using an action-verb ontology. We demonstrate the variety of generated comics through various settings with example outputs. The generator and associated modules could be a useful system for visual narrative authoring and for further research into computational models of visual narrative understanding.

Related papers

STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives [82.19488717416351]
This paper introduces StoryAnchors, a unified framework for generating high-quality, multi-scene story frames. StoryAnchors employs a bidirectional story generator that integrates both past and future contexts to ensure temporal consistency. It also integrates Multi-Event Story Frame Labeling and Progressive Story Frame Training, enabling the model to capture both overarching narrative flow and event-level dynamics.
arXiv Detail & Related papers (2025-05-13T08:48:10Z)
Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics [1.320904960556043]
This paper presents a hierarchical knowledge graph framework for the structured understanding of visual narratives, focusing on comics. It represents them through integrated knowledge graphs that capture semantic, spatial, and temporal relationships. At the panel level, we construct multimodal graphs that link visual elements such as characters, objects, and actions with corresponding textual components, including dialogue and captions.
arXiv Detail & Related papers (2025-04-14T14:42:19Z)
From Panels to Prose: Generating Literary Narratives from Comics [55.544015596503726]
We develop an automated system that generates text-based literary narratives from manga comics. Our approach aims to create an evocative and immersive prose that not only conveys the original narrative but also captures the depth and complexity of characters.
arXiv Detail & Related papers (2025-03-30T07:18:10Z)
AnyTop: Character Animation Diffusion with Any Topology [54.07731933876742]
We introduce AnyTop, a diffusion model that generates motions for diverse characters with distinct motion dynamics. Our work features a transformer-based denoising network, tailored for arbitrary skeleton learning. Our evaluation demonstrates that AnyTop generalizes well, even with as few as three training examples per topology, and can produce motions for unseen skeletons as well.
arXiv Detail & Related papers (2025-02-24T17:00:36Z)
Collaborative Comic Generation: Integrating Visual Narrative Theories with AI Models for Enhanced Creativity [1.1181151748260076]
This study presents a theory-inspired visual narrative generative system that integrates conceptual principles (comic authoring idioms) with generative and language models to enhance the comic creation process. Key contributions include integrating machine learning models into the human-AI cooperative comic generation process, deploying abstract narrative theories into AI-driven comic creation, and a customizable tool for narrative-driven image sequences.
arXiv Detail & Related papers (2024-09-25T18:21:01Z)
One missing piece in Vision and Language: A Survey on Comics Understanding [13.766672321462435]
This survey is the first to propose a task-oriented framework for comics intelligence. It aims to guide future research by addressing critical gaps in data availability and task definition.
arXiv Detail & Related papers (2024-09-14T18:26:26Z)
Imagining from Images with an AI Storytelling Tool [0.27309692684728604]
The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories. The method is supported by a fully implemented tool, called ImageTeller, which accepts images from diverse sources as input.
arXiv Detail & Related papers (2024-08-21T10:49:15Z)
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [117.13475564834458]
We propose a new way of self-attention calculation, termed Consistent Self-Attention. To extend our method to long-range video generation, we introduce a novel semantic space temporal motion prediction module. By merging these two novel components, our framework, referred to as StoryDiffusion, can describe a text-based story with consistent images or videos.
arXiv Detail & Related papers (2024-05-02T16:25:16Z)
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography". It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts. Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z)
CPST: Comprehension-Preserving Style Transfer for Multi-Modal Narratives [1.320904960556043]
Among static visual narratives such as comics and manga, there are distinct visual styles in terms of presentation. The layout of both text and media elements is also significant in terms of narrative communication. We introduce the notion of comprehension-preserving style transfer (CPST) in such multi-modal domains.
arXiv Detail & Related papers (2023-12-14T07:26:18Z)
Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control [131.1446077627191]
We propose a new presentation form for Story Visualization called Storyboard, inspired by film-making. Within each scene in Storyboard, characters engage in activities at the same location, necessitating both visually consistent scenes and characters. Our method could be seamlessly integrated into mainstream Image Customization methods, empowering them with the capability of story visualization.
arXiv Detail & Related papers (2023-12-06T12:16:23Z)
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions [78.1140391134517]
We study a new problem of Panoptic Scene Graph Generation from Purely Textual Descriptions (Caption-to-PSG). The key idea is to leverage the large collection of free image-caption data on the Web alone to generate panoptic scene graphs. We propose a new framework, TextPSG, consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator.
arXiv Detail & Related papers (2023-10-10T22:36:15Z)
Visual Storytelling with Question-Answer Plans [70.89011289754863]
We present a novel framework which integrates visual representations with pretrained language models and planning. Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret. It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative.
arXiv Detail & Related papers (2023-10-08T21:45:34Z)
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input. Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual stories. Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
ComicGAN: Text-to-Comic Generative Adversarial Network [1.4824891788575418]
We implement ComicGAN, a novel text-to-image GAN that synthesizes comics according to text descriptions. We extensively evaluate the proposed ComicGAN in two scenarios, namely image generation from descriptions, and image generation from dialogue.
arXiv Detail & Related papers (2021-09-19T13:31:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.