A Customizable Generator for Comic-Style Visual Narrative
- URL: http://arxiv.org/abs/2401.02863v1
- Date: Thu, 14 Dec 2023 03:46:30 GMT
- Title: A Customizable Generator for Comic-Style Visual Narrative
- Authors: Yi-Chun Chen, Arnav Jhala
- Abstract summary: We present a theory-inspired visual narrative generator that incorporates comic-authoring idioms.
The generator creates comics through sequential decision-making across layers covering panel composition, object positioning, panel transitions, and narrative elements.
- Score: 1.320904960556043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a theory-inspired visual narrative generator that
incorporates comic-authoring idioms, transferring the conceptual principles of
comics into system layers that integrate these theories to create comic
content. The generator creates comics through sequential decision-making
across layers covering panel composition, object positioning, panel
transitions, and narrative elements. Each layer's decisions are based on
narrative goals and follow the respective layer idioms of the medium. Cohn's
narrative grammar provides the overall story arc. Photographic compositions
inspired by the rule of thirds are used to provide panel compositions.
McCloud's panel transitions, based on focus shifts between scene, character,
and temporal changes, are encoded in the transition layer. Finally, common
overlay symbols (such as the exclamation mark) are added based on analyzing
action verbs using an action-verb ontology. We demonstrate the variety of
generated comics through various settings with example outputs. The generator
and associated modules could be a useful system for visual narrative authoring
and for further research into computational models of visual narrative
understanding.
Related papers
- Collaborative Comic Generation: Integrating Visual Narrative Theories with AI Models for Enhanced Creativity [1.1181151748260076]
This study presents a theory-inspired visual narrative generative system that integrates conceptual principles-comic authoring idioms-with generative and language models to enhance the comic creation process.
Key contributions include integrating machine learning models into the human-AI cooperative comic generation process, deploying abstract narrative theories into AI-driven comic creation, and a customizable tool for narrative-driven image sequences.
arXiv Detail & Related papers (2024-09-25T18:21:01Z)
- One missing piece in Vision and Language: A Survey on Comics Understanding [13.766672321462435]
This survey is the first to propose a task-oriented framework for comics intelligence.
It aims to guide future research by addressing critical gaps in data availability and task definition.
arXiv Detail & Related papers (2024-09-14T18:26:26Z)
- Imagining from Images with an AI Storytelling Tool [0.27309692684728604]
The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories.
The method is supported by a fully implemented tool, called ImageTeller, which accepts images from diverse sources as input.
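The summary does not give ImageTeller's prompts or pipeline; purely as a hedged illustration of the kind of multimodal call involved, a GPT-4o request with an image input might look like the following, where the prompt text and function name are assumptions.

```python
# Generic illustration of prompting a multimodal model for a story from an
# image; this is NOT ImageTeller's code, and the prompt is an assumption.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def story_from_image(image_url: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Interpret this image and tell a short, engaging "
                         "story grounded in what it shows."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content
```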
arXiv Detail & Related papers (2024-08-21T10:49:15Z)
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [117.13475564834458]
We propose a new way of self-attention calculation, termed Consistent Self-Attention.
To extend our method to long-range video generation, we introduce a novel semantic space temporal motion prediction module.
By merging these two novel components, our framework, referred to as StoryDiffusion, can describe a text-based story with consistent images or videos.
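As a rough illustration of the core idea (tokens from the other images in a batch are folded into the keys and values of self-attention so the generated images stay consistent), here is a simplified PyTorch sketch; the shapes, sampling rate, and absence of projection weights are all simplifications of the paper's formulation.

```python
# Simplified sketch of the idea behind Consistent Self-Attention: each
# image's self-attention also attends to tokens sampled from the other
# images in the batch. Illustrative only; see the paper for the real method.
import torch
import torch.nn.functional as F

def consistent_self_attention(x, sample_rate=0.5):
    """x: (batch, tokens, dim) hidden states for a batch of story images.
    Assumes batch >= 2 so there are other images to sample from."""
    b, n, d = x.shape
    k = int(n * sample_rate)
    shared = []
    for i in range(b):
        # Gather k random tokens from every other image in the batch.
        others = torch.cat([x[j] for j in range(b) if j != i])  # ((b-1)*n, d)
        idx = torch.randperm(others.shape[0])[:k]
        shared.append(torch.cat([x[i], others[idx]]))           # (n + k, d)
    kv = torch.stack(shared)                                    # (b, n + k, d)
    # Queries come from the image itself; keys/values include shared tokens.
    return F.scaled_dot_product_attention(x, kv, kv)
```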
arXiv Detail & Related papers (2024-05-02T16:25:16Z)
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
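The summary describes optimizing a vector-graphics letter end-to-end against a video prior; the loop below is only a conceptual sketch of such an optimization, with a placeholder rasterizer and a placeholder guidance loss standing in for the paper's actual components.

```python
# Conceptual sketch of optimizing glyph control points against a video
# prior; the rasterizer and guidance loss below are placeholders, not the
# paper's components (which use a video diffusion model).
import torch

control_points = torch.randn(16, 2, requires_grad=True)   # one glyph outline
optimizer = torch.optim.Adam([control_points], lr=1e-2)

def rasterize(points):
    """Placeholder differentiable 'rasterizer': a real pipeline would
    render the vector outline to an RGB frame here."""
    return points.sum() * torch.ones(3, 64, 64)

def guidance_loss(frames):
    """Placeholder for a score-distillation-style loss from a video prior."""
    return ((frames - frames.detach().roll(1, 0)) ** 2).mean()

for step in range(100):
    frames = torch.stack([rasterize(control_points + 0.01 * t)
                          for t in range(8)])              # 8 animation frames
    loss = guidance_loss(frames)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```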
arXiv Detail & Related papers (2024-04-17T17:59:55Z)
- CPST: Comprehension-Preserving Style Transfer for Multi-Modal Narratives [1.320904960556043]
Static visual narratives such as comics and manga exhibit distinct visual presentation styles.
The layout of text and media elements is also significant for narrative communication.
We introduce the notion of comprehension-preserving style transfer (CPST) in such multi-modal domains.
arXiv Detail & Related papers (2023-12-14T07:26:18Z)
- Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control [131.1446077627191]
We propose a new presentation form for Story Visualization called Storyboard, inspired by film-making.
Within each scene in Storyboard, characters engage in activities at the same location, necessitating both visually consistent scenes and characters.
Our method could be seamlessly integrated into mainstream Image Customization methods, empowering them with the capability of story visualization.
arXiv Detail & Related papers (2023-12-06T12:16:23Z)
- TextPSG: Panoptic Scene Graph Generation from Textual Descriptions [78.1140391134517]
We study a new problem of Panoptic Scene Graph Generation from Purely Textual Descriptions (Caption-to-PSG).
The key idea is to leverage the large collection of free image-caption data on the Web alone to generate panoptic scene graphs.
We propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator.
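The four named modules suggest a straightforward pipeline shape; the stubs below only mirror the module names from the summary, with all internals omitted since the paper's details are not given here.

```python
# Pipeline skeleton mirroring the four TextPSG modules named above; every
# body is a stub, since this summary does not describe the internals.
def region_grouper(image):
    """Group image pixels into candidate segments."""
    raise NotImplementedError

def entity_grounder(segments, captions):
    """Ground entities mentioned in captions to segments."""
    raise NotImplementedError

def segment_merger(grounded):
    """Merge segments that belong to the same entity."""
    raise NotImplementedError

def label_generator(merged):
    """Assign entity and relation labels, yielding a panoptic scene graph."""
    raise NotImplementedError

def caption_to_psg(image, captions):
    segments = region_grouper(image)
    grounded = entity_grounder(segments, captions)
    merged = segment_merger(grounded)
    return label_generator(merged)
```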
arXiv Detail & Related papers (2023-10-10T22:36:15Z)
- Visual Storytelling with Question-Answer Plans [70.89011289754863]
We present a novel framework which integrates visual representations with pretrained language models and planning.
Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.
It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative.
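As a hedged sketch of the "visual prefix" idea (image features projected into a sequence of continuous embeddings the language model reads before the text), with all dimensions assumed:

```python
# Minimal sketch of a visual prefix: image features mapped into a sequence
# of embeddings for a language model. All dimensions are assumptions.
import torch
import torch.nn as nn

class VisualPrefix(nn.Module):
    def __init__(self, image_dim=512, lm_dim=768, prefix_len=8):
        super().__init__()
        self.proj = nn.Linear(image_dim, prefix_len * lm_dim)
        self.prefix_len, self.lm_dim = prefix_len, lm_dim

    def forward(self, image_feats):            # (batch, image_dim)
        p = self.proj(image_feats)             # (batch, prefix_len * lm_dim)
        return p.view(-1, self.prefix_len, self.lm_dim)

# The prefix is prepended to embedded text such as a question-answer plan,
# e.g. "Q: who is at the beach? A: two children", which guides which
# concepts the narrative mentions and in what order.
prefix = VisualPrefix()(torch.randn(4, 512))   # (4, 8, 768)
```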
arXiv Detail & Related papers (2023-10-08T21:45:34Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
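Purely as an illustration of the kinds of structured inputs listed (a constituency parse, commonsense facts, and bounding boxes), with the exact formats assumed:

```python
# Illustrative example of the structured inputs named above; the formats
# are assumptions, and nltk is used only for demonstration.
from nltk import Tree

parse = Tree.fromstring(
    "(S (NP (DT the) (NN dog)) (VP (VBZ chases) (NP (DT a) (NN ball))))")
parse.pretty_print()   # the constituency-parsed story sentence

structured_input = {
    "parse": parse,
    "commonsense": ["dogs chase balls in parks"],          # assumed format
    "boxes": [("dog", (10, 40, 60, 90)), ("ball", (70, 50, 90, 70))],
}
```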
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- ComicGAN: Text-to-Comic Generative Adversarial Network [1.4824891788575418]
We implement ComicGAN, a novel text-to-image GAN that synthesizes comics according to text descriptions.
We extensively evaluate the proposed ComicGAN in two scenarios, namely image generation from descriptions, and image generation from dialogue.
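A toy text-conditioned generator in the general spirit of such text-to-image GANs; ComicGAN's actual architecture is not given here, and every layer size below is illustrative.

```python
# Toy text-conditioned GAN generator; architecture and sizes are
# illustrative only, not ComicGAN's actual design.
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    def __init__(self, text_dim=256, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + text_dim, 4 * 4 * 128), nn.ReLU(),
            nn.Unflatten(1, (128, 4, 4)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z, text_embedding):
        # Concatenate noise with the text embedding, then upsample to an image.
        return self.net(torch.cat([z, text_embedding], dim=1))

fake = TextConditionedGenerator()(torch.randn(2, 100), torch.randn(2, 256))
# fake: (2, 3, 16, 16) comic panels at toy resolution
```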
arXiv Detail & Related papers (2021-09-19T13:31:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.