A Customizable Generator for Comic-Style Visual Narrative
- URL: http://arxiv.org/abs/2401.02863v1
- Date: Thu, 14 Dec 2023 03:46:30 GMT
- Title: A Customizable Generator for Comic-Style Visual Narrative
- Authors: Yi-Chun Chen, Arnav Jhala
- Abstract summary: We present a theory-inspired visual narrative generator that incorporates comic-authoring idioms.
The generator creates comics through sequential decision-making across layers from panel composition, object positions, panel transitions, and narrative elements.
- Score: 1.320904960556043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a theory-inspired visual narrative generator that incorporates
comic-authoring idioms, which transfers the conceptual principles of comics
into system layers that integrate the theories to create comic content. The
generator creates comics through sequential decision-making across layers from
panel composition, object positions, panel transitions, and narrative elements.
Each layer's decisions are based on narrative goals and follow the respective
layer idioms of the medium. Cohn's narrative grammar provides the overall story
arc. Photographic compositions inspired by the rule of thirds are used to
provide panel compositions. McCloud's proposed panel transitions based on focus
shifts between scene, character, and temporal changes are encoded in the
transition layer. Finally, common overlay symbols (such as the exclamation) are
added based on analyzing action verbs using an action-verb ontology. We
demonstrate the variety of generated comics through various settings with
example outputs. The generator and associated modules could be a useful system
for visual narrative authoring and for further research into computational
models of visual narrative understanding.
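The abstract names two layer rules that are easy to illustrate: rule-of-thirds panel composition and transitions described by which focus (scene, character, or time) shifts between panels. The block below is a minimal sketch of those two ideas, not the authors' implementation. All names (`thirds_points`, `place_objects`, `PanelState`, `focus_shifts`) are hypothetical, and the round-robin placement policy is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def thirds_points(width: float, height: float) -> List[Point]:
    """Return the four rule-of-thirds intersections of a width x height panel.

    Points are ordered row by row: top-left, top-right, bottom-left, bottom-right.
    """
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for y in ys for x in xs]


def place_objects(names: List[str], width: float, height: float) -> Dict[str, Point]:
    """Assign each object to an intersection, cycling when there are more than four.

    Round-robin assignment is an illustrative assumption, not the paper's policy.
    """
    points = thirds_points(width, height)
    return {name: points[i % len(points)] for i, name in enumerate(names)}


@dataclass(frozen=True)
class PanelState:
    """Hypothetical per-panel summary: where, who is in focus, and when."""
    scene: str
    character: str
    time: int


def focus_shifts(prev: PanelState, curr: PanelState) -> List[str]:
    """List which of scene, character, and time changed between two panels."""
    shifts = []
    if prev.scene != curr.scene:
        shifts.append("scene")
    if prev.character != curr.character:
        shifts.append("character")
    if prev.time != curr.time:
        shifts.append("time")
    return shifts


if __name__ == "__main__":
    print(place_objects(["hero", "rival"], 300, 300))
    print(focus_shifts(PanelState("park", "hero", 0), PanelState("park", "rival", 1)))
```

A transition layer could map the returned list of shifts onto a named transition type; that mapping is left out here because the abstract does not specify it.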
Related papers
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [117.13475564834458]
We propose a new way of self-attention calculation, termed Consistent Self-Attention.
To extend our method to long-range video generation, we introduce a novel semantic space temporal motion prediction module.
By merging these two novel components, our framework, referred to as StoryDiffusion, can describe a text-based story with consistent images or videos.
arXiv Detail & Related papers (2024-05-02T16:25:16Z)
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z)
- SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling [12.560014305032437]
This paper introduces SCO-VIST, a framework representing the image sequence as a graph with objects and relations.
SCO-VIST then takes this graph representing plot points and creates bridges between plot points with semantic and occurrence-based edge weights.
This weighted story graph produces the storyline as a sequence of events using the Floyd-Warshall algorithm.
arXiv Detail & Related papers (2024-02-01T04:09:17Z)
- MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising [42.20750912837316]
MagicScroll is a progressive diffusion-based image generation framework with a novel semantic-aware denoising process.
It enables fine-grained control over the generated image on object, scene, and background levels with text, image, and layout conditions.
It showcases promising results in aligning with the narrative text, improving visual coherence, and engaging the audience.
arXiv Detail & Related papers (2023-12-18T03:09:05Z)
- CPST: Comprehension-Preserving Style Transfer for Multi-Modal Narratives [1.320904960556043]
Among static visual narratives such as comics and manga, there are distinct visual styles in terms of presentation.
The layout of both text and media elements is also significant in terms of narrative communication.
We introduce the notion of comprehension-preserving style transfer (CPST) in such multi-modal domains.
arXiv Detail & Related papers (2023-12-14T07:26:18Z)
- Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control [131.1446077627191]
We propose a new presentation form for Story Visualization called Storyboard, inspired by film-making.
Within each scene in Storyboard, characters engage in activities at the same location, necessitating both visually consistent scenes and characters.
Our method could be seamlessly integrated into mainstream Image Customization methods, empowering them with the capability of story visualization.
arXiv Detail & Related papers (2023-12-06T12:16:23Z)
- TextPSG: Panoptic Scene Graph Generation from Textual Descriptions [78.1140391134517]
We study a new problem of Panoptic Scene Graph Generation from Purely Textual Descriptions (Caption-to-PSG).
The key idea is to leverage the large collection of free image-caption data on the Web alone to generate panoptic scene graphs.
We propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator.
arXiv Detail & Related papers (2023-10-10T22:36:15Z)
- Visual Storytelling with Question-Answer Plans [70.89011289754863]
We present a novel framework which integrates visual representations with pretrained language models and planning.
Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.
It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative.
arXiv Detail & Related papers (2023-10-08T21:45:34Z)
- Visual Story Generation Based on Emotion and Keywords [5.3860505447668015]
This work proposes a story generation pipeline to co-create visual stories with the users.
The pipeline includes two parts: narrative and image generation.
arXiv Detail & Related papers (2023-01-07T03:56:49Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual story.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- ComicGAN: Text-to-Comic Generative Adversarial Network [1.4824891788575418]
We implement ComicGAN, a novel text-to-image GAN that synthesizes comics according to text descriptions.
We extensively evaluate the proposed ComicGAN in two scenarios, namely image generation from descriptions, and image generation from dialogue.
arXiv Detail & Related papers (2021-09-19T13:31:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.