Automatic Comic Generation with Stylistic Multi-page Layouts and
Emotion-driven Text Balloon Generation
- URL: http://arxiv.org/abs/2101.11111v1
- Date: Tue, 26 Jan 2021 22:15:15 GMT
- Title: Automatic Comic Generation with Stylistic Multi-page Layouts and
Emotion-driven Text Balloon Generation
- Authors: Xin Yang, Zongliang Ma, Letian Yu, Ying Cao, Baocai Yin, Xiaopeng Wei,
Qiang Zhang, Rynson W.H. Lau
- Abstract summary: We propose a fully automatic system for generating comic books from videos without any human intervention.
Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles.
Then, we propose a novel automatic multi-page layout framework, which can allocate the images across multiple pages.
- Score: 57.10363557465713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a fully automatic system for generating comic books
from videos without any human intervention. Given an input video along with its
subtitles, our approach first extracts informative keyframes by analyzing the
subtitles, and stylizes keyframes into comic-style images. Then, we propose a
novel automatic multi-page layout framework, which can allocate the images
across multiple pages and synthesize visually interesting layouts based on the
rich semantics of the images (e.g., importance and inter-image relation).
Finally, as opposed to using the same type of balloon as in previous works, we
propose an emotion-aware balloon generation method to create different types of
word balloons by analyzing the emotion of the subtitles and audio. Our method is
able to vary balloon shapes and word sizes in balloons in response to different
emotions, leading to a richer reading experience. Once the balloons are
generated, they are placed adjacent to their corresponding speakers via speaker
detection. Our results show that our method, without requiring any user inputs,
can generate high-quality comic pages with visually rich layouts and balloons.
Our user studies also demonstrate that users prefer our generated results over
those by state-of-the-art comic generation systems.
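The pipeline the abstract describes (subtitle-driven keyframe selection followed by emotion-aware balloon assignment) can be sketched as below. This is a minimal illustration, not the authors' code: the function names, the emotion-to-balloon mapping, and the midpoint keyframe heuristic are all hypothetical placeholders standing in for the paper's learned components.

```python
from dataclasses import dataclass

# Hypothetical mapping from detected emotion to balloon type; the paper
# varies balloon shape and word size per emotion, but its exact mapping
# is not specified, so this table is illustrative only.
BALLOON_BY_EMOTION = {
    "neutral": "oval",
    "angry": "spiky",
    "surprised": "burst",
    "whisper": "dashed",
}

@dataclass
class Panel:
    keyframe_t: float  # timestamp of the chosen keyframe in the video
    text: str          # subtitle text to place in the balloon
    balloon: str       # balloon type chosen from the emotion

def select_keyframes(subtitles):
    """Pick one keyframe per subtitle; here, simply its midpoint time."""
    return [(start + end) / 2.0 for start, end, _text, _emotion in subtitles]

def build_panels(subtitles):
    """End-to-end sketch: keyframe selection plus emotion-aware balloons."""
    times = select_keyframes(subtitles)
    return [
        Panel(t, text, BALLOON_BY_EMOTION.get(emotion, "oval"))
        for t, (_s, _e, text, emotion) in zip(times, subtitles)
    ]

# Each subtitle is (start_sec, end_sec, text, emotion_label).
subs = [
    (0.0, 2.0, "Get out!", "angry"),
    (3.0, 5.0, "What was that?", "surprised"),
]
panels = build_panels(subs)
print([(p.keyframe_t, p.balloon) for p in panels])
# → [(1.0, 'spiky'), (4.0, 'burst')]
```

In the actual system these stubs would be replaced by subtitle analysis, comic-style image stylization, the multi-page layout optimizer, and speaker detection for balloon placement.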
Related papers
- Imagining from Images with an AI Storytelling Tool [0.27309692684728604]
The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories.
The method is supported by a fully implemented tool, called ImageTeller, which accepts images from diverse sources as input.
arXiv Detail & Related papers (2024-08-21T10:49:15Z)
- Toward accessible comics for blind and low vision readers [0.059584784039407875]
We propose to use existing computer vision and optical character recognition techniques to build a grounded context from the comic strip image content.
We generate comic book script with context-aware panel description including character's appearance, posture, mood, dialogues etc.
arXiv Detail & Related papers (2024-07-11T07:50:25Z)
- MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising [42.20750912837316]
MagicScroll is a progressive diffusion-based image generation framework with a novel semantic-aware denoising process.
It enables fine-grained control over the generated image on object, scene, and background levels with text, image, and layout conditions.
It showcases promising results in aligning with the narrative text, improving visual coherence, and engaging the audience.
arXiv Detail & Related papers (2023-12-18T03:09:05Z)
- Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips [0.0]
We create natural language descriptions of comic strips that are accessible to the visually impaired community.
Our method consists of two steps: first, we use computer vision techniques to extract information about the panels, characters, and text of the comic images.
We test our method on a collection of comics that have been annotated by human experts and measure its performance using both quantitative and qualitative metrics.
arXiv Detail & Related papers (2023-10-01T15:13:48Z)
- MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images [92.13079696503803]
We present MovieFactory, a framework to generate cinematic-picture (3072×1280), film-style (multi-scene), and multi-modality (sounding) movies.
Our approach empowers users to create captivating movies with smooth transitions using simple text inputs.
arXiv Detail & Related papers (2023-06-12T17:31:23Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- VScript: Controllable Script Generation with Audio-Visual Presentation [56.17400243061659]
VScript is a controllable pipeline that generates complete scripts including dialogues and scene descriptions.
We adopt a hierarchical structure, which generates the plot, then the script and its audio-visual presentation.
Experiment results show that our approach outperforms the baselines on both automatic and human evaluations.
arXiv Detail & Related papers (2022-03-01T09:43:02Z)
- ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer [59.05857591535986]
We propose a model called ViNTER to generate image narratives that focus on time series representing varying emotions as "emotion arcs".
We present experimental results of both manual and automatic evaluations.
arXiv Detail & Related papers (2022-02-15T10:53:08Z)
- Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning [3.0415487485299373]
Stylized image captioning systems aim to generate a caption consistent with a given style description.
Many studies focus on unsupervised approaches, without considering the problem from the perspective of data augmentation.
We propose a novel Extract-Retrieve-Generate data augmentation framework to extract style phrases from small-scale stylized sentences.
arXiv Detail & Related papers (2021-08-26T17:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.