ComicGAN: Text-to-Comic Generative Adversarial Network
- URL: http://arxiv.org/abs/2109.09120v1
- Date: Sun, 19 Sep 2021 13:31:32 GMT
- Title: ComicGAN: Text-to-Comic Generative Adversarial Network
- Authors: Ben Proven-Bessel, Zilong Zhao, Lydia Chen
- Abstract summary: We implement ComicGAN, a novel text-to-image GAN that synthesizes comics according to text descriptions.
We extensively evaluate the proposed ComicGAN in two scenarios, namely image generation from descriptions, and image generation from dialogue.
- Score: 1.4824891788575418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drawing and annotating comic illustrations is a complex and difficult
process. No existing machine learning algorithms have been developed to create
comic illustrations based on descriptions of illustrations, or the dialogue in
comics. Moreover, it is not known if a generative adversarial network (GAN) can
generate original comics that correspond to the dialogue and/or descriptions.
GANs are successful in producing photo-realistic images, but this technology
does not necessarily translate to generation of flawless comics. What is more,
comic evaluation is a prominent challenge, as common metrics such as Inception
Score will not perform comparably, since they are designed to work on photos. In
this paper: 1. We implement ComicGAN, a novel text-to-comic pipeline based on a
text-to-image GAN that synthesizes comics according to text descriptions. 2. We
describe an in-depth empirical study of the technical difficulties of comic
generation using GANs. ComicGAN has two novel features: (i) text description
creation from labels via permutation and augmentation, and (ii) custom image
encoding with Convolutional Neural Networks. We extensively evaluate the
proposed ComicGAN in two scenarios, namely image generation from descriptions,
and image generation from dialogue. Our results on 1000 Dilbert comic panels
and 6000 descriptions show synthetic comic panels from text inputs resemble
original Dilbert panels. Novel methods for text description creation and custom
image encoding brought improvements to Frechet Inception Distance, detail, and
overall image quality over baseline algorithms. Generating illustrations from
descriptions produced clear comics that contained the characters and colours
specified in the descriptions.
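The abstract names two mechanisms but shows no pipeline code here. As a minimal sketch of feature (i), the snippet below turns one panel's labels into several text descriptions by permuting label order and varying a sentence template; the function name, templates, and example labels are all hypothetical, not the authors' code.

```python
import itertools
import random

# Hypothetical templates; ComicGAN's real descriptions are built from
# annotated Dilbert panel labels (characters, colours, and so on).
TEMPLATES = [
    "a comic panel showing {labels}",
    "{labels} in a comic panel",
    "an illustration of {labels}",
]

def make_descriptions(labels, n_variants=3, seed=0):
    """Build several descriptions for one panel by permuting its labels
    (permutation) and varying the sentence template (augmentation)."""
    rng = random.Random(seed)
    perms = list(itertools.permutations(labels))
    rng.shuffle(perms)
    descriptions = []
    for perm in perms[:n_variants]:
        joined = " and ".join([", ".join(perm[:-1]), perm[-1]]) if len(perm) > 1 else perm[0]
        descriptions.append(rng.choice(TEMPLATES).format(labels=joined))
    return descriptions

print(make_descriptions(["Dilbert", "Dogbert", "a green tie"]))
```

Feature (ii), the custom CNN image encoder, and the FID-based evaluation are not sketched here.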
Related papers
- Collaborative Comic Generation: Integrating Visual Narrative Theories with AI Models for Enhanced Creativity [1.1181151748260076]
This study presents a theory-inspired visual narrative generative system that integrates conceptual principles (comic authoring idioms) with generative and language models to enhance the comic creation process.
Key contributions include integrating machine learning models into the human-AI cooperative comic generation process, deploying abstract narrative theories into AI-driven comic creation, and providing a customizable tool for narrative-driven image sequences.
arXiv Detail & Related papers (2024-09-25T18:21:01Z)
- Multimodal Transformer for Comics Text-Cloze [8.616858272810084]
Text-cloze refers to the task of selecting the correct text to use in a comic panel, given its neighboring panels.
Traditional methods based on recurrent neural networks have struggled with this task due to limited OCR accuracy and inherent model limitations.
We introduce a novel Multimodal Large Language Model (Multimodal-LLM) architecture, specifically designed for Text-cloze, achieving a 10% improvement over existing state-of-the-art models in both its easy and hard variants.
arXiv Detail & Related papers (2024-03-06T14:11:45Z)
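The text-cloze task above is easy to state in code. Below is a toy selection loop, not the paper's Multimodal-LLM: `scorer` is a hypothetical callable that rates how well a candidate text fits the neighboring panels.

```python
def solve_text_cloze(scorer, context_panels, candidate_texts):
    """Return the candidate whose score against the panel context is
    highest; `scorer` stands in for the paper's multimodal model."""
    return max(candidate_texts, key=lambda text: scorer(context_panels, text))
```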
- Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips [0.0]
We create natural language descriptions of comic strips that are accessible to the visually impaired community.
Our method consists of two steps: first, we use computer vision techniques to extract information about the panels, characters, and text of the comic images.
We test our method on a collection of comics that have been annotated by human experts and measure its performance using both quantitative and qualitative metrics.
arXiv Detail & Related papers (2023-10-01T15:13:48Z)
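The summary above gives step one (visual extraction) and implies step two (rendering the extracted information as text). A minimal sketch of that two-step shape, with every callable hypothetical, might look like:

```python
def describe_comic_strip(image, detect_panels, detect_characters, run_ocr):
    """Step 1: extract panels, characters, and text with the supplied
    vision callables. Step 2: render the findings as plain language."""
    lines = []
    for i, panel in enumerate(detect_panels(image), start=1):
        characters = detect_characters(panel)   # e.g. ["Dilbert", "Dogbert"]
        dialogue = run_ocr(panel)                # e.g. ["I need a raise."]
        parts = [f"Panel {i}:"]
        if characters:
            parts.append("characters " + ", ".join(characters) + ".")
        parts.extend(f'Speech: "{text}"' for text in dialogue)
        lines.append(" ".join(parts))
    return "\n".join(lines)
```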
- Dense Multitask Learning to Reconfigure Comics [63.367664789203936]
We develop a MultiTask Learning (MTL) model to achieve dense predictions for comic panels.
Our method can successfully identify the semantic units as well as the notion of 3D in comic panels.
arXiv Detail & Related papers (2023-07-16T15:10:34Z)
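The abstract above does not specify the architecture; a generic shared-encoder, multi-head layout for dense comic-panel predictions could be sketched as below. The tasks, shapes, and layer sizes are illustrative assumptions, not the paper's actual heads.

```python
import torch
import torch.nn as nn

class DenseMTL(nn.Module):
    """Shared convolutional encoder with one dense prediction head per
    task; the semantic and depth heads here are assumed examples."""
    def __init__(self, channels=64, num_classes=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.semantic_head = nn.Conv2d(channels, num_classes, 1)
        self.depth_head = nn.Conv2d(channels, 1, 1)  # crude "notion of 3D"

    def forward(self, x):
        feats = self.encoder(x)
        return {"semantic": self.semantic_head(feats),
                "depth": self.depth_head(feats)}

out = DenseMTL()(torch.randn(1, 3, 128, 128))
print({k: tuple(v.shape) for k, v in out.items()})
```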
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models [70.86603627188519]
We focus on a novel yet challenging task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.
We propose a learning-based auto-regressive image generation model, termed as StoryGen, with a novel vision-language context module.
We show StoryGen can generalize to unseen characters without any optimization, and generate image sequences with coherent content and consistent characters.
arXiv Detail & Related papers (2023-06-01T17:58:50Z)
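Auto-regressive in the entry above means each new frame is conditioned on the story text and the frames generated so far. A bare control-flow sketch follows; the `model` interface is an assumption, not StoryGen's actual API.

```python
def generate_story(model, prompts):
    """Generate one image per prompt, feeding previously generated
    images back in as visual context for the next step."""
    images = []
    for prompt in prompts:
        images.append(model(prompt, context_images=list(images)))
    return images
```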
- AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation [61.77946020543875]
We propose a framework for translating raw descriptions with complex semantics into semantically corresponding images.
Our framework consists of two components: a projection module from Text Embeddings to Image Embeddings based on prompts, and an adapted image generation module built on StyleGAN.
Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training.
arXiv Detail & Related papers (2022-09-07T13:53:54Z)
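The projection module described above can be pictured as a small MLP mapping a text embedding (e.g. from CLIP) into the latent space of a pretrained StyleGAN. Dimensions and names below are assumptions; the generator itself is omitted.

```python
import torch
import torch.nn as nn

class TextToLatentProjector(nn.Module):
    """Assumed stand-in for the paper's projection module: text
    embedding in, StyleGAN-style latent vector out."""
    def __init__(self, text_dim=512, latent_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(text_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, latent_dim),
        )

    def forward(self, text_embedding):
        return self.proj(text_embedding)

projector = TextToLatentProjector()
latent = projector(torch.randn(1, 512))  # text embedding -> latent
# image = stylegan_generator(latent)     # pretrained generator, not shown
```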
- DT2I: Dense Text-to-Image Generation from Region Descriptions [3.883984493622102]
We introduce dense text-to-image (DT2I) synthesis as a new task to pave the way toward more intuitive image generation.
We also propose DTC-GAN, a novel method to generate images from semantically rich region descriptions.
arXiv Detail & Related papers (2022-04-05T07:57:11Z)
- Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation [57.10363557465713]
We propose a fully automatic system for generating comic books from videos without any human intervention.
Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles.
Then, we propose a novel automatic multi-page layout framework, which can allocate the images across multiple pages.
arXiv Detail & Related papers (2021-01-26T22:15:15Z)
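The layout step is only named above. As a deliberately crude stand-in, the allocator below packs keyframes onto pages with a fixed panel budget; a real stylistic layout would vary panel counts and sizes per page.

```python
def allocate_pages(keyframes, panels_per_page=6):
    """Split an ordered list of keyframes into per-page panel lists."""
    return [keyframes[i:i + panels_per_page]
            for i in range(0, len(keyframes), panels_per_page)]

pages = allocate_pages([f"frame_{i}" for i in range(14)])
print(len(pages), [len(p) for p in pages])  # 3 pages: 6, 6, 2 panels
```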
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output.
Compared with current state-of-the-art methods, our proposed DF-GAN is simpler yet more efficient at synthesizing realistic and text-matching images.
arXiv Detail & Related papers (2020-08-13T12:51:17Z)
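A penalty in the spirit of the Matching-Aware Gradient Penalty named above can be sketched as follows: penalize the discriminator's gradients at real images paired with their matching text, smoothing the loss surface around text-matching real data. The weight and exponent below are assumptions; consult the paper for the exact formulation.

```python
import torch

def matching_aware_gradient_penalty(discriminator, real_images, text_embs,
                                    weight=2.0, power=6.0):
    """Penalize D's gradient norm at (real image, matching text) pairs.
    weight/power are assumed hyperparameters, not confirmed values."""
    real_images = real_images.requires_grad_(True)
    text_embs = text_embs.requires_grad_(True)
    scores = discriminator(real_images, text_embs)
    grads = torch.autograd.grad(scores.sum(), (real_images, text_embs),
                                create_graph=True)
    grad_norm = torch.cat([g.flatten(1) for g in grads], dim=1).norm(2, dim=1)
    return weight * grad_norm.pow(power).mean()
```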
- Text-Guided Neural Image Inpainting [20.551488941041256]
The inpainting task requires filling a corrupted image with content coherent with its context.
The goal of this paper is to fill the semantic information in corrupted images according to the provided descriptive text.
We propose a novel inpainting model named Text-Guided Dual Attention Inpainting Network (TDANet).
arXiv Detail & Related papers (2020-04-07T09:04:43Z)
- Structural-analogy from a Single Image Pair [118.61885732829117]
In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B.
We generate an image that keeps the appearance and style of B, but has a structural arrangement that corresponds to A.
Our method can be used to generate high quality imagery in other conditional generation tasks utilizing images A and B only.
arXiv Detail & Related papers (2020-04-05T14:51:10Z)