Improved Visual Story Generation with Adaptive Context Modeling
- URL: http://arxiv.org/abs/2305.16811v1
- Date: Fri, 26 May 2023 10:43:42 GMT
- Title: Improved Visual Story Generation with Adaptive Context Modeling
- Authors: Zhangyin Feng, Yuchen Ren, Xinmiao Yu, Xiaocheng Feng, Duyu Tang, Shuming Shi, Bing Qin
- Abstract summary: We present a simple method that improves the leading system with adaptive context modeling.
We evaluate our model on the PororoSV and FlintstonesSV datasets and show that our approach achieves state-of-the-art FID scores in both story visualization and continuation scenarios.
- Score: 39.04249009170821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models developed on top of powerful text-to-image generation models
like Stable Diffusion achieve remarkable success in visual story generation.
However, the best-performing approach considers historically generated results
as flattened memory cells, ignoring the fact that not all preceding images
contribute equally to the generation of the characters and scenes at the
current stage. To address this, we present a simple method that improves the
leading system with adaptive context modeling, which is not only incorporated
in the encoder but also adopted as additional guidance in the sampling stage to
boost the global consistency of the generated story. We evaluate our model on
the PororoSV and FlintstonesSV datasets and show that our approach achieves
state-of-the-art FID scores in both story visualization and continuation
scenarios. We conduct a detailed model analysis and show that our model excels at
generating semantically consistent images for stories.
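The abstract describes the mechanism only at a high level, so the following is a minimal, hypothetical sketch of the encoder-side idea (not the authors' code): rather than treating previously generated frames as equally weighted "flattened memory cells", each cached frame feature is scored against the current caption embedding and the context is aggregated with attention weights. All names and shapes below are illustrative assumptions.

```python
# Hypothetical sketch of adaptive context aggregation (assumed names/shapes).
import torch
import torch.nn.functional as F

def adaptive_context(caption_emb: torch.Tensor,
                     past_frame_feats: torch.Tensor) -> torch.Tensor:
    """caption_emb: (d,) embedding of the current caption.
    past_frame_feats: (k, d) features of the k previously generated frames.
    Returns a (d,) context vector in which relevant frames dominate."""
    d = caption_emb.shape[-1]
    # Scaled dot-product relevance of each past frame to the current caption.
    scores = past_frame_feats @ caption_emb / d ** 0.5   # (k,)
    weights = F.softmax(scores, dim=-1)                  # (k,), sums to 1
    # Weighted sum: frames depicting the current characters/scene get more
    # mass than a flat average over all preceding frames would give them.
    return weights @ past_frame_feats                    # (d,)

# Toy usage: 3 cached frames with 512-dim features.
ctx = adaptive_context(torch.randn(512), torch.randn(3, 512))
print(ctx.shape)  # torch.Size([512])
```

Per the abstract, a similar context signal is also used as additional guidance during the diffusion sampling stage to boost global story consistency; the sketch above covers only the encoder-side aggregation.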
Related papers
- StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion [78.1014542102578]
Story visualization aims to generate realistic and coherent images based on a storyline.
Current models adopt a frame-by-frame architecture, adapting the pre-trained text-to-image model to generate frames auto-regressively.
We propose a bidirectional, unified, and efficient framework, namely StoryImager.
arXiv Detail & Related papers (2024-04-09T03:22:36Z)
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models [70.86603627188519]
We focus on the novel yet challenging task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.
We propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module.
We show StoryGen can generalize to unseen characters without any optimization and generate image sequences with coherent content and consistent characters.
arXiv Detail & Related papers (2023-06-01T17:58:50Z)
- Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models [43.32978092618245]
We present a novel neural pipeline for generating a coherent storybook from the plain text of a story.
We leverage a combination of a pre-trained Large Language Model and a text-guided Latent Diffusion Model to generate coherent images.
arXiv Detail & Related papers (2023-02-08T06:24:06Z)
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context.
Our experiments on story generation with the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)
- StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation [76.44802273236081]
We develop a model, StoryDALL-E, for story continuation, where the generated visual story is conditioned on a source image.
We show that our retrofitting approach outperforms GAN-based models for story continuation and facilitates copying of visual elements from the source image.
Overall, our work demonstrates that pretrained text-to-image synthesis models can be adapted for complex and low-resource tasks like story continuation.
arXiv Detail & Related papers (2022-09-13T17:47:39Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models, which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)