Make-A-Story: Visual Memory Conditioned Consistent Story Generation
- URL: http://arxiv.org/abs/2211.13319v3
- Date: Sat, 6 May 2023 02:25:31 GMT
- Title: Make-A-Story: Visual Memory Conditioned Consistent Story Generation
- Authors: Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta
Mahajan, Leonid Sigal
- Abstract summary: We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context.
Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
- Score: 57.691064030235985
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: There has been a recent explosion of impressive generative models that can
produce high quality images (or videos) conditioned on text descriptions.
However, all such approaches rely on conditional sentences that contain
unambiguous descriptions of the scenes and of the main actors in them. Therefore,
employing such models for the more complex task of story visualization, where
references and co-references naturally occur and one must reason about when to
maintain consistency of actors and backgrounds across frames/scenes, and when
not to, based on story progression, remains a challenge. In this work, we
address the aforementioned challenges and propose a novel autoregressive
diffusion-based framework with a visual memory module that implicitly captures
the actor and background context across the generated frames.
Sentence-conditioned soft attention over the memories enables effective
reference resolution and learns to maintain scene and actor consistency when
needed. To validate the effectiveness of our approach, we extend the MUGEN
dataset and introduce additional characters, backgrounds and referencing in
multi-sentence storylines. Our experiments for story generation on the MUGEN,
the PororoSV, and the FlintstonesSV datasets show that our method not only
outperforms prior state-of-the-art in generating frames with high visual
quality, which are consistent with the story, but also models appropriate
correspondences between the characters and the background.
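The sketch below illustrates the core mechanism the abstract describes: a sentence-conditioned soft-attention read over a memory of previously generated frame features, whose output can then condition generation of the current frame. This is a minimal PyTorch illustration under our own assumptions (module name, projection layers, and dimensions are hypothetical), not the authors' implementation.

```python
# Minimal sketch (not the authors' code): sentence-conditioned soft attention
# over a visual memory of previously generated frames. Names, dimensions, and
# the wiring are illustrative assumptions based only on the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualMemoryAttention(nn.Module):
    """Attend over stored frame features using the current sentence as query."""

    def __init__(self, sent_dim: int, frame_dim: int, attn_dim: int = 256):
        super().__init__()
        self.query_proj = nn.Linear(sent_dim, attn_dim)   # current sentence -> query
        self.key_proj = nn.Linear(frame_dim, attn_dim)    # past frame features -> keys
        self.value_proj = nn.Linear(frame_dim, attn_dim)  # past frame features -> values

    def forward(self, sent_emb: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # sent_emb: (B, sent_dim) embedding of the caption for the current frame
        # memory:   (B, T, frame_dim) features of the T previously generated frames
        q = self.query_proj(sent_emb).unsqueeze(1)             # (B, 1, attn_dim)
        k = self.key_proj(memory)                               # (B, T, attn_dim)
        v = self.value_proj(memory)                              # (B, T, attn_dim)
        scores = (q @ k.transpose(1, 2)) / k.shape[-1] ** 0.5   # (B, 1, T)
        weights = F.softmax(scores, dim=-1)                      # soft attention over memories
        context = (weights @ v).squeeze(1)                       # (B, attn_dim)
        return context  # memory context used to condition the current frame


if __name__ == "__main__":
    attn = VisualMemoryAttention(sent_dim=512, frame_dim=768)
    sentence = torch.randn(2, 512)        # e.g. text-encoder output for the current caption
    past_frames = torch.randn(2, 3, 768)  # features of 3 earlier frames in the story
    print(attn(sentence, past_frames).shape)  # torch.Size([2, 256])
```

In this reading, a caption that refers back to an earlier actor ("he jumps over the gap") places high attention weight on the memory slots holding that actor's frames, which is one plausible way the soft attention could support reference resolution and consistency.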
Related papers
- MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence [62.72540590546812]
MovieDreamer is a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering.
We present experiments across various movie genres, demonstrating that our approach achieves superior visual and narrative quality.
arXiv Detail & Related papers (2024-07-23T17:17:05Z)
- ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context [50.572907418430155]
Existing autoregressive methods struggle with high memory usage, slow generation speeds, and limited context integration.
We propose ContextualStory, a novel framework designed to generate coherent story frames and extend frames for story continuation.
In experiments on PororoSV and FlintstonesSV benchmarks, ContextualStory significantly outperforms existing methods in both story visualization and story continuation.
arXiv Detail & Related papers (2024-07-13T05:02:42Z)
- StoryGPT-V: Large Language Models as Consistent Story Visualizers [39.790319429455856]
Generative models have demonstrated impressive capabilities in generating realistic and visually pleasing images grounded on textual prompts.
Yet, the emerging Large Language Model (LLM) showcases robust reasoning abilities to navigate through ambiguous references.
We introduce StoryGPT-V, which leverages the merits of the latent diffusion model (LDM) and LLM to produce images with consistent and high-quality characters.
arXiv Detail & Related papers (2023-12-04T18:14:29Z)
- Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis [12.766712398098646]
We propose Causal-Story, which considers the causal relationship between previous captions, frames, and current captions.
We evaluate our model on the PororoSV and FlintstonesSV datasets and obtain state-of-the-art FID scores.
arXiv Detail & Related papers (2023-09-18T08:06:06Z)
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models [70.86603627188519]
We focus on a novel, yet challenging task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.
We propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module.
We show that StoryGen can generalize to unseen characters without any optimization and generate image sequences with coherent content and consistent characters.
arXiv Detail & Related papers (2023-06-01T17:58:50Z)
- StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation [76.44802273236081]
We develop a model, StoryDALL-E, for story continuation, where the generated visual story is conditioned on a source image.
We show that our retro-fitting approach outperforms GAN-based models for story continuation and facilitates copying of visual elements from the source image.
Overall, our work demonstrates that pretrained text-to-image synthesis models can be adapted for complex and low-resource tasks like story continuation.
arXiv Detail & Related papers (2022-09-13T17:47:39Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)