Causal-Story: Local Causal Attention Utilizing Parameter-Efficient
Tuning For Visual Story Synthesis
- URL: http://arxiv.org/abs/2309.09553v4
- Date: Wed, 6 Mar 2024 16:16:19 GMT
- Title: Causal-Story: Local Causal Attention Utilizing Parameter-Efficient
Tuning For Visual Story Synthesis
- Authors: Tianyi Song, Jiuxin Cao, Kun Wang, Bo Liu, Xiaofeng Zhang
- Abstract summary: We propose Causal-Story, which considers the causal relationship between previous captions, frames, and current captions.
We evaluate our model on the PororoSV and FlintstonesSV datasets and obtain state-of-the-art FID scores.
- Score: 12.766712398098646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The excellent text-to-image synthesis capability of diffusion models has
driven progress in synthesizing coherent visual stories. The current
state-of-the-art method combines the features of historical captions,
historical frames, and the current captions as conditions for generating the
current frame. However, this method treats every historical frame and caption as an
equal contribution, connecting them in order with equal weights and ignoring
that not all historical conditions are relevant to the generation of the
current frame. To address this issue, we propose Causal-Story. This model
incorporates a local causal attention mechanism that considers the causal
relationship between previous captions, frames, and current captions. By
assigning weights based on this relationship, Causal-Story generates the
current frame, thereby improving the global consistency of story generation. We
evaluated our model on the PororoSV and FlintstonesSV datasets and obtained
state-of-the-art FID scores, and the generated frames also demonstrate stronger
visual storytelling.
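To make the weighting idea concrete, below is a minimal PyTorch sketch of how a local causal attention over historical conditions might look. This is not the authors' implementation; the function name `local_causal_attention`, the `window` parameter, and the scaled dot-product weighting are illustrative assumptions drawn only from the abstract.

```python
# Minimal sketch (assumption): weight historical captions/frames by their
# relevance to the current caption instead of treating them all equally.
import torch
import torch.nn.functional as F


def local_causal_attention(curr_caption, hist_captions, hist_frames, window=2):
    """Return a conditioning vector for the current frame.

    curr_caption:  (B, D)    embedding of the caption being rendered now
    hist_captions: (B, T, D) embeddings of the T previous captions
    hist_frames:   (B, T, D) embeddings of the T previous frames
    window:        only the most recent `window` steps may contribute (local causality)
    """
    B, T, D = hist_captions.shape
    history = torch.cat([hist_captions, hist_frames], dim=1)               # (B, 2T, D)

    # Scaled dot-product similarity of each historical condition to the current caption.
    scores = torch.einsum("bd,btd->bt", curr_caption, history) / D ** 0.5  # (B, 2T)

    # Local causal mask: steps older than `window` are excluded from attention.
    keep = torch.zeros(T, dtype=torch.bool)
    keep[max(0, T - window):] = True
    mask = torch.cat([keep, keep]).unsqueeze(0)                            # (1, 2T)
    scores = scores.masked_fill(~mask, float("-inf"))

    # Softmax yields unequal weights, replacing the uniform weighting of prior work.
    weights = F.softmax(scores, dim=-1)                                    # (B, 2T)
    return torch.einsum("bt,btd->bd", weights, history)                    # (B, D)


if __name__ == "__main__":
    B, T, D = 2, 4, 64
    cond = local_causal_attention(torch.randn(B, D),
                                  torch.randn(B, T, D),
                                  torch.randn(B, T, D))
    print(cond.shape)  # torch.Size([2, 64])
```

In the full model this conditioning vector would presumably be fed to the diffusion denoiser as cross-attention context; the window size and the way captions and frames are fused are design choices the abstract does not specify.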
Related papers
- ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context [50.572907418430155]
Existing autoregressive methods struggle with high memory usage, slow generation speeds, and limited context integration.
We propose ContextualStory, a novel framework designed to generate coherent story frames and extend frames for story continuation.
In experiments on PororoSV and FlintstonesSV benchmarks, ContextualStory significantly outperforms existing methods in both story visualization and story continuation.
arXiv Detail & Related papers (2024-07-13T05:02:42Z)
- Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models [12.907590808274358]
We propose novel Rich-contextual Diffusion Models (RCDMs) to enhance the semantic consistency and temporal consistency of story generation.
Unlike autoregressive models, RCDMs can generate consistent stories with a single forward inference.
arXiv Detail & Related papers (2024-07-02T17:58:07Z)
- StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion [78.1014542102578]
Story visualization aims to generate realistic and coherent images based on a storyline.
Current models adopt a frame-by-frame architecture, adapting the pre-trained text-to-image model to generate frames in an auto-regressive manner.
We propose a bidirectional, unified, and efficient framework, namely StoryImager.
arXiv Detail & Related papers (2024-04-09T03:22:36Z)
- Visual Storytelling with Question-Answer Plans [70.89011289754863]
We present a novel framework which integrates visual representations with pretrained language models and planning.
Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.
It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative.
arXiv Detail & Related papers (2023-10-08T21:45:34Z)
- Improved Visual Story Generation with Adaptive Context Modeling [39.04249009170821]
We present a simple method that improves the leading system with adaptive context modeling.
We evaluate our model on PororoSV and FlintstonesSV datasets and show that our approach achieves state-of-the-art FID scores on both story visualization and continuation scenarios.
arXiv Detail & Related papers (2023-05-26T10:43:42Z)
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context.
Our experiments for story generation on the MUGEN, the PororoSV and the FlintstonesSV dataset show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)
- StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation [76.44802273236081]
We develop a model StoryDALL-E for story continuation, where the generated visual story is conditioned on a source image.
We show that our retro-fitting approach outperforms GAN-based models for story continuation and facilitates copying of visual elements from the source image.
Overall, our work demonstrates that pretrained text-to-image synthesis models can be adapted for complex and low-resource tasks like story continuation.
arXiv Detail & Related papers (2022-09-13T17:47:39Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all of the above) and is not responsible for any consequences of its use.