ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context
- URL: http://arxiv.org/abs/2407.09774v2
- Date: Wed, 21 Aug 2024 14:17:31 GMT
- Title: ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context
- Authors: Sixiao Zheng, Yanwei Fu
- Abstract summary: Existing autoregressive methods struggle with high memory usage, slow generation speeds, and limited context integration.
We propose ContextualStory, a novel framework designed to generate coherent story frames and extend frames for story continuation.
In experiments on PororoSV and FlintstonesSV benchmarks, ContextualStory significantly outperforms existing methods in both story visualization and story continuation.
- Score: 50.572907418430155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual storytelling involves generating a sequence of coherent frames from a textual storyline while maintaining consistency in characters and scenes. Existing autoregressive methods, which rely on previous frame-sentence pairs, struggle with high memory usage, slow generation speeds, and limited context integration. To address these issues, we propose ContextualStory, a novel framework designed to generate coherent story frames and extend frames for story continuation. ContextualStory utilizes Spatially-Enhanced Temporal Attention to capture spatial and temporal dependencies, handling significant character movements effectively. Additionally, we introduce a Storyline Contextualizer to enrich the context in the storyline embedding and a StoryFlow Adapter to measure scene changes between frames to guide the model. Extensive experiments on the PororoSV and FlintstonesSV benchmarks demonstrate that ContextualStory significantly outperforms existing methods in both story visualization and story continuation.
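The abstract only names the Spatially-Enhanced Temporal Attention mechanism, so here is a minimal PyTorch sketch of one plausible reading: temporal self-attention run independently at each spatial location (so character movement is tracked across frames), preceded by a lightweight spatial mixing step. The class name, tensor layout, and the depthwise convolution used for spatial enhancement are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpatiallyEnhancedTemporalAttention(nn.Module):
    """Illustrative sketch only: temporal self-attention applied per
    spatial location, with a depthwise conv mixing local spatial context
    into each frame's features before attending across frames."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # Hypothetical spatial enhancement step (an assumption, not
        # necessarily what the paper does).
        self.spatial_mix = nn.Conv2d(channels, channels, kernel_size=3,
                                     padding=1, groups=channels)
        self.attn = nn.MultiheadAttention(channels, num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        y = self.spatial_mix(x.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        # Fold spatial positions into the batch so attention runs over time.
        y = y.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        y = self.norm(y)
        out, _ = self.attn(y, y, y)
        out = out.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return x + out  # residual connection

# Example: a batch of two 5-frame stories with 128-channel 16x16 latents.
if __name__ == "__main__":
    layer = SpatiallyEnhancedTemporalAttention(128)
    frames = torch.randn(2, 5, 128, 16, 16)
    print(layer(frames).shape)  # torch.Size([2, 5, 128, 16, 16])
```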
Related papers
- Story-Adapter: A Training-free Iterative Framework for Long Story Visualization [14.303607837426126]
We propose a training-free and computationally efficient framework, termed Story-Adapter, to enhance generative capability for long stories.
Central to our framework is a training-free global reference cross-attention module, which aggregates all generated images from the previous iteration (a hypothetical sketch of such a module follows this entry).
Experiments validate the superiority of Story-Adapter in improving both semantic consistency and generative capability for fine-grained interactions.
arXiv Detail & Related papers (2024-10-08T17:59:30Z) - Generating Visual Stories with Grounded and Coreferent Characters [63.07511918366848]
- Generating Visual Stories with Grounded and Coreferent Characters [63.07511918366848]
We present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions.
Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark.
We also propose new evaluation metrics to measure the richness of characters and coreference in stories.
arXiv Detail & Related papers (2024-09-20T14:56:33Z) - Make-A-Storyboard: A General Framework for Storyboard with Disentangled
and Merged Control [131.1446077627191]
We propose a new presentation form for Story Visualization called Storyboard, inspired by film-making.
Within each scene in Storyboard, characters engage in activities at the same location, necessitating both visually consistent scenes and characters.
Our method can be seamlessly integrated into mainstream Image Customization methods, empowering them with the capability of story visualization.
arXiv Detail & Related papers (2023-12-06T12:16:23Z) - Causal-Story: Local Causal Attention Utilizing Parameter-Efficient
Tuning For Visual Story Synthesis [12.766712398098646]
We propose Causal-Story, which considers the causal relationship between previous captions, frames, and current captions (a hypothetical sketch of such a local causal attention mask follows this entry).
We evaluate our model on the PororoSV and FlintstonesSV datasets and obtain state-of-the-art FID scores.
arXiv Detail & Related papers (2023-09-18T08:06:06Z) - Text-Only Training for Visual Storytelling [107.19873669536523]
- Text-Only Training for Visual Storytelling [107.19873669536523]
We formulate visual storytelling as a visual-conditioned story generation problem.
We propose a text-only training method that separates the learning of cross-modality alignment and story generation.
arXiv Detail & Related papers (2023-08-17T09:32:17Z) - Story Visualization by Online Text Augmentation with Context Memory [64.86944645907771]
We propose a novel memory architecture for the Bi-directional Transformer framework with online text augmentation.
The proposed method significantly outperforms the state of the art on various metrics, including FID, character F1, frame accuracy, BLEU-2/3, and R-precision.
arXiv Detail & Related papers (2023-08-15T05:08:12Z) - Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context (a toy sketch of this idea follows this entry).
Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.