STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
- URL: http://arxiv.org/abs/2505.08350v2
- Date: Sat, 17 May 2025 00:50:44 GMT
- Title: STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
- Authors: Bo Wang, Haoyang Huang, Zhiying Lu, Fengyuan Liu, Guoqing Ma, Jianlong Yuan, Yuan Zhang, Nan Duan, Daxin Jiang
- Abstract summary: This paper introduces StoryAnchors, a unified framework for generating high-quality, multi-scene story frames. StoryAnchors employs a bidirectional story generator that integrates both past and future contexts to ensure temporal consistency. It also integrates Multi-Event Story Frame Labeling and Progressive Story Frame Training, enabling the model to capture both overarching narrative flow and event-level dynamics.
- Score: 82.19488717416351
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces StoryAnchors, a unified framework for generating high-quality, multi-scene story frames with strong temporal consistency. The framework employs a bidirectional story generator that integrates both past and future contexts to ensure temporal consistency, character continuity, and smooth scene transitions throughout the narrative. Specific conditions are introduced to distinguish story frame generation from standard video synthesis, facilitating greater scene diversity and enhancing narrative richness. To further improve generation quality, StoryAnchors integrates Multi-Event Story Frame Labeling and Progressive Story Frame Training, enabling the model to capture both overarching narrative flow and event-level dynamics. This approach supports the creation of editable and expandable story frames, allowing for manual modifications and the generation of longer, more complex sequences. Extensive experiments show that StoryAnchors outperforms existing open-source models in key areas such as consistency, narrative coherence, and scene diversity. Its performance in narrative consistency and story richness is also on par with GPT-4o. Ultimately, StoryAnchors pushes the boundaries of story-driven frame generation, offering a scalable, flexible, and highly editable foundation for future research.
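The bidirectional conditioning described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the names (`StoryFrame`, `generate_frame`, `fill_story`) are hypothetical, and a real system would use a learned generator rather than string concatenation; the sketch only shows how each generated frame draws on both past and future anchor context.

```python
# Illustrative sketch (hypothetical names, not the paper's implementation):
# filling in story frames between fixed anchors, conditioning each new
# frame on both past and future context.
from dataclasses import dataclass

@dataclass
class StoryFrame:
    index: int
    caption: str

def generate_frame(index, past, future):
    """Produce a frame conditioned on both directions.

    A real implementation would feed past/future embeddings into a
    bidirectional generator; here captions are concatenated only to
    make the information flow visible.
    """
    past_ctx = " | ".join(f.caption for f in past)
    future_ctx = " | ".join(f.caption for f in future)
    return StoryFrame(index, f"[past: {past_ctx}] scene {index} [future: {future_ctx}]")

def fill_story(anchors, length):
    """Interpolate missing frames between fixed anchor frames."""
    frames = {f.index: f for f in anchors}
    for i in range(length):
        if i in frames:
            continue  # keep user-provided anchors untouched (editability)
        past = [frames[j] for j in sorted(frames) if j < i]
        future = [frames[j] for j in sorted(frames) if j > i]
        frames[i] = generate_frame(i, past, future)
    return [frames[i] for i in range(length)]
```

Because anchors are left untouched and gaps are filled around them, the same loop also supports the editability and expandability the abstract mentions: replacing an anchor and re-running regenerates only the in-between frames.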
Related papers
- Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs [0.8702432681310401]
Aether Weaver is a novel framework for narrative co-generation that overcomes limitations of multimodal text-to-visual pipelines. Our system concurrently synthesizes textual narratives, dynamic scene graph representations, visual scenes, and affective soundscapes.
arXiv Detail & Related papers (2025-07-29T15:01:31Z)
- StoryWriter: A Multi-Agent Framework for Long Story Generation [53.80343104003837]
Long story generation remains a challenge for existing large language models. We propose StoryWriter, a multi-agent story generation framework, which consists of three main modules. StoryWriter significantly outperforms existing story generation baselines in both story quality and length.
arXiv Detail & Related papers (2025-06-19T16:26:58Z)
- STORYTELLER: An Enhanced Plot-Planning Framework for Coherent and Cohesive Story Generation [17.553025200797986]
We introduce Storyteller, a novel approach that systematically improves the coherence and consistency of automatically generated stories. In experiments, Storyteller significantly outperforms existing approaches, achieving an 84.33% average win rate. It is also far ahead in other aspects, including creativity, coherence, engagement, and relevance.
arXiv Detail & Related papers (2025-06-03T00:54:00Z)
- StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration [88.94832383850533]
We propose a multi-agent framework designed for Customized Storytelling Video Generation (CSVG).
StoryAgent decomposes CSVG into distinct subtasks assigned to specialized agents, mirroring the professional production process.
Specifically, we introduce a customized Image-to-Video (I2V) method, LoRA-BE, to enhance intra-shot temporal consistency.
Our contributions include the introduction of StoryAgent, a versatile framework for video generation tasks, and novel techniques for preserving protagonist consistency.
arXiv Detail & Related papers (2024-11-07T18:00:33Z)
- MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence [62.72540590546812]
MovieDreamer is a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering.
We present experiments across various movie genres, demonstrating that our approach achieves superior visual and narrative quality.
arXiv Detail & Related papers (2024-07-23T17:17:05Z)
- ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context [50.572907418430155]
ContextualStory is a framework designed to generate coherent story frames and extend frames for visual storytelling. We introduce a Storyline Contextualizer to enrich context in storyline embedding, and a StoryFlow Adapter to measure scene changes between frames. Experiments on the PororoSV and FlintstonesSV datasets demonstrate that ContextualStory significantly outperforms existing SOTA methods in both story visualization and continuation.
arXiv Detail & Related papers (2024-07-13T05:02:42Z)
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context.
Our experiments on story generation with the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state-of-the-art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)
- Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning [12.264880519328353]
We introduce Commonsense-inference Augmented neural StoryTelling (CAST), a framework for introducing commonsense reasoning into the generation process.
We find that our CAST method produces significantly more coherent, on-topic, enjoyable and fluent stories than existing models in both the single-character and two-character settings.
arXiv Detail & Related papers (2021-05-04T06:40:33Z)
- Consistency and Coherency Enhanced Story Generation [35.08911595854691]
We propose a two-stage generation framework to enhance consistency and coherency of generated stories.
The first stage is to organize the story outline which depicts the story plots and events, and the second stage is to expand the outline into a complete story.
In addition, coreference supervision signals are incorporated to reduce coreference errors and improve the coreference consistency.
arXiv Detail & Related papers (2020-10-17T16:40:37Z)
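The two-stage outline-then-expand scheme described in the last entry can be sketched in a few lines. The helpers below are hypothetical placeholders for the paper's learned models, shown only to make the pipeline shape concrete.

```python
# Hedged sketch of a two-stage story pipeline: plan an outline of plot
# events first, then expand each event into prose. Function names are
# illustrative, not the paper's API.
def plan_outline(premise):
    """Stage 1: derive a short list of plot events from a premise."""
    return [f"{premise}: setup", f"{premise}: conflict", f"{premise}: resolution"]

def expand_outline(outline):
    """Stage 2: expand each outline event into a story passage."""
    return " ".join(
        f"({i + 1}) A scene in which {event} unfolds."
        for i, event in enumerate(outline)
    )

def generate_story(premise):
    # Separating planning from realization lets a coreference or
    # consistency check run on the compact outline before expansion.
    return expand_outline(plan_outline(premise))
```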
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.