SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency
- URL: http://arxiv.org/abs/2510.22994v1
- Date: Mon, 27 Oct 2025 04:19:22 GMT
- Title: SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency
- Authors: Quanjian Song, Donghao Zhou, Jingyu Lin, Fei Shen, Jiaze Wang, Xiaowei Hu, Cunjian Chen, Pheng-Ann Heng
- Abstract summary: SceneDecorator is a training-free framework that employs VLM-Guided Scene Planning to ensure narrative coherence across different scenes in a "global-to-local" manner. Extensive experiments demonstrate the superior performance of SceneDecorator, highlighting its potential to unleash creativity in the fields of arts, films, and games.
- Score: 47.20554570948312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent text-to-image models have revolutionized image generation, but they still struggle with maintaining concept consistency across generated images. While existing works focus on character consistency, they often overlook the crucial role of scenes in storytelling, which restricts their creativity in practice. This paper introduces scene-oriented story generation, addressing two key challenges: (i) scene planning, where current methods fail to ensure scene-level narrative coherence because they rely solely on text descriptions, and (ii) scene consistency, where maintaining the same scene across multiple stories remains largely unexplored. We propose SceneDecorator, a training-free framework that employs VLM-Guided Scene Planning to ensure narrative coherence across different scenes in a "global-to-local" manner, and Long-Term Scene-Sharing Attention to maintain long-term scene consistency and subject diversity across generated stories. Extensive experiments demonstrate the superior performance of SceneDecorator, highlighting its potential to unleash creativity in the fields of arts, films, and games.
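The abstract names two mechanisms but gives no formulation, so the sketch below is only an illustration of how Long-Term Scene-Sharing Attention might operate, assuming it follows the shared key/value pattern common in training-free consistency methods: queries come from the current frame's latent tokens, while keys and values are extended with cached tokens from earlier generations of the same scene. Every identifier here (`SceneSharingAttention`, `scene_bank`) is hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F


class SceneSharingAttention(torch.nn.Module):
    """Hypothetical sketch of scene-sharing attention (not the paper's code)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.to_q = torch.nn.Linear(dim, dim, bias=False)
        self.to_k = torch.nn.Linear(dim, dim, bias=False)
        self.to_v = torch.nn.Linear(dim, dim, bias=False)
        self.to_out = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, scene_bank: torch.Tensor) -> torch.Tensor:
        # x:          (batch, seq, dim) latent tokens of the frame being generated
        # scene_bank: (batch, bank_seq, dim) cached tokens from earlier images
        #             of the same scene -- the "long-term" shared memory
        b, n, d = x.shape
        h = self.num_heads
        q = self.to_q(x)
        # Keys/values cover both the current frame and the stored scene tokens,
        # so each new frame is pulled toward the shared scene appearance.
        kv = torch.cat([x, scene_bank], dim=1)
        k, v = self.to_k(kv), self.to_v(kv)
        # Split into heads: (batch, heads, tokens, head_dim)
        q = q.view(b, n, h, d // h).transpose(1, 2)
        k = k.view(b, -1, h, d // h).transpose(1, 2)
        v = v.view(b, -1, h, d // h).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)


# Usage: a 16-token frame attends over itself plus 32 cached scene tokens.
attn = SceneSharingAttention(dim=64, num_heads=8)
x = torch.randn(1, 16, 64)
scene_bank = torch.randn(1, 32, 64)
out = attn(x, scene_bank)  # shape: (1, 16, 64)
```

Under this reading, subject diversity would be preserved because only scene tokens are cached and shared while character tokens are generated fresh per story, though the abstract does not describe the paper's actual token-selection strategy.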
Related papers
- Consistent text-to-image generation via scene de-contextualization [48.19924216489272]
Consistent text-to-image (T2I) generation often fails due to a phenomenon called identity (ID) shift. This paper reveals that a key source of ID shift is the native correlation between subject and scene context. We propose Scene De-Contextualization (SDeC), which inverts T2I's built-in scene contextualization.
arXiv Detail & Related papers (2025-10-16T10:54:49Z) - SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion [74.70024991949269]
We introduce SceneAdapt, a framework that injects scene awareness into text-conditioned motion models. The key idea is to use motion inbetweening, learnable without text, as a proxy task to bridge two distinct datasets. Results show that SceneAdapt effectively injects scene awareness into text-to-motion models.
arXiv Detail & Related papers (2025-10-14T23:42:10Z) - STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives [82.19488717416351]
This paper introduces StoryAnchors, a unified framework for generating high-quality, multi-scene story frames. StoryAnchors employs a bidirectional story generator that integrates both past and future contexts to ensure temporal consistency. It also integrates Multi-Event Story Frame Labeling and Progressive Story Frame Training, enabling the model to capture both overarching narrative flow and event-level dynamics.
arXiv Detail & Related papers (2025-05-13T08:48:10Z) - VisAgent: Narrative-Preserving Story Visualization Framework [5.86192577938549]
VisAgent is a training-free framework designed to comprehend and visualize pivotal scenes within a given story. Considering story distillation, semantic consistency, and contextual coherence, VisAgent employs an agentic workflow. Empirical validation confirms the framework's suitability for practical story visualization applications.
arXiv Detail & Related papers (2025-03-04T08:41:45Z) - ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context [50.572907418430155]
ContextualStory is a framework designed to generate coherent story frames and extend frames for visual storytelling. We introduce a Storyline Contextualizer to enrich context in storyline embedding, and a StoryFlow Adapter to measure scene changes between frames. Experiments on the PororoSV and FlintstonesSV datasets demonstrate that ContextualStory significantly outperforms existing SOTA methods in both story visualization and continuation.
arXiv Detail & Related papers (2024-07-13T05:02:42Z) - Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control [131.1446077627191]
We propose a new presentation form for Story Visualization called Storyboard, inspired by film-making.
Within each scene in Storyboard, characters engage in activities at the same location, necessitating both visually consistent scenes and characters.
Our method can be seamlessly integrated into mainstream Image Customization methods, empowering them with story visualization capability.
arXiv Detail & Related papers (2023-12-06T12:16:23Z) - Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context.
Our experiments on story generation with the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)