Related papers: Story2Board: A Training-Free Approach for Expressive Storyboard Generation

Story2Board: A Training-Free Approach for Expressive Storyboard Generation

URL: http://arxiv.org/abs/2508.09983v1
Date: Wed, 13 Aug 2025 17:56:26 GMT
Title: Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Authors: David Dinkevich, Matan Levy, Omri Avrahami, Dvir Samuel, Dani Lischinski
Abstract summary: We present Story2Board, a training-free framework for expressive storyboard generation from natural language. To address this, we introduce a lightweight consistency framework composed of two components: Latent Panel Anchoring and Reciprocal Attention Value Mixing. Our qualitative and quantitative results, as well as a user study, show that Story2Board produces more dynamic, coherent, and narratively engaging storyboards than existing baselines.
Score: 22.951592048825763
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Story2Board, a training-free framework for expressive storyboard generation from natural language. Existing methods narrowly focus on subject identity, overlooking key aspects of visual storytelling such as spatial composition, background evolution, and narrative pacing. To address this, we introduce a lightweight consistency framework composed of two components: Latent Panel Anchoring, which preserves a shared character reference across panels, and Reciprocal Attention Value Mixing, which softly blends visual features between token pairs with strong reciprocal attention. Together, these mechanisms enhance coherence without architectural changes or fine-tuning, enabling state-of-the-art diffusion models to generate visually diverse yet consistent storyboards. To structure generation, we use an off-the-shelf language model to convert free-form stories into grounded panel-level prompts. To evaluate, we propose the Rich Storyboard Benchmark, a suite of open-domain narratives designed to assess layout diversity and background-grounded storytelling, in addition to consistency. We also introduce a new Scene Diversity metric that quantifies spatial and pose variation across storyboards. Our qualitative and quantitative results, as well as a user study, show that Story2Board produces more dynamic, coherent, and narratively engaging storyboards than existing baselines.
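Illustrative sketch (not the authors' implementation): the abstract describes Reciprocal Attention Value Mixing only as softly blending visual features between token pairs with strong reciprocal attention. The toy Python below shows one way to read that sentence. The function name `reciprocal_value_mixing`, the use of `min(attn[i][j], attn[j][i])` as the reciprocity score, and the `threshold` and `alpha` parameters are all assumptions made for illustration. The paper's actual formulation may differ.

```python
def reciprocal_value_mixing(attn, values, threshold=0.3, alpha=0.5):
    """Blend each token's value vector with its strongest reciprocal partner.

    attn   : n x n list of attention weights (row i = how token i attends).
    values : n x d list of value vectors.
    Hypothetical reading of the idea; not the paper's algorithm.
    """
    n = len(values)
    mixed = []
    for i in range(n):
        # Reciprocal score is high only when i attends to j AND j attends to i.
        scores = [min(attn[i][j], attn[j][i]) if j != i else 0.0
                  for j in range(n)]
        j = max(range(n), key=scores.__getitem__)
        if scores[j] < threshold:
            # No strong reciprocal partner: leave the token untouched.
            mixed.append(list(values[i]))
        else:
            # Soft blend: keep (1 - alpha) of the token, take alpha from its partner.
            mixed.append([(1 - alpha) * a + alpha * b
                          for a, b in zip(values[i], values[j])])
    return mixed


# Tokens 0 and 2 attend strongly to each other; tokens 1 and 3 do not.
attn = [
    [0.10, 0.10, 0.70, 0.10],
    [0.40, 0.40, 0.10, 0.10],
    [0.60, 0.10, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
values = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 5.0]]
mixed = reciprocal_value_mixing(attn, values)
print(mixed)
```

In this toy input only tokens 0 and 2 clear the reciprocity threshold, so only their value vectors are blended. The other tokens pass through unchanged, which matches the abstract's claim that consistency is improved without architectural changes.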

Related papers

STAGE: Storyboard-Anchored Generation for Cinematic Multi-shot Narrative [55.05324155854762]
We introduce a SToryboard-Anchored GEneration workflow to reformulate the STAGE-based video generation task. Instead of using sparses, we propose STEP2 to predict a structural storyboard composed of start-end frame pairs for each shot. We also contribute the large-scale ConStoryBoard dataset, including high-quality movie clips with fine-grained narratives for story progression, cinematic attributes, and human preferences.
arXiv Detail & Related papers (2025-12-13T15:57:29Z)
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization [23.274981415638837]
ViStoryBench is a comprehensive benchmark designed to evaluate story visualization models across diverse narrative structures, visual styles, and character settings. The benchmark features richly annotated multi-shot scripts derived from curated stories spanning literature, film, and folklore. To enable thorough evaluation, ViStoryBench introduces a set of automated metrics that assess character consistency, style similarity, prompt adherence, aesthetic quality, and generation artifacts.
arXiv Detail & Related papers (2025-05-30T17:58:21Z)
STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives [82.19488717416351]
This paper introduces StoryAnchors, a unified framework for generating high-quality, multi-scene story frames. StoryAnchors employs a bidirectional story generator that integrates both past and future contexts to ensure temporal consistency. It also integrates Multi-Event Story Frame Labeling and Progressive Story Frame Training, enabling the model to capture both overarching narrative flow and event-level dynamics.
arXiv Detail & Related papers (2025-05-13T08:48:10Z)
Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics [1.320904960556043]
This paper presents a hierarchical knowledge graph framework for the structured understanding of visual narratives, focusing on comics. It represents them through integrated knowledge graphs that capture semantic, spatial, and temporal relationships. At the panel level, we construct multimodal graphs that link visual elements such as characters, objects, and actions with corresponding textual components, including dialogue and captions.
arXiv Detail & Related papers (2025-04-14T14:42:19Z)
Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce a novel task of Multimodal Style Translation (MuST-Bench). MuST-Bench is a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems. In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
arXiv Detail & Related papers (2024-10-24T15:15:01Z)
TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling [14.15543866199545]
As a cross-modal task, visual storytelling aims to generate a story for an ordered image sequence automatically. We propose a novel method, Topic Aware Reinforcement Network for VIsual StoryTelling (TARN-VIST). In particular, we pre-extracted the topic information of stories from both visual and linguistic perspectives.
arXiv Detail & Related papers (2024-03-18T08:01:23Z)
Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control [131.1446077627191]
We propose a new presentation form for Story Visualization called Storyboard, inspired by film-making. Within each scene in Storyboard, characters engage in activities at the same location, necessitating both visually consistent scenes and characters. Our method could be seamlessly integrated into mainstream Image Customization methods, empowering them with the capability of story visualization.
arXiv Detail & Related papers (2023-12-06T12:16:23Z)
TaleCrafter: Interactive Story Visualization with Multiple Characters [49.14122401339003]
This paper proposes a system for generic interactive story visualization. It is capable of handling multiple novel characters and supporting the editing of layout and local structure. The system comprises four interconnected components: story-to-prompt generation (S2P), text-to-layout generation (T2L), controllable text-to-image generation (C-T2I), and image-to-video animation (I2V).
arXiv Detail & Related papers (2023-05-29T17:11:39Z)
Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context. Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input. Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual stories. Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
PlotThread: Creating Expressive Storyline Visualizations using Reinforcement Learning [27.129882090324422]
We propose a reinforcement learning framework to train an AI agent that assists users in exploring the design space efficiently and generating well-optimized storylines. Based on the framework, we introduce PlotThread, an authoring tool that integrates a set of flexible interactions to support easy customization of storyline visualizations.
arXiv Detail & Related papers (2020-09-01T06:01:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.