Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs
- URL: http://arxiv.org/abs/2507.21893v2
- Date: Tue, 05 Aug 2025 14:01:48 GMT
- Title: Aether Weaver: Multimodal Affective Narrative Co-Generation with Dynamic Scene Graphs
- Authors: Saeed Ghorbani
- Abstract summary: Aether Weaver is a novel framework for narrative co-generation that overcomes limitations of multimodal text-to-visual pipelines. Our system concurrently synthesizes textual narratives, dynamic scene graph representations, visual scenes, and affective soundscapes.
- Score: 0.8702432681310401
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce Aether Weaver, a novel, integrated framework for multimodal narrative co-generation that overcomes limitations of sequential text-to-visual pipelines. Our system concurrently synthesizes textual narratives, dynamic scene graph representations, visual scenes, and affective soundscapes, driven by a tightly integrated co-generation mechanism. At its core, the Narrator, a large language model, generates narrative text and multimodal prompts, while the Director acts as a dynamic scene graph manager and analyzes the text to build and maintain a structured representation of the story's world, ensuring spatio-temporal and relational consistency for visual rendering and subsequent narrative generation. Additionally, a Narrative Arc Controller guides the high-level story structure, influencing multimodal affective consistency, further complemented by an Affective Tone Mapper that ensures congruent emotional expression across all modalities. Through qualitative evaluations on a diverse set of narrative prompts encompassing various genres, we demonstrate that Aether Weaver significantly enhances narrative depth, visual fidelity, and emotional resonance compared to cascaded baseline approaches. This integrated framework provides a robust platform for rapid creative prototyping and immersive storytelling experiences.
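The Director's role, as the abstract describes it, is to maintain a structured representation of the story world that stays consistent as the narrative advances. The following is a minimal sketch of what such a dynamic scene graph might look like. It is not the paper's implementation: the class `SceneGraph`, its methods, and the example entities are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical story-world state: entities plus (subject, predicate) -> object edges."""
    entities: dict = field(default_factory=dict)   # name -> attribute dict
    relations: dict = field(default_factory=dict)  # (subject, predicate) -> object
    step: int = 0                                  # number of relation updates applied

    def add_entity(self, name, **attrs):
        # Merge attributes so repeated mentions refine an entity instead of replacing it.
        self.entities.setdefault(name, {}).update(attrs)

    def relate(self, subject, predicate, obj):
        # Both endpoints must already exist; a simple relational consistency check.
        for name in (subject, obj):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        # Overwriting the edge models change over time (e.g. a character moving).
        self.relations[(subject, predicate)] = obj
        self.step += 1

    def describe(self):
        # Flatten the graph to text that could be fed back into the next narrator prompt.
        return sorted(f"{s} {p} {o}" for (s, p), o in self.relations.items())


graph = SceneGraph()
graph.add_entity("Mira", mood="anxious")
graph.add_entity("lighthouse")
graph.add_entity("shore")
graph.relate("Mira", "is_at", "shore")
graph.relate("Mira", "is_at", "lighthouse")  # Mira moves; the old location is replaced
print(graph.describe())
```

Keying relations by (subject, predicate) is one simple way to keep a character from being in two places at once; a fuller system would presumably also track edge history and spatial layout.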
Related papers
- STORYTELLER: An Enhanced Plot-Planning Framework for Coherent and Cohesive Story Generation [17.553025200797986]
We introduce Storyteller, a novel approach that systemically improves the coherence and consistency of automatically generated stories. In experiments, Storyteller significantly outperforms existing approaches, achieving an 84.33% average win rate. At the same time, it is also far ahead in other aspects including creativity, coherence, engagement, and relevance.
arXiv Detail & Related papers (2025-06-03T00:54:00Z) - Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts [20.281732318265483]
We present a modular pipeline that transforms action-level prompts into visually and auditorily grounded narrative dialogue. Our method takes as input a pair of prompts per scene, where the first defines the setting and the second specifies a character's behavior. We render each utterance as expressive, character-consistent speech, resulting in fully-voiced video narratives.
arXiv Detail & Related papers (2025-05-22T15:54:42Z) - STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives [82.19488717416351]
This paper introduces StoryAnchors, a unified framework for generating high-quality, multi-scene story frames. StoryAnchors employs a bidirectional story generator that integrates both past and future contexts to ensure temporal consistency. It also integrates Multi-Event Story Frame Labeling and Progressive Story Frame Training, enabling the model to capture both overarching narrative flow and event-level dynamics.
arXiv Detail & Related papers (2025-05-13T08:48:10Z) - Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics [1.320904960556043]
This paper presents a hierarchical knowledge graph framework for the structured understanding of visual narratives, focusing on comics. It represents them through integrated knowledge graphs that capture semantic, spatial, and temporal relationships. At the panel level, we construct multimodal graphs that link visual elements such as characters, objects, and actions with corresponding textual components, including dialogue and captions.
arXiv Detail & Related papers (2025-04-14T14:42:19Z) - VisAgent: Narrative-Preserving Story Visualization Framework [5.86192577938549]
VisAgent is a training-free framework designed to comprehend and visualize pivotal scenes within a given story. By considering story distillation, semantic consistency, and contextual coherence, VisAgent employs an agentic workflow. The empirically validated effectiveness confirms the framework's suitability for practical story visualization applications.
arXiv Detail & Related papers (2025-03-04T08:41:45Z) - Agents' Room: Narrative Generation through Multi-step Collaboration [54.98886593802834]
We propose a generation framework inspired by narrative theory that decomposes narrative writing into subtasks tackled by specialized agents. We show that Agents' Room generates stories preferred by expert evaluators over those produced by baseline systems.
arXiv Detail & Related papers (2024-10-03T15:44:42Z) - Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models [57.30913211264333]
We present Story3D-Agent, a pioneering approach that transforms provided narratives into 3D-rendered visualizations.
By integrating procedural modeling, our approach enables precise control over multi-character actions and motions, as well as diverse decorative elements.
We have thoroughly evaluated our Story3D-Agent to validate its effectiveness, offering a basic framework to advance 3D story representation.
arXiv Detail & Related papers (2024-08-21T17:43:15Z) - ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context [50.572907418430155]
ContextualStory is a framework designed to generate coherent story frames and extend frames for visual storytelling. We introduce a Storyline Contextualizer to enrich context in storyline embedding, and a StoryFlow Adapter to measure scene changes between frames. Experiments on PororoSV and FlintstonesSV datasets demonstrate that ContextualStory significantly outperforms existing SOTA methods in both story visualization and continuation.
arXiv Detail & Related papers (2024-07-13T05:02:42Z) - Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z) - PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking [128.76063992147016]
We present PlotMachines, a neural narrative model that learns to transform an outline into a coherent story by tracking the dynamic plot states.
In addition, we enrich PlotMachines with high-level discourse structure so that the model can learn different writing styles corresponding to different parts of the narrative.
arXiv Detail & Related papers (2020-04-30T17:16:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.