Beyond Direct Generation: A Decomposed Approach to Well-Crafted Screenwriting with LLMs
- URL: http://arxiv.org/abs/2510.23163v1
- Date: Mon, 27 Oct 2025 09:41:29 GMT
- Title: Beyond Direct Generation: A Decomposed Approach to Well-Crafted Screenwriting with LLMs
- Authors: Hang Lei, Shengyi Zong, Zhaoyan Li, Ziren Zhou, Hao Liu
- Abstract summary: Large Language Models (LLMs) show great potential in creative writing, but direct end-to-end generation approaches often fail to produce well-crafted screenplays. We introduce Dual-Stage Refinement (DSR), a framework that decouples creative narrative generation from format conversion.
- Score: 6.802263659531867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The screenplay serves as the foundation for television production, defining narrative structure, character development, and dialogue. While Large Language Models (LLMs) show great potential in creative writing, direct end-to-end generation approaches often fail to produce well-crafted screenplays. We argue this failure stems from forcing a single model to simultaneously master two disparate capabilities: creative narrative construction and rigid format adherence. The resulting outputs may mimic superficial style but lack the deep structural integrity and storytelling substance required for professional use. To enable LLMs to generate high-quality screenplays, we introduce Dual-Stage Refinement (DSR), a decomposed framework that decouples creative narrative generation from format conversion. The first stage transforms a brief outline into rich, novel-style prose. The second stage refines this narrative into a professionally formatted screenplay. This separation enables the model to specialize in one distinct capability at each stage. A key challenge in implementing DSR is the scarcity of paired outline-to-novel training data. We address this through hybrid data synthesis: reverse synthesis deconstructs existing screenplays into structured inputs, while forward synthesis leverages these inputs to generate high-quality narrative texts as training targets. Blind evaluations by professional screenwriters show that DSR achieves a 75% win rate against strong baselines like Gemini-2.5-Pro and reaches 82.7% of human-level performance. Our work demonstrates that decomposed generation architecture with tailored data synthesis effectively specializes LLMs in complex creative domains.
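The two-stage pipeline described in the abstract can be sketched as a simple composition of specialized generation calls. This is a hypothetical illustration only: the function names, prompts, and the `generate` stub are placeholders, not the authors' actual API or prompt design.

```python
def generate(model: str, prompt: str) -> str:
    """Stand-in for an LLM call; a trivial stub for illustration."""
    return f"[{model} output for: {prompt[:40]}...]"

def stage1_narrative(outline: str) -> str:
    # Stage 1: expand a brief outline into rich, novel-style prose,
    # letting the model focus purely on creative narrative construction.
    return generate("narrative-model",
                    f"Expand this outline into novel-style prose:\n{outline}")

def stage2_screenplay(prose: str) -> str:
    # Stage 2: refine the narrative into a professionally formatted
    # screenplay, focusing only on format adherence.
    return generate("format-model",
                    f"Rewrite this prose as a formatted screenplay:\n{prose}")

def dsr(outline: str) -> str:
    # Dual-Stage Refinement: each stage specializes in one capability.
    return stage2_screenplay(stage1_narrative(outline))

print(dsr("A detective returns to her hometown to solve a cold case."))
```

The point of the decomposition is that neither call has to trade narrative quality against format fidelity; each prompt (and, in the paper, each fine-tuned stage) targets a single skill.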
Related papers
- Bridging Your Imagination with Audio-Video Generation via a Unified Director [54.45375287950375]
We argue that logical reasoning and imaginative thinking are both fundamental qualities of a film director. We propose UniMAGE, a unified director model that bridges user prompts with well-structured scripts.
arXiv Detail & Related papers (2025-12-29T05:56:22Z) - Living the Novel: A System for Generating Self-Training Timeline-Aware Conversational Agents from Novels [50.43968216132018]
We present an end-to-end system that transforms any literary work into an immersive, multi-character conversational experience. This system is designed to solve two fundamental challenges for LLM-driven characters.
arXiv Detail & Related papers (2025-12-08T11:57:46Z) - NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization [0.0]
We introduce NexusSum, a multi-agent LLM framework for narrative summarization. A narrative-specific preprocessing method standardizes character dialogue and descriptive text into a unified format. Our method establishes a new state-of-the-art in narrative summarization, achieving up to a 30.0% improvement in BERTScore (F1) across books, movies, and TV scripts.
arXiv Detail & Related papers (2025-05-30T13:26:23Z) - STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives [82.19488717416351]
This paper introduces StoryAnchors, a unified framework for generating high-quality, multi-scene story frames. StoryAnchors employs a bidirectional story generator that integrates both past and future contexts to ensure temporal consistency. It also integrates Multi-Event Story Frame Labeling and Progressive Story Frame Training, enabling the model to capture both overarching narrative flow and event-level dynamics.
arXiv Detail & Related papers (2025-05-13T08:48:10Z) - StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration [88.94832383850533]
We propose a multi-agent framework designed for Customized Storytelling Video Generation (CSVG).
StoryAgent decomposes CSVG into distinct subtasks assigned to specialized agents, mirroring the professional production process.
Specifically, we introduce a customized Image-to-Video (I2V) method, LoRA-BE, to enhance intra-shot temporal consistency.
Our contributions include the introduction of StoryAgent, a versatile framework for video generation tasks, and novel techniques for preserving protagonist consistency.
arXiv Detail & Related papers (2024-11-07T18:00:33Z) - Agents' Room: Narrative Generation through Multi-step Collaboration [54.98886593802834]
We propose a generation framework inspired by narrative theory that decomposes narrative writing into subtasks tackled by specialized agents. We show that Agents' Room generates stories preferred by expert evaluators over those produced by baseline systems.
arXiv Detail & Related papers (2024-10-03T15:44:42Z) - HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing [45.95600225239927]
Large language models (LLMs) can hardly produce written works at the level of human experts due to the extremely high complexity of literature writing.
We present HoLLMwood, an automated framework for unleashing the creativity of LLMs and exploring their potential in screenwriting.
arXiv Detail & Related papers (2024-06-17T16:01:33Z) - From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent [11.553884271082127]
This paper introduces the StoryAgent framework to automate and refine digital storytelling.
StoryAgent tackles key issues such as manual intervention, interactive scene orchestration, and narrative consistency.
Results demonstrate the framework's capability to produce coherent digital stories without reference videos.
arXiv Detail & Related papers (2024-06-15T03:03:43Z) - Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator [59.589919015669274]
This study focuses on zero-shot text-to-video generation in a data- and cost-efficient setting.
We propose a novel Free-Bloom pipeline that harnesses large language models (LLMs) as the director to generate a semantic-coherence prompt sequence.
We also propose a series of annotative modifications for adapting LDMs in the reverse process, including joint noise sampling, step-aware attention shift, and dual-path.
arXiv Detail & Related papers (2023-09-25T19:42:16Z) - DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation [37.25815760042241]
This paper introduces a new framework, dubbed DirecT2V, for zero-shot text-to-video (T2V) generation.
We equip a diffusion model with a novel value mapping method and dual-softmax filtering, which do not require any additional training.
The experimental results validate the effectiveness of our framework in producing visually coherent, story-rich videos.
arXiv Detail & Related papers (2023-05-23T17:57:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.