Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Generation
- URL: http://arxiv.org/abs/2601.17226v1
- Date: Fri, 23 Jan 2026 23:23:42 GMT
- Title: Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Generation
- Authors: David Y. Liu, Xanthe Muston, Aditya Joshi, Sebastian Sequoiah-Grayson
- Abstract summary: We use Todorov's Theory of Narrative Equilibrium to establish principles that define desirable ASG qualities. We prompt 7B and 14B LLM-as-judge models with our principles to test alignment with human annotators. We show that d-RLAIF offers a viable alternative to supervised fine-tuning (SFT).
- Score: 5.151910664667141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the subjective nature of storytelling, past work on automatic story generation (ASG) has relied on limited ground truths for training and evaluation. In this work, we explore direct Reinforcement Learning from AI Feedback (d-RLAIF) as a post-training alternative to supervised fine-tuning (SFT). We first apply Todorov's Theory of Narrative Equilibrium to establish principles that define desirable ASG qualities. We prompt 7B and 14B LLM-as-judge models with our principles to test alignment with human annotators and to provide reward signals during d-RLAIF. We use Gemini-3-Flash to evaluate the outputs of our post-trained models and compare them with human-written stories from the TimeTravel dataset. We show that d-RLAIF offers a viable alternative to SFT, producing stories that are more diverse and more closely aligned with human narrative conventions. Our paper demonstrates the promise of reinforcement learning for linguistically grounded post-training on subjective tasks such as ASG.
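To make the reward loop concrete, below is a minimal Python sketch of how an LLM-as-judge reward signal for d-RLAIF might be wired up. The principle wording, prompt template, 1-5 rating scale, and the `judge` callable are illustrative assumptions rather than the authors' exact setup.

```python
# Illustrative sketch of an LLM-as-judge reward signal for d-RLAIF.
# The principles, prompt template, and 1-5 score scale below are
# assumptions for illustration; the paper's exact setup may differ.
import re
from typing import Callable

# Hypothetical narrative principles loosely paraphrasing Todorov's
# Theory of Narrative Equilibrium (illustration only).
PRINCIPLES = [
    "The story opens in a state of equilibrium.",
    "A disruption breaks the initial equilibrium.",
    "The story ends by restoring a (possibly new) equilibrium.",
]

JUDGE_PROMPT = (
    "You are a story judge. Rate the story from 1 to 5 on each principle.\n"
    "Principles:\n{principles}\n\nStory:\n{story}\n\n"
    "Answer with one integer per principle, comma-separated."
)

def principle_reward(story: str, judge: Callable[[str], str]) -> float:
    """Prompt a judge model with the principles and map its ratings
    to a scalar reward in [0, 1] for the RL update."""
    prompt = JUDGE_PROMPT.format(
        principles="\n".join(f"- {p}" for p in PRINCIPLES), story=story
    )
    reply = judge(prompt)
    scores = [int(s) for s in re.findall(r"[1-5]", reply)][: len(PRINCIPLES)]
    if not scores:  # unparseable judge output -> neutral reward
        return 0.5
    return (sum(scores) / len(scores) - 1) / 4  # rescale 1..5 -> 0..1

# Usage with a stand-in judge (replace with a real 7B/14B judge model):
if __name__ == "__main__":
    fake_judge = lambda prompt: "4, 3, 5"
    print(principle_reward("Once upon a time ...", fake_judge))
```

In a real run, `judge` would wrap one of the 7B or 14B judge models, and the scalar reward would feed the policy-gradient update during post-training.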
Related papers
- Expanding the Capabilities of Reinforcement Learning via Text Feedback [49.561885700139676]
We formalize a multi-turn RL setup, RL from Text Feedback (RLTF), where text feedback is available during training but not at inference. To do this, we propose two methods: Self-Distillation (RLTF-SD), which trains the single-turn policy to match its own feedback-conditioned second-turn generations, and Feedback Modeling (RLTF-FM), which predicts the feedback as an auxiliary objective. Our results show that both methods consistently outperform strong baselines across benchmarks.
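As a rough illustration of the Feedback Modeling (RLTF-FM) idea, the sketch below attaches an auxiliary feedback-prediction head to a toy policy. The architecture, feedback label space, and loss weighting are assumptions, and the main term is shown as token-level cross-entropy for brevity rather than the paper's multi-turn RL objective.

```python
# Toy sketch of RLTF-FM: an auxiliary head predicts the (label-encoded)
# text feedback alongside the main generation objective. Shapes, the
# feedback label space, and the 0.1 weight are illustrative assumptions.
import torch.nn as nn
import torch.nn.functional as F

class PolicyWithFeedbackHead(nn.Module):
    def __init__(self, hidden: int = 64, vocab: int = 100, n_feedback: int = 8):
        super().__init__()
        self.backbone = nn.GRU(hidden, hidden, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab)       # next-token policy
        self.fb_head = nn.Linear(hidden, n_feedback)  # auxiliary feedback predictor

    def forward(self, x):  # x: (batch, time, hidden)
        h, _ = self.backbone(x)
        return self.lm_head(h), self.fb_head(h[:, -1])  # token logits, feedback logits

def rltf_fm_loss(lm_logits, targets, fb_logits, fb_labels, aux_weight=0.1):
    # Main generation term plus the auxiliary feedback-prediction term.
    lm = F.cross_entropy(lm_logits.flatten(0, 1), targets.flatten())
    fb = F.cross_entropy(fb_logits, fb_labels)
    return lm + aux_weight * fb
```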
arXiv Detail & Related papers (2026-02-02T18:56:56Z) - Living the Novel: A System for Generating Self-Training Timeline-Aware Conversational Agents from Novels [50.43968216132018]
We present an end-to-end system that transforms any literary work into an immersive, multi-character conversational experience. The system is designed to solve two fundamental challenges for LLM-driven characters.
arXiv Detail & Related papers (2025-12-08T11:57:46Z) - NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models [8.6767620170781]
Video large language models (Video LLMs) have recently achieved strong performance on tasks such as captioning, summarization, and question answering. Many models and training methods explicitly encourage continuity across events to enhance narrative coherence. We identify this bias, which we call narrative prior, as a key driver of two errors: hallucinations, where non-existent events are introduced or existing ones are misinterpreted, and omissions, where factual events are suppressed because they are misaligned with the surrounding context.
arXiv Detail & Related papers (2025-11-09T17:41:11Z) - Playpen: An Environment for Exploring Learning Through Conversational Interaction [84.0413820245725]
We investigate whether Dialogue Games can also serve as a source of feedback signals for learning. We introduce Playpen, an environment for offline and online learning through Dialogue Game self-play. We find that imitation learning through SFT improves performance on unseen instances but negatively impacts other skills.
arXiv Detail & Related papers (2025-04-11T14:49:33Z) - Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation [63.54377402784965]
We propose a Rewriting-driven AugMentation (RAM) paradigm for Vision-Language Navigation (VLN). Benefiting from our rewriting mechanism, new observation-instruction pairs can be obtained in both simulator-free and labor-saving manners. Experiments on both discrete environments (the R2R, REVERIE, and R4R datasets) and continuous environments (the R2R-CE dataset) show the superior performance and impressive generalization ability of our method.
arXiv Detail & Related papers (2025-03-23T13:18:17Z) - Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition [8.058451580903123]
We introduce a novel method that measures story quality in terms of human likeness.
We then use this method to evaluate the stories generated by several models.
Upgrading the visual and language components of TAPM results in a model that yields competitive performance.
arXiv Detail & Related papers (2024-07-05T14:48:15Z) - Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller [21.953766228135827]
We propose a new pipeline, termed LLaMS, to generate multimodal human-level stories.
We first employ a sequence-data auto-enhancement strategy to improve factual content expression.
Second, we propose the SQ-Adapter module for story illustration generation, which maintains sequence consistency.
arXiv Detail & Related papers (2024-03-12T04:07:00Z) - Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning [53.92465205531759]
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general-purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
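A minimal sketch of a contrastive bi-encoder preference model in the spirit of this entry: stories and critiques are embedded separately, and matched pairs are aligned with an InfoNCE-style loss. The toy linear encoders, dimensions, and temperature are assumptions, not the paper's configuration.

```python
# Toy contrastive bi-encoder: matched (story, critique) pairs sit on the
# diagonal of the similarity matrix and are pulled together; mismatched
# pairs are pushed apart. Encoders and temperature are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiEncoder(nn.Module):
    def __init__(self, dim_in: int = 128, dim_out: int = 64):
        super().__init__()
        self.story_enc = nn.Linear(dim_in, dim_out)     # stand-in for a text encoder
        self.critique_enc = nn.Linear(dim_in, dim_out)  # stand-in for a text encoder

    def forward(self, story_feats, critique_feats):
        s = F.normalize(self.story_enc(story_feats), dim=-1)
        c = F.normalize(self.critique_enc(critique_feats), dim=-1)
        return s, c

def contrastive_loss(s, c, temperature: float = 0.07):
    # Symmetric InfoNCE over the batch; row i matches column i.
    logits = s @ c.t() / temperature
    labels = torch.arange(s.size(0))
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```

Once trained, the scalar similarity between a story and a target critique can serve as the reward model for downstream RL fine-tuning.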
arXiv Detail & Related papers (2022-10-14T13:21:33Z) - StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation [76.44802273236081]
We develop a model, StoryDALL-E, for story continuation, where the generated visual story is conditioned on a source image.
We show that our retrofitting approach outperforms GAN-based models for story continuation and facilitates copying of visual elements from the source image.
Overall, our work demonstrates that pretrained text-to-image synthesis models can be adapted for complex and low-resource tasks like story continuation.
arXiv Detail & Related papers (2022-09-13T17:47:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.