Generating Visual Stories with Grounded and Coreferent Characters
- URL: http://arxiv.org/abs/2409.13555v1
- Date: Fri, 20 Sep 2024 14:56:33 GMT
- Title: Generating Visual Stories with Grounded and Coreferent Characters
- Authors: Danyang Liu, Mirella Lapata, Frank Keller
- Abstract summary: We present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions.
Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark.
We also propose new evaluation metrics to measure the richness of characters and coreference in stories.
- Score: 63.07511918366848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Characters are important in narratives. They move the plot forward, create emotional connections, and embody the story's themes. Visual storytelling methods focus more on the plot and events relating to it, without building the narrative around specific characters. As a result, the generated stories feel generic, with character mentions being absent, vague, or incorrect. To mitigate these issues, we introduce the new task of character-centric story generation and present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions. Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark. Specifically, we develop an automated pipeline to enrich VIST with visual and textual character coreference chains. We also propose new evaluation metrics to measure the richness of characters and coreference in stories. Experimental results show that our model generates stories with recurring characters which are consistent and coreferent to a larger extent than baselines and state-of-the-art systems.
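The abstract names new character-richness and coreference metrics but does not define them here. As a minimal sketch, assuming each story's coreference chains are available as lists of sentence indices per character, statistics of this flavor could be computed as follows (the function name, the particular metrics, and the example chains are illustrative, not the paper's definitions):

```python
def character_metrics(chains, num_sentences):
    """chains maps a character id to the sentence indices that mention it."""
    if not chains:
        return {"num_characters": 0, "avg_mentions": 0.0,
                "recurring_ratio": 0.0, "coverage": 0.0}
    lengths = [len(idxs) for idxs in chains.values()]
    recurring = sum(1 for idxs in chains.values() if len(set(idxs)) > 1)
    covered = {i for idxs in chains.values() for i in idxs}
    return {
        "num_characters": len(chains),               # distinct characters
        "avg_mentions": sum(lengths) / len(chains),  # chain length per character
        "recurring_ratio": recurring / len(chains),  # characters in >1 sentence
        "coverage": len(covered) / num_sentences,    # sentences mentioning anyone
    }

# Example: a 5-sentence story where "mom" recurs and "the dog" appears once.
print(character_metrics({"mom": [0, 1, 3], "the dog": [2]}, num_sentences=5))
```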
Related papers
- ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context [50.572907418430155]
Existing autoregressive methods struggle with high memory usage, slow generation speeds, and limited context integration.
We propose ContextualStory, a novel framework designed to generate coherent story frames and extend frames for story continuation.
In experiments on PororoSV and FlintstonesSV benchmarks, ContextualStory significantly outperforms existing methods in both story visualization and story continuation.
arXiv Detail & Related papers (2024-07-13T05:02:42Z)
- The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories [17.184517720465404]
We quantify and compare the emotional and descriptive features of storytelling from both generative processes, human and machine, along a set of six dimensions.
We find that generated stories differ significantly from human stories along all six dimensions, and that human and machine generations display similar biases when grouped according to the narrative point-of-view and gender of the main protagonist.
arXiv Detail & Related papers (2024-06-24T16:24:18Z)
- CHIRON: Rich Character Representations in Long-Form Narratives [98.273323001781]
We propose CHIRON, a new 'character sheet'-based representation that organizes and filters textual information about characters.
We validate CHIRON via the downstream task of masked-character prediction, where our experiments show CHIRON is better and more flexible than comparable summary-based baselines.
We show that metrics derived from CHIRON can be used to automatically infer character-centricity in stories, and that these metrics align with human judgments.
arXiv Detail & Related papers (2024-06-14T17:23:57Z)
- Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models [79.21968152209193]
We introduce the NewEpisode benchmark to evaluate generative models' adaptability in generating new stories with fresh characters.
We propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters, seamlessly integrating them into established character dynamics.
arXiv Detail & Related papers (2024-05-20T07:54:03Z)
- TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling [14.15543866199545]
As a cross-modal task, visual storytelling aims to generate a story for an ordered image sequence automatically.
We propose a novel method, Topic Aware Reinforcement Network for VIsual StoryTelling (TARN-VIST).
In particular, we pre-extract the topic information of stories from both visual and linguistic perspectives.
arXiv Detail & Related papers (2024-03-18T08:01:23Z)
- Visual Storytelling with Question-Answer Plans [70.89011289754863]
We present a novel framework which integrates visual representations with pretrained language models and planning.
Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret (a toy sketch of this idea appears after this list).
It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative.
arXiv Detail & Related papers (2023-10-08T21:45:34Z)
- Detecting and Grounding Important Characters in Visual Stories [18.870236356616907]
We introduce the VIST-Character dataset, which provides rich character-centric annotations.
Based on this dataset, we propose two new tasks: important character detection and character grounding in visual stories.
We develop simple, unsupervised models based on distributional similarity and pre-trained vision-and-language models (a grounding sketch along these lines appears after this list).
arXiv Detail & Related papers (2023-03-30T18:24:06Z)
- Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning [12.264880519328353]
We introduce Commonsense-inference Augmented neural StoryTelling (CAST), a framework for introducing commonsense reasoning into the generation process.
We find that our CAST method produces significantly more coherent, on-topic, enjoyable and fluent stories than existing models in both the single-character and two-character settings.
arXiv Detail & Related papers (2021-05-04T06:40:33Z)
- PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking [128.76063992147016]
We present PlotMachines, a neural narrative model that learns to transform an outline into a coherent story by tracking the dynamic plot states (a toy state-update sketch appears after this list).
In addition, we enrich PlotMachines with high-level discourse structure so that the model can learn different writing styles corresponding to different parts of the narrative.
arXiv Detail & Related papers (2020-04-30T17:16:31Z)
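On the visual-prefix idea from the Visual Storytelling with Question-Answer Plans entry above: a minimal sketch, assuming CLIP-style image features and a GPT-style embedding width. The module, dimensions, and prefix length are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class VisualPrefix(nn.Module):
    """Project image features into `prefix_len` pseudo-token embeddings
    that a language model can consume alongside its word embeddings."""
    def __init__(self, image_dim=512, lm_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len, self.lm_dim = prefix_len, lm_dim
        self.proj = nn.Linear(image_dim, prefix_len * lm_dim)

    def forward(self, image_feats):            # (batch, image_dim)
        prefix = self.proj(image_feats)        # (batch, prefix_len * lm_dim)
        return prefix.view(-1, self.prefix_len, self.lm_dim)

# Usage: prepend the prefix to the token embeddings fed to the LM.
feats = torch.randn(2, 512)                    # stand-in image features
prefix = VisualPrefix()(feats)                 # (2, 10, 768)
token_embeds = torch.randn(2, 20, 768)         # embedded story tokens
lm_input = torch.cat([prefix, token_embeds], dim=1)  # (2, 30, 768)
```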
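On the distributional-similarity grounding from the Detecting and Grounding Important Characters entry: a minimal sketch that greedily links each textual character mention to its most similar image region by cosine similarity. The threshold and the source of the embeddings (e.g., a CLIP-style vision-and-language model) are assumptions, not the paper's procedure.

```python
import numpy as np

def ground_mentions(mention_vecs, region_vecs, threshold=0.3):
    """Link each mention (row of mention_vecs) to the most similar image
    region (row of region_vecs); below-threshold mentions stay ungrounded."""
    def normalize(m):
        return m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = normalize(mention_vecs) @ normalize(region_vecs).T
    links = {}
    for i, row in enumerate(sims):
        j = int(row.argmax())
        links[i] = j if row[j] >= threshold else None  # None = ungrounded
    return links

# Example with random stand-in embeddings: 3 mentions, 4 regions.
rng = np.random.default_rng(0)
print(ground_mentions(rng.normal(size=(3, 64)), rng.normal(size=(4, 64))))
```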
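On the dynamic plot-state tracking from the PlotMachines entry: a minimal sketch in which a GRU cell maintains a running plot-state vector, updated once per generated paragraph so later text can condition on what has already happened. The cell choice and dimensions are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class PlotStateTracker(nn.Module):
    """Maintain a plot-state vector, refreshed after each paragraph."""
    def __init__(self, dim=768):
        super().__init__()
        self.update = nn.GRUCell(dim, dim)

    def forward(self, paragraph_vec, state):
        return self.update(paragraph_vec, state)

tracker = PlotStateTracker()
state = torch.zeros(1, 768)                   # empty plot state
for _ in range(5):                            # one update per paragraph
    paragraph_vec = torch.randn(1, 768)       # stand-in paragraph encoding
    state = tracker(paragraph_vec, state)     # state now reflects the plot so far
```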