Detecting and Grounding Important Characters in Visual Stories
- URL: http://arxiv.org/abs/2303.17647v1
- Date: Thu, 30 Mar 2023 18:24:06 GMT
- Title: Detecting and Grounding Important Characters in Visual Stories
- Authors: Danyang Liu, Frank Keller
- Abstract summary: We introduce the VIST-Character dataset, which provides rich character-centric annotations.
Based on this dataset, we propose two new tasks: important character detection and character grounding in visual stories.
We develop simple, unsupervised models based on distributional similarity and pre-trained vision-and-language models.
- Score: 18.870236356616907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Characters are essential to the plot of any story. Establishing the
characters before writing a story can improve the clarity of the plot and the
overall flow of the narrative. However, previous work on visual storytelling
tends to focus on detecting objects in images and discovering relationships
between them. In this approach, characters are not distinguished from other
objects when they are fed into the generation pipeline. The result is a
coherent sequence of events rather than a character-centric story. In order to
address this limitation, we introduce the VIST-Character dataset, which
provides rich character-centric annotations, including visual and textual
co-reference chains and importance ratings for characters. Based on this
dataset, we propose two new tasks: important character detection and character
grounding in visual stories. For both tasks, we develop simple, unsupervised
models based on distributional similarity and pre-trained vision-and-language
models. Our new dataset, together with these models, can serve as the
foundation for subsequent work on analysing and generating stories from a
character-centric perspective.
Related papers
- BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation.
We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses.
Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv Detail & Related papers (2024-10-14T10:55:58Z) - Generating Visual Stories with Grounded and Coreferent Characters [63.07511918366848]
We present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions.
Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark.
We also propose new evaluation metrics to measure the richness of characters and coreference in stories.
arXiv Detail & Related papers (2024-09-20T14:56:33Z) - CHIRON: Rich Character Representations in Long-Form Narratives [98.273323001781]
We propose CHIRON, a new character sheet' based representation that organizes and filters textual information about characters.
We validate CHIRON via the downstream task of masked-character prediction, where our experiments show CHIRON is better and more flexible than comparable summary-based baselines.
metrics derived from CHIRON can be used to automatically infer character-centricity in stories, and that these metrics align with human judgments.
arXiv Detail & Related papers (2024-06-14T17:23:57Z) - Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models [79.21968152209193]
We introduce the NewEpisode benchmark to evaluate generative models' adaptability in generating new stories with fresh characters.
We propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters seamlessly integrating them into established character dynamics.
arXiv Detail & Related papers (2024-05-20T07:54:03Z) - TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling [14.15543866199545]
As a cross-modal task, visual storytelling aims to generate a story for an ordered image sequence automatically.
We propose a novel method, Topic Aware Reinforcement Network for VIsual StoryTelling (TARN-VIST)
In particular, we pre-extracted the topic information of stories from both visual and linguistic perspectives.
arXiv Detail & Related papers (2024-03-18T08:01:23Z) - Personality Understanding of Fictional Characters during Book Reading [81.68515671674301]
We present the first labeled dataset PersoNet for this problem.
Our novel annotation strategy involves annotating user notes from online reading apps as a proxy for the original books.
Experiments and human studies indicate that our dataset construction is both efficient and accurate.
arXiv Detail & Related papers (2023-05-17T12:19:11Z) - "Let Your Characters Tell Their Story": A Dataset for Character-Centric
Narrative Understanding [31.803481510886378]
We present LiSCU -- a new dataset of literary pieces and their summaries paired with descriptions of characters that appear in them.
We also introduce two new tasks on LiSCU: Character Identification and Character Description Generation.
Our experiments with several pre-trained language models adapted for these tasks demonstrate that there is a need for better models of narrative comprehension.
arXiv Detail & Related papers (2021-09-12T06:12:55Z) - PlotMachines: Outline-Conditioned Generation with Dynamic Plot State
Tracking [128.76063992147016]
We present PlotMachines, a neural narrative model that learns to transform an outline into a coherent story by tracking the dynamic plot states.
In addition, we enrich PlotMachines with high-level discourse structure so that the model can learn different writing styles corresponding to different parts of the narrative.
arXiv Detail & Related papers (2020-04-30T17:16:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.