Related papers: Improving Quotation Attribution with Fictional Character Embeddings

URL: http://arxiv.org/abs/2406.11368v1
Date: Mon, 17 Jun 2024 09:46:35 GMT
Title: Improving Quotation Attribution with Fictional Character Embeddings
Authors: Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara
Abstract summary: We propose to augment a popular quotation attribution system, BookNLP, with character embeddings that encode global information of characters. We show that our proposed global character embeddings improve the identification of speakers for anaphoric and implicit quotes, reaching state-of-the-art performance.
Score: 11.259583037191772
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Humans naturally attribute utterances of direct speech to their speaker in literary works. When attributing quotes, we process contextual information but also access mental representations of characters that we build and revise throughout the narrative. Recent methods to automatically attribute such utterances have explored simulating human logic with deterministic rules or learning new implicit rules with neural networks when processing contextual information. However, these systems inherently lack character representations, which often leads to errors on more challenging examples of attribution: anaphoric and implicit quotes. In this work, we propose to augment a popular quotation attribution system, BookNLP, with character embeddings that encode global information of characters. To build these embeddings, we create DramaCV, a corpus of English drama plays from the 15th to 20th century focused on Character Verification (CV), a task similar to Authorship Verification (AV), that aims at analyzing fictional characters. We train a model similar to the recently proposed AV model, Universal Authorship Representation (UAR), on this dataset, showing that it outperforms concurrent methods of character embeddings on the CV task and generalizes better to literary novels. Then, through an extensive evaluation on 22 novels, we show that combining BookNLP's contextual information with our proposed global character embeddings improves the identification of speakers for anaphoric and implicit quotes, reaching state-of-the-art performance. Code and data will be made publicly available.

Related papers

What's in a prompt? Language models encode literary style in prompt embeddings [3.0583407443282367]
We show how the cumulative information of an entire prompt becomes condensed into individual embeddings under the action of transformer layers. We observe that short excerpts from different novels separate in the latent space independently from what next-token prediction they converge towards. This geometry of style may have applications for authorship attribution and literary analysis, but most importantly reveals the sophistication of information processing and compression accomplished by language models.
arXiv Detail & Related papers (2025-05-19T15:56:13Z)
MoCha: Towards Movie-Grade Talking Character Synthesis [62.007000023747445]
We introduce Talking Characters, a more realistic task to generate talking character animations directly from speech and text. Unlike talking head, Talking Characters aims at generating the full portrait of one or more characters beyond the facial region. We propose MoCha, the first of its kind to generate talking characters.
arXiv Detail & Related papers (2025-03-30T04:22:09Z)
Show, Don't Tell: Uncovering Implicit Character Portrayal using LLMs [19.829683714192615]
We introduce LIIPA, a framework for prompting large language models to uncover implicit character portrayals. We find that LIIPA outperforms existing approaches, and is more robust to increasing character counts. Our work demonstrates the potential benefits of using LLMs to analyze complex characters.
arXiv Detail & Related papers (2024-12-05T19:46:53Z)
CHATTER: A Character Attribution Dataset for Narrative Understanding [31.540540919042154]
We validate a subset of Chatter, called ChatterEval, using human annotations to serve as an evaluation benchmark for the character attribution task in movie scripts. ChatterEval assesses narrative understanding and the long-context modeling capacity of language models.
arXiv Detail & Related papers (2024-11-07T22:37:30Z)
BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation. We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv Detail & Related papers (2024-10-14T10:55:58Z)
Generating Visual Stories with Grounded and Coreferent Characters [63.07511918366848]
We present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions. Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark. We also propose new evaluation metrics to measure the richness of characters and coreference in stories.
arXiv Detail & Related papers (2024-09-20T14:56:33Z)
Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents with a stylistic constraint. We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship and IMDb62.
arXiv Detail & Related papers (2024-07-18T10:01:09Z)
CHIRON: Rich Character Representations in Long-Form Narratives [98.273323001781]
We propose CHIRON, a new 'character sheet' based representation that organizes and filters textual information about characters. We validate CHIRON via the downstream task of masked-character prediction, where our experiments show CHIRON is better and more flexible than comparable summary-based baselines. Metrics derived from CHIRON can be used to automatically infer character-centricity in stories, and these metrics align with human judgments.
arXiv Detail & Related papers (2024-06-14T17:23:57Z)
Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution [12.300285585201767]
We explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models. Results suggest that the combination of stylistic and topical information captured in some of these models accurately distinguishes characters from one another, but does not necessarily improve over semantic-only models when attributing quotes.
arXiv Detail & Related papers (2024-01-30T12:49:40Z)
Improving Automatic Quotation Attribution in Literary Novels [21.164701493247794]
Current models for quotation attribution in literary novels assume varying levels of available information in their training and test data. We benchmark state-of-the-art models on each of these sub-tasks independently, using a large dataset of annotated coreferences and quotations in literary novels. We also train and evaluate models for the speaker attribution task in particular, showing that a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.
arXiv Detail & Related papers (2023-07-07T17:37:01Z)
PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage. Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors. We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
"Let Your Characters Tell Their Story": A Dataset for Character-Centric Narrative Understanding [31.803481510886378]
We present LiSCU -- a new dataset of literary pieces and their summaries paired with descriptions of characters that appear in them. We also introduce two new tasks on LiSCU: Character Identification and Character Description Generation. Our experiments with several pre-trained language models adapted for these tasks demonstrate that there is a need for better models of narrative comprehension.
arXiv Detail & Related papers (2021-09-12T06:12:55Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks. This study provides an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language. We generate abstractive summaries of narrated instructional videos across a wide variety of topics. We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.