Distinguishing Fictional Voices: a Study of Authorship Verification
Models for Quotation Attribution
- URL: http://arxiv.org/abs/2401.16968v1
- Date: Tue, 30 Jan 2024 12:49:40 GMT
- Title: Distinguishing Fictional Voices: a Study of Authorship Verification
Models for Quotation Attribution
- Authors: Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara
- Abstract summary: We explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models.
Results suggest that the combination of stylistic and topical information captured in some of these models accurately distinguishes characters from one another, but does not necessarily improve over semantic-only models when attributing quotes.
- Score: 12.300285585201767
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent approaches to automatically detect the speaker of an utterance of
direct speech often disregard general information about characters in favor of
local information found in the context, such as surrounding mentions of
entities. In this work, we explore stylistic representations of characters
built by encoding their quotes with off-the-shelf pretrained Authorship
Verification models in a large corpus of English novels (the Project Dialogism
Novel Corpus). Results suggest that the combination of stylistic and topical
information captured in some of these models accurately distinguishes
characters from one another, but does not necessarily improve over
semantic-only models when attributing quotes. However, these results vary
across novels, and further investigation of stylometric models specifically
tailored to literary texts and to the study of characters is needed.
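As a rough illustration of the setup described in the abstract, the sketch below encodes quotes with an off-the-shelf pretrained style-embedding model, averages them into per-character stylistic centroids, and attributes a new quote to the closest centroid by cosine similarity. The model name, helper functions, and toy quotes are illustrative assumptions, not the authors' implementation or data.

```python
# A minimal sketch, assuming a sentence-transformers-compatible style encoder;
# the model name below is an illustrative placeholder, not necessarily one of
# the Authorship Verification models evaluated in the paper.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("AnnaWegmann/Style-Embedding")  # assumed example model


def character_centroids(quotes_by_character: dict[str, list[str]]) -> dict[str, np.ndarray]:
    """Average each character's quote embeddings into a single stylistic vector."""
    return {
        name: encoder.encode(quotes, convert_to_numpy=True).mean(axis=0)
        for name, quotes in quotes_by_character.items()
    }


def attribute_quote(quote: str, centroids: dict[str, np.ndarray]) -> str:
    """Return the character whose centroid is most cosine-similar to the quote."""
    q = encoder.encode(quote, convert_to_numpy=True)
    cosine = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(centroids, key=lambda name: cosine(q, centroids[name]))


# Toy usage with invented quotes (not taken from the Project Dialogism Novel Corpus).
corpus = {
    "Elizabeth": ["I am perfectly convinced of it.", "You are quite mistaken, sir."],
    "Darcy": ["My feelings will not be repressed.", "I have been a selfish being all my life."],
}
print(attribute_quote("You must allow me to tell you how ardently I admire you.",
                      character_centroids(corpus)))
```

The same scaffold applies when comparing stylistic encoders against semantic-only sentence encoders, as the paper does: only the encoder choice changes.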
Related papers
- Identifying Speakers and Addressees of Quotations in Novels with Prompt Learning [5.691280935924612]
We propose prompt learning-based methods for speaker and addressee identification, built on fine-tuned pre-trained models.
Experiments on both Chinese and English datasets show the effectiveness of the proposed methods.
arXiv Detail & Related papers (2024-08-18T12:19:18Z) - Improving Quotation Attribution with Fictional Character Embeddings [11.259583037191772]
We propose to augment a popular quotation attribution system, BookNLP, with character embeddings that encode global stylistic information of characters.
We show that combining BookNLP's contextual information with our proposed global character embeddings improves the identification of speakers for anaphoric and implicit quotes.
arXiv Detail & Related papers (2024-06-17T09:46:35Z) - Improving Automatic Quotation Attribution in Literary Novels [21.164701493247794]
Current models for quotation attribution in literary novels assume varying levels of available information in their training and test data.
We benchmark state-of-the-art models on each of these sub-tasks independently, using a large dataset of annotated coreferences and quotations in literary novels.
We also train and evaluate models for the speaker attribution task in particular, showing that a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.
arXiv Detail & Related papers (2023-07-07T17:37:01Z) - Wave to Syntax: Probing spoken language models for syntax [16.643072915927313]
We focus on the encoding of syntax in several self-supervised and visually grounded models of spoken language.
We show that syntax is captured most prominently in the middle layers of the networks, and more explicitly within models with more parameters.
arXiv Detail & Related papers (2023-05-30T11:43:18Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model that learns authorship embeddings instead of semantics; a rough sketch of the contrastive idea appears after this list.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language is commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Probing Contextual Language Models for Common Ground with Visual
Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are in distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly under-perform humans.
arXiv Detail & Related papers (2020-05-01T21:28:28Z) - Temporal Embeddings and Transformer Models for Narrative Text
Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings, which are designed to learn semantic changes over time.
A supervised learning approach based on the state-of-the-art transformer model BERT is used instead to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)