Related papers: The Reader is the Metric: How Textual Features and Reader Profiles Explain Conflicting Evaluations of AI Creative Writing

URL: http://arxiv.org/abs/2506.03310v1
Date: Tue, 03 Jun 2025 18:50:22 GMT
Title: The Reader is the Metric: How Textual Features and Reader Profiles Explain Conflicting Evaluations of AI Creative Writing
Authors: Guillermo Marco, Julio Gonzalo, Víctor Fresno
Abstract summary: We use five public datasets (1,471 stories, 101 annotators including critics, students, and lay readers) to extract 17 reference-less textual features. We model individual reader preferences, deriving feature importance vectors that reflect their textual priorities. Our results quantitatively explain how measurements of literary quality are a function of how text features align with each reader's preferences.
Score: 1.3654846342364306
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent studies comparing AI-generated and human-authored literary texts have produced conflicting results: some suggest AI already surpasses human quality, while others argue it still falls short. We start from the hypothesis that such divergences can be largely explained by genuine differences in how readers interpret and value literature, rather than by an intrinsic quality of the texts evaluated. Using five public datasets (1,471 stories, 101 annotators including critics, students, and lay readers), we (i) extract 17 reference-less textual features (e.g., coherence, emotional variance, average sentence length...); (ii) model individual reader preferences, deriving feature importance vectors that reflect their textual priorities; and (iii) analyze these vectors in a shared "preference space". Reader vectors cluster into two profiles: 'surface-focused readers' (mainly non-experts), who prioritize readability and textual richness; and 'holistic readers' (mainly experts), who value thematic development, rhetorical variety, and sentiment dynamics. Our results quantitatively explain how measurements of literary quality are a function of how text features align with each reader's preferences. These findings advocate for reader-sensitive evaluation frameworks in the field of creative text generation.
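
The three steps named in the abstract (extract textual features, derive a feature importance vector per reader, compare readers in a shared preference space) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' method: the two toy features, the ratings, the use of normalized absolute Pearson correlation as a stand-in for feature importance, and cosine similarity as the comparison between readers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def importance_vector(features, ratings):
    """Hypothetical importance proxy: |correlation| between each feature
    column and one reader's ratings, normalized to sum to 1."""
    n_features = len(features[0])
    raw = [
        abs(pearson([row[j] for row in features], ratings))
        for j in range(n_features)
    ]
    total = sum(raw)
    return [value / total for value in raw] if total else raw


def cosine(u, v):
    """Cosine similarity between two importance vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data (assumed): one row per story,
# columns are [average sentence length, emotional variance].
stories = [[10.0, 0.1], [20.0, 0.9], [30.0, 0.2], [40.0, 0.8]]

# Toy ratings (assumed): reader_a follows sentence length,
# reader_b follows emotional variance.
ratings = {
    "reader_a": [1, 2, 3, 4],
    "reader_b": [1, 4, 1, 4],
}

# One importance vector per reader; all vectors live in the same feature space.
vectors = {name: importance_vector(stories, r) for name, r in ratings.items()}

# Readers with similar textual priorities score close to 1.
similarity = cosine(vectors["reader_a"], vectors["reader_b"])
```

In this toy setup the two readers weight the features differently, so their similarity falls below 1. Grouping many such vectors with a clustering algorithm would be the analogue of the two reader profiles the abstract reports.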

Related papers

ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review [48.60540055009675]
ScholarPeer is a search-enabled multi-agent framework designed to emulate the cognitive processes of a senior researcher. We evaluate ScholarPeer on DeepReview-13K and the results demonstrate that ScholarPeer achieves significant win-rates against state-of-the-art approaches in side-by-side evaluations.
arXiv Detail & Related papers (2026-01-30T06:54:55Z)
Can professional translators identify machine-generated text? [0.0]
This study investigates whether professional translators can reliably identify short stories generated in Italian by artificial intelligence (AI) without prior specialized training. Sixty-nine translators took part in an in-person experiment, where they assessed three anonymized short stories. Low burstiness and narrative contradiction emerged as the most reliable indicators of synthetic authorship.
arXiv Detail & Related papers (2026-01-22T10:25:52Z)
BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation. We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv Detail & Related papers (2024-10-14T10:55:58Z)
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [55.33653554387953]
Pattern Analysis and Machine Intelligence (PAMI) has led to numerous literature reviews aimed at collecting fragmented information. This paper presents a thorough analysis of these literature reviews within the PAMI field. We try to address three core research questions: (1) What are the prevalent structural and statistical characteristics of PAMI literature reviews; (2) What strategies can researchers employ to efficiently navigate the growing corpus of reviews; and (3) What are the advantages and limitations of AI-generated reviews compared to human-authored ones.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models [92.60282074937305]
We introduce ConTextual, a novel dataset featuring human-crafted instructions that require context-sensitive reasoning for text-rich images. We conduct experiments to assess the performance of 14 foundation models and establish a human performance baseline. We observe a significant performance gap of 30.8% between GPT-4V and human performance.
arXiv Detail & Related papers (2024-01-24T09:07:11Z)
Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most automatic evaluation methods, such as BLEU/ROUGE, may not be able to adequately capture these dimensions. We propose a new LLM-based evaluation framework that compares generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z)
Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the true value of the hypothesis follows the text. In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis. We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
Textual Stylistic Variation: Choices, Genres and Individuals [0.8057441774248633]
This chapter argues for more informed target metrics for the statistical processing of stylistic variation in text collections. This chapter discusses variation given by genre, and contrasts it to variation occasioned by individual choice.
arXiv Detail & Related papers (2022-05-01T16:39:49Z)
Computational analyses of the topics, sentiments, literariness, creativity and beauty of texts in a large Corpus of English Literature [0.0]
The Gutenberg Literary English Corpus (GLEC) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. We report the results of three studies providing i) topic and sentiment analyses for six text categories of GLEC and its >100 authors, ii) novel measures of semantic complexity as indices of the literariness, creativity and book beauty of the works in GLEC.
arXiv Detail & Related papers (2022-01-12T08:16:52Z)
Modeling Social Readers: Novel Tools for Addressing Reception from Online Book Reviews [0.0]
We study the readers' distillation of the main storylines in a novel using a corpus of reviews of five popular novels. We make three important contributions to the study of infinite vocabulary networks. We present a new sequencing algorithm, REV2SEQ, that generates a consensus sequence of events based on partial trajectories aggregated from the reviews.
arXiv Detail & Related papers (2021-05-03T20:10:14Z)
Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set [0.0]
We show that in the entire GLEC quasi error-free text classification and authorship recognition is possible with a method using the same set of five style and five content features. Our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
arXiv Detail & Related papers (2020-10-21T07:39:55Z)
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis. We learn <sentiment, aspect> joint topic embeddings in the word embedding space. We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
A Comparative Study of Feature Types for Age-Based Text Classification [3.867363075280544]
We compare the effectiveness of various types of linguistic features for the task of age-based classification of fiction texts. The results obtained show that the features describing the text at the document level can significantly increase the quality of machine learning models.
arXiv Detail & Related papers (2020-09-24T18:41:10Z)
Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension. We propose to represent relations implicitly by situating structured knowledge in a context. We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.