Related papers: Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts

Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts

URL: http://arxiv.org/abs/2511.20459v1
Date: Tue, 25 Nov 2025 16:25:44 GMT
Title: Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts
Authors: Mosab Rezaei, Mina Rajaei Moghadam, Abdul Rahman Shaikh, Hamed Alhoori, Reva Freedman,
Abstract summary: We present a framework for both generating and evaluating sentences in the style of 19th-century novelists.<n>Large language models are fine-tuned with minimal, single-token prompts to produce text in the voices of authors such as Dickens, Austen, Twain, Alcott, and Melville.
Score: 3.7189423451031356
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in large language models have created new opportunities for stylometry, the study of writing styles and authorship. Two challenges, however, remain central: training generative models when no paired data exist, and evaluating stylistic text without relying only on human judgment. In this work, we present a framework for both generating and evaluating sentences in the style of 19th-century novelists. Large language models are fine-tuned with minimal, single-token prompts to produce text in the voices of authors such as Dickens, Austen, Twain, Alcott, and Melville. To assess these generative models, we employ a transformer-based detector trained on authentic sentences, using it both as a classifier and as a tool for stylistic explanation. We complement this with syntactic comparisons and explainable AI methods, including attention-based and gradient-based analyses, to identify the linguistic cues that drive stylistic imitation. Our findings show that the generated text reflects the authors' distinctive patterns and that AI-based evaluation offers a reliable alternative to human assessment. All artifacts of this work are published online.

Related papers

On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation [88.77441715819366]
Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content.<n>We propose a variety of likelihood- and generative-based evaluation methods that serve in place of naive global token perplexity.
arXiv Detail & Related papers (2026-01-09T22:01:56Z)
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models [39.06617653124486]
We introduce a new evaluation framework called TypeScore to assess a model's ability to generate images with high-fidelity embedded text. Our proposed metric demonstrates greater resolution than CLIPScore to differentiate popular image generation models.
arXiv Detail & Related papers (2024-11-02T07:56:54Z)
Detecting Mode Collapse in Language Models via Narration [0.0]
We study 4,374 stories sampled from three OpenAI language models. We show successive versions of GPT-3 suffer from increasing degrees of "mode collapse" Our method and results are significant for researchers seeking to employ language models in sociological simulations.
arXiv Detail & Related papers (2024-02-06T23:52:58Z)
Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution [12.300285585201767]
We explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models. Results suggest that the combination of stylistic and topical information captured in some of these models accurately distinguish characters among each other, but does not necessarily improve over semantic-only models when attributing quotes.
arXiv Detail & Related papers (2024-01-30T12:49:40Z)
Few-Shot Detection of Machine-Generated Text using Style Representations [4.326503887981912]
Language models that convincingly mimic human writing pose a significant risk of abuse. We propose to leverage representations of writing style estimated from human-authored text. We find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors.
arXiv Detail & Related papers (2024-01-12T17:26:51Z)
Learning to Generate Text in Arbitrary Writing Styles [6.7308816341849695]
It is desirable for language models to produce text in an author-specific style on the basis of a potentially small writing sample. We propose to guide a language model to generate text in a target style using contrastively-trained representations that capture stylometric features.
arXiv Detail & Related papers (2023-12-28T18:58:52Z)
Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
PART: Pre-trained Authorship Representation Transformer [52.623051272843426]
Authors writing documents imprint identifying information within their texts.<n>Previous works use hand-crafted features or classification tasks to train their authorship models.<n>We propose a contrastively trained model fit to learn textbfauthorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks. This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
Prototype-to-Style: Dialogue Generation with Style-Aware Editing on Retrieval Memory [65.98002918470543]
We introduce a new prototype-to-style framework to tackle the challenge of stylistic dialogue generation. The framework uses an Information Retrieval (IR) system and extracts a response prototype from the retrieved response. A stylistic response generator then takes the prototype and the desired language style as model input to obtain a high-quality and stylistic response.
arXiv Detail & Related papers (2020-04-05T14:36:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.