Related papers: Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

URL: http://arxiv.org/abs/2510.13939v3
Date: Sat, 01 Nov 2025 19:29:36 GMT
Title: Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers
Authors: Tuhin Chakrabarty, Jane C. Ginsburg, Paramveer Dhillon
Abstract summary: It's unclear if frontier AI models can generate high quality literary text while emulating authors' styles. We compare MFA-trained expert writers with three frontier AI models: ChatGPT, Claude & Gemini in writing up to 450 word excerpts emulating 50 award-winning authors' diverse styles.
Score: 8.031052360107092
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The use of copyrighted books for training AI models has led to numerous lawsuits from authors concerned about AI's ability to generate derivative content. Yet it's unclear if these models can generate high quality literary text while emulating authors' styles. To answer this we conducted a preregistered study comparing MFA-trained expert writers with three frontier AI models: ChatGPT, Claude & Gemini in writing up to 450 word excerpts emulating 50 award-winning authors' diverse styles. In blind pairwise evaluations by 159 representative expert & lay readers, AI-generated text from in-context prompting was strongly disfavored by experts for both stylistic fidelity (OR=0.16, p<10^-8) & writing quality (OR=0.13, p<10^-7) but showed mixed results with lay readers. However, fine-tuning ChatGPT on individual authors' complete works completely reversed these findings: experts now favored AI-generated text for stylistic fidelity (OR=8.16, p<10^-13) & writing quality (OR=1.87, p=0.010), with lay readers showing similar shifts. These effects generalize across authors & styles. The fine-tuned outputs were rarely flagged as AI-generated (3% rate v. 97% for in-context prompting) by best AI detectors. Mediation analysis shows this reversal occurs because fine-tuning eliminates detectable AI stylistic quirks (e.g., cliche density) that penalize in-context outputs. While we do not account for additional costs of human effort required to transform raw AI output into cohesive, publishable prose, the median fine-tuning & inference cost of $81 per author represents a dramatic 99.7% reduction compared to typical professional writer compensation. Author-specific fine-tuning thus enables non-verbatim AI writing that readers prefer to expert human writing, providing empirical evidence directly relevant to copyright's fourth fair-use factor, the "effect upon the potential market or value" of the source works.
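
As a rough reading aid for the figures in the abstract above, the following Python sketch converts the reported odds ratios into approximate preference shares and back-computes the compensation baseline implied by the cost claim. It is an illustration only and not the study's analysis: the function names are hypothetical, treating each odds ratio as the plain odds that a reader picks the AI excerpt over the human one is a simplifying assumption (the reported values come from a fitted model), and the baseline is inferred from the rounded 99.7% figure.

```python
def odds_to_share(odds: float) -> float:
    """Convert odds in favor of an outcome to a probability: p = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def implied_baseline(cost: float, reduction: float) -> float:
    """Back out the baseline B from cost = B * (1 - reduction)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return cost / (1.0 - reduction)


# Odds ratios quoted in the abstract for expert readers.
# Assumption: each is read as the odds of the AI excerpt being preferred.
REPORTED_ODDS_RATIOS = {
    "in-context prompting, stylistic fidelity": 0.16,
    "in-context prompting, writing quality": 0.13,
    "fine-tuned, stylistic fidelity": 8.16,
    "fine-tuned, writing quality": 1.87,
}

if __name__ == "__main__":
    for label, odds in REPORTED_ODDS_RATIOS.items():
        share = odds_to_share(odds)
        print(f"{label}: OR={odds} -> approx. AI-preferred share {share:.1%}")

    # $81 median cost and 99.7% reduction are the figures from the abstract.
    baseline = implied_baseline(81.0, 0.997)
    print(f"implied compensation baseline: about ${baseline:,.0f}")
```

Under these assumptions, an odds ratio of 8.16 corresponds to the AI excerpt being chosen in roughly 89% of comparisons, and 0.16 to roughly 14%. The implied baseline comes out near $27,000, but because 99.7% is a rounded figure the underlying amount could differ by several thousand dollars.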

Related papers

Can Good Writing Be Generative? Expert-Level AI Writing Emerges through Fine-Tuning on High-Quality Books [7.522123492613126]
Generative AI can emulate thousands of author styles in seconds with negligible marginal labor. Experts preferred human writing in 82.7% of cases under the in-context prompting condition. Lay judges, however, consistently preferred AI writing.
arXiv Detail & Related papers (2026-01-26T10:59:21Z)
Can professional translators identify machine-generated text? [0.0]
This study investigates whether professional translators can reliably identify short stories generated in Italian by artificial intelligence (AI) without prior specialized training. Sixty-nine translators took part in an in-person experiment, where they assessed three anonymized short stories. Low burstiness and narrative contradiction emerged as the most reliable indicators of synthetic authorship.
arXiv Detail & Related papers (2026-01-22T10:25:52Z)
Everyone prefers human writers, including AI [0.0]
We conducted experiments using Raymond Queneau's Exercises in Style (1947) to measure attribution bias. Humans showed +13.7 percentage point (pp) bias (Cohen's h = 0.28, 95% CI: 0.21-0.34), while AI models showed +34.3 percentage point bias (h = 0.70, 95% CI: 0.65-0.76), a 2.5-fold stronger effect.
arXiv Detail & Related papers (2025-10-09T21:33:30Z)
EditLens: Quantifying the Extent of AI Editing in Text [23.457378805409714]
We show that AI-edited text is distinguishable from human-written and AI-generated text. We train a regression model that predicts the amount of AI editing present within a text. Not only do we show that AI-edited text can be detected, but also that the degree of change made by AI to human writing can be detected.
arXiv Detail & Related papers (2025-10-03T16:27:48Z)
CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z)
Could AI Trace and Explain the Origins of AI-Generated Images and Text? [53.11173194293537]
AI-generated content is increasingly prevalent in the real world. Adversaries might exploit large multimodal models to create images that violate ethical or legal standards. Paper reviewers may misuse large language models to generate reviews without genuine intellectual effort.
arXiv Detail & Related papers (2025-04-05T20:51:54Z)
AuthorMist: Evading AI Text Detectors with Reinforcement Learning [4.806579822134391]
AuthorMist is a novel reinforcement learning-based system to transform AI-generated text into human-like writing. We show that AuthorMist effectively reduces the detectability of AI-generated text while preserving the original meaning.
arXiv Detail & Related papers (2025-03-10T12:41:05Z)
Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
This study systematically evaluates twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset. Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z)
Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection [58.419940585826744]
We introduce FairOPT, an algorithm for group-specific threshold optimization for probabilistic AI-text detectors. We partitioned data into subgroups based on attributes (e.g., text length and writing style) and implemented FairOPT to learn decision thresholds for each group to reduce discrepancy. Our framework paves the way for more robust classification in AI-generated content detection via post-processing.
arXiv Detail & Related papers (2025-02-06T21:58:48Z)
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text [37.36534911201806]
We hire annotators to read 300 non-fiction English articles and label them as either human-written or AI-generated. Experiments show that annotators who frequently use LLMs for writing tasks excel at detecting AI-generated text. We release our annotated dataset and code to spur future research into both human and automated detection of AI-generated text.
arXiv Detail & Related papers (2025-01-26T19:31:34Z)
"It was 80% me, 20% AI": Seeking Authenticity in Co-Writing with Large Language Models [97.22914355737676]
We examine whether and how writers want to preserve their authentic voice when co-writing with AI tools. Our findings illuminate conceptions of authenticity in human-AI co-creation. Readers' responses showed less concern about human-AI co-writing.
arXiv Detail & Related papers (2024-11-20T04:42:32Z)
Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated [48.70176791365903]
This study explores how bias shapes the perception of AI versus human generated content. We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z)
Understanding writing style in social media with a supervised contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation. We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 x 10^6 authored texts. Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.