Catch Me If You Can? Not Yet: LLMs Still Struggle to Imitate the Implicit Writing Styles of Everyday Authors
- URL: http://arxiv.org/abs/2509.14543v1
- Date: Thu, 18 Sep 2025 02:18:49 GMT
- Title: Catch Me If You Can? Not Yet: LLMs Still Struggle to Imitate the Implicit Writing Styles of Everyday Authors
- Authors: Zhengxiang Wang, Nafis Irtiza Tripto, Solha Park, Zhenzhen Li, Jiawei Zhou
- Abstract summary: This work presents a comprehensive evaluation of large language models' ability to mimic personal writing styles. We introduce an ensemble of complementary metrics, including authorship attribution, authorship verification, style matching, and AI detection, to robustly assess style imitation. Results show that while LLMs can approximate user styles in structured formats like news and email, they struggle with nuanced, informal writing in blogs and forums.
- Score: 9.921537507947473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) become increasingly integrated into personal writing tools, a critical question arises: can LLMs faithfully imitate an individual's writing style from just a few examples? Personal style is often subtle and implicit, making it difficult to specify through prompts yet essential for user-aligned generation. This work presents a comprehensive evaluation of state-of-the-art LLMs' ability to mimic personal writing styles via in-context learning from a small number of user-authored samples. We introduce an ensemble of complementary metrics, including authorship attribution, authorship verification, style matching, and AI detection, to robustly assess style imitation. Our evaluation spans over 40,000 generations per model across domains such as news, email, forums, and blogs, covering writing samples from more than 400 real-world authors. Results show that while LLMs can approximate user styles in structured formats like news and email, they struggle with nuanced, informal writing in blogs and forums. Further analysis of various prompting strategies, such as the number of demonstrations, reveals key limitations in effective personalization. Our findings highlight a fundamental gap in personalized LLM adaptation and the need for improved techniques to support implicit, style-consistent generation. To aid future research and for reproducibility, we open-source our data and code.
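The abstract describes combining authorship attribution, authorship verification, style matching, and AI detection into one ensemble assessment. The sketch below is purely illustrative of that idea, not the paper's actual scoring formula: it assumes each signal has already been normalized to [0, 1] and simply averages them, inverting the AI-detection rate so that harder-to-detect generations score higher.

```python
def ensemble_style_score(attribution_acc: float,
                         verification_score: float,
                         style_match: float,
                         ai_detection_rate: float) -> float:
    """Combine four style-imitation signals into one score in [0, 1].

    Each input is assumed normalized to [0, 1]. The AI-detection rate is
    inverted: a generation that is rarely flagged as machine-written is
    treated as a better style imitation.
    """
    for v in (attribution_acc, verification_score, style_match, ai_detection_rate):
        if not 0.0 <= v <= 1.0:
            raise ValueError("all metric values must lie in [0, 1]")
    signals = [attribution_acc, verification_score, style_match,
               1.0 - ai_detection_rate]
    return sum(signals) / len(signals)
```

A simple average is only one way to ensemble such signals; weighted or rank-based combinations are equally plausible under this framing.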
Related papers
- Evaluating Style-Personalized Text Generation: Challenges and Directions [13.84471733325089]
Style personalization is highly specific to each user and depends strongly on the pragmatic context.
We examine the effectiveness of the most common metrics used in the field, such as BLEU, embeddings, and LLMs-as-judges.
We find strong evidence that employing ensembles of diverse evaluation metrics consistently outperforms single-evaluator methods.
arXiv Detail & Related papers (2025-08-08T15:07:31Z)
- Help Me Write a Story: Evaluating LLMs' Ability to Generate Writing Feedback [57.200668979963694]
We present a novel test set of 1,300 stories that we corrupted to intentionally introduce writing issues.
We study the performance of commonly used LLMs in this task with both automatic and human evaluation metrics.
arXiv Detail & Related papers (2025-07-21T18:56:50Z)
- A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations [112.81207927088117]
PersonaConvBench is a benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs).
We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements.
arXiv Detail & Related papers (2025-05-20T09:13:22Z)
- Looking for the Inner Music: Probing LLMs' Understanding of Literary Style [3.5757761767474876]
Authorial style is easier to define than genre-level style.
Pronoun usage and word order prove significant for defining both kinds of literary style.
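The finding above can be made concrete with a minimal stylometric probe. The sketch below computes a pronoun-usage rate as one interpretable style feature; the pronoun list and tokenizer are deliberate simplifications for illustration, not the feature set used in the paper.

```python
import re

# Small, illustrative set of English personal/possessive pronouns.
PRONOUNS = {"i", "me", "my", "we", "our", "you", "your", "he", "him",
            "his", "she", "her", "it", "its", "they", "them", "their"}

def pronoun_rate(text: str) -> float:
    """Fraction of word tokens that are pronouns (0.0 for empty input)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in PRONOUNS for t in tokens) / len(tokens)
```

Features like this, alongside word-order statistics such as part-of-speech n-grams, are the kind of signal stylometric probes of authorial versus genre style typically rely on.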
arXiv Detail & Related papers (2025-02-05T22:20:17Z)
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- Customizing Large Language Model Generation Style using Parameter-Efficient Finetuning [24.263699489328427]
One-size-fits-all large language models (LLMs) are increasingly being used to help people with their writing.
This paper explores whether parameter-efficient finetuning (PEFT) with Low-Rank Adaptation can effectively guide the style of LLM generations.
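The appeal of Low-Rank Adaptation in this setting is parameter efficiency: the frozen weight matrix is augmented with a trainable low-rank update, so only a small fraction of parameters is touched. The toy sketch below follows the standard LoRA formulation (W plus a scaled B·A update) to show the mechanics and the parameter-count argument; it is a from-scratch illustration, not the paper's training code.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Forward pass through a LoRA-augmented linear layer.

    y = x @ (W + (alpha / r) * B @ A).T, where W (d_out x d_in) is frozen
    and only A (r x d_in) and B (d_out x r) would be trained.
    """
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)  # low-rank weight update
    return x @ (W + delta).T

def trainable_fraction(d_in: int, d_out: int, r: int) -> float:
    """Share of parameters LoRA trains relative to full finetuning."""
    return (r * (d_in + d_out)) / (d_in * d_out)
```

For a 1024x1024 layer with rank r = 2, the trainable fraction is under 0.4% of full finetuning, which is why a few adapter matrices suffice to steer generation style.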
arXiv Detail & Related papers (2024-09-06T19:25:18Z)
- Capturing Style in Author and Document Representation [4.323709559692927]
We propose a new architecture that learns embeddings for both authors and documents with a stylistic constraint.
We evaluate our method on three datasets: a literary corpus extracted from the Gutenberg Project, the Blog Authorship Corpus, and IMDb62.
arXiv Detail & Related papers (2024-07-18T10:01:09Z)
- Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant [28.752596543740225]
We present a new design and evaluation for such an automated assistant, which we call Panza.
Panza's personalization features are based on a combination of fine-tuning using a variant of the Reverse Instructions technique together with Retrieval-Augmented Generation.
We demonstrate that this combination allows us to fine-tune an LLM to reflect a user's writing style using limited data, while executing on extremely limited resources.
arXiv Detail & Related papers (2024-06-24T12:09:34Z)
- Step-Back Profiling: Distilling User History for Personalized Scientific Writing [50.481041470669766]
Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals.
We introduce STEP-BACK PROFILING to personalize LLMs by distilling user history into concise profiles.
Our approach outperforms the baselines by up to 3.6 points on the general personalization benchmark.
arXiv Detail & Related papers (2024-06-20T12:58:26Z)
- Learning Interpretable Style Embeddings via Prompting LLMs [46.74488355350601]
Style representation learning builds content-independent representations of author style in text.
Current style representation learning uses neural methods to disentangle style from content to create style vectors.
We use prompting to perform stylometry on a large number of texts to create a synthetic dataset and train human-interpretable style representations.
arXiv Detail & Related papers (2023-05-22T04:07:54Z)
- Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only 0.3% of model parameters to learn style-specific attributes for response generation.
We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.