AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
- URL: http://arxiv.org/abs/2410.04265v1
- Date: Sat, 5 Oct 2024 18:55:01 GMT
- Title: AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
- Authors: Ximing Lu, Melanie Sclar, Skyler Hallinan, Niloofar Mireshghallah, Jiacheng Liu, Seungju Han, Allyson Ettinger, Liwei Jiang, Khyathi Chandu, Nouha Dziri, Yejin Choi
- Abstract summary: We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text.
To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm.
Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs.
- Score: 53.15652021126663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creativity has long been considered one of the most difficult aspects of human intelligence for AI to mimic. However, the rise of Large Language Models (LLMs), like ChatGPT, has raised questions about whether AI can match or even surpass human creativity. We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text by reconstructing it from existing text snippets on the web. CREATIVITY INDEX is motivated by the hypothesis that the seemingly remarkable creativity of LLMs may be attributable in large part to the creativity of human-written texts on the web. To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm that can search verbatim and near-verbatim matches of text snippets from a given document against the web. Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs, and that alignment reduces the CREATIVITY INDEX of LLMs by an average of 30.1%. In addition, we find that distinguished authors like Hemingway exhibit a measurably higher CREATIVITY INDEX compared to other human writers. Finally, we demonstrate that CREATIVITY INDEX can be used as a surprisingly effective criterion for zero-shot machine text detection, surpassing the strongest existing zero-shot system, DetectGPT, by a significant margin of 30.2%, and even outperforming the strongest supervised system, Ghostbuster, in five out of six domains.
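The verbatim core of this idea can be sketched compactly. The following Python sketch is illustrative only, not the authors' DJ SEARCH: it replaces the dynamic program with a greedy scan, ignores near-verbatim matching, and assumes a hypothetical `in_corpus` oracle that reports whether a word sequence occurs in a reference corpus (in the paper, a web-scale index plays this role). Every word inside some matched span of at least `min_len` words counts as attributable, and the index is the uncovered fraction.

```python
from typing import Callable, List

def covered_mask(words: List[str],
                 in_corpus: Callable[[str], bool],
                 min_len: int = 5) -> List[bool]:
    """Mark every word lying inside some span of >= min_len consecutive
    words that appears verbatim in the reference corpus."""
    n = len(words)
    covered = [False] * n
    for i in range(n - min_len + 1):
        j = i + min_len
        if in_corpus(" ".join(words[i:j])):
            # extend the match as far as the corpus allows
            while j < n and in_corpus(" ".join(words[i:j + 1])):
                j += 1
            covered[i:j] = [True] * (j - i)
    return covered

def creativity_index(text: str,
                     in_corpus: Callable[[str], bool],
                     min_len: int = 5) -> float:
    """Fraction of words NOT attributable to any matched corpus span."""
    words = text.split()
    if not words:
        return 1.0
    mask = covered_mask(words, in_corpus, min_len)
    return 1.0 - sum(mask) / len(words)
```

With a toy oracle such as `lambda s: s in snippet_set`, the function can be exercised without any web index.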
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z)
- Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts [17.369951848952265]
We investigate the ability of LLMs to replicate human writing style in short, creative texts in the domain of Showerthoughts.
We measure human preference on the texts across the specific dimensions that account for the quality of creative, witty texts.
We conclude that human evaluators rate the generated texts as slightly worse on average in creative quality, yet cannot reliably distinguish between human-written and AI-generated texts.
arXiv Detail & Related papers (2024-05-02T18:29:58Z)
- Deep Learning Detection Method for Large Language Models-Generated Scientific Content [0.0]
Large Language Models generate scientific content that is indistinguishable from that written by humans.
This research paper presents a novel ChatGPT-generated scientific text detection method, AI-Catcher.
On average, AI-Catcher improved accuracy by 37.4%.
arXiv Detail & Related papers (2024-02-27T19:16:39Z)
- Raidar: geneRative AI Detection viA Rewriting [42.477151044325595]
Large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting.
We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and calculating the editing distance of the output.
Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.
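A rough sketch of that signal, assuming a hypothetical `llm_rewrite` callable that stands in for any chat model (this is not the authors' released code, and the threshold is illustrative rather than calibrated):

```python
import difflib

def rewrite_distance(text: str, llm_rewrite) -> float:
    """Normalized edit signal between a text and its LLM rewrite;
    llm_rewrite is a hypothetical call into any chat model."""
    rewritten = llm_rewrite(text)
    matcher = difflib.SequenceMatcher(None, text, rewritten)
    return 1.0 - matcher.ratio()  # 0 = unchanged, 1 = fully rewritten

def looks_machine_generated(text: str, llm_rewrite,
                            threshold: float = 0.3) -> bool:
    # LLMs tend to leave AI-generated text nearly untouched, so a
    # small rewrite distance is evidence of machine origin.
    return rewrite_distance(text, llm_rewrite) < threshold
```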
arXiv Detail & Related papers (2024-01-23T18:57:53Z)
- The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres.
We employ several machine learning models to classify the texts.
Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z)
- Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not adequately capture these dimensions.
We propose a new LLM-based framework that provides a comprehensive evaluation by comparing generated text and reference text from both objective and subjective aspects.
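A minimal sketch of the role-player idea, assuming a `chat` callable that returns an LLM's text reply; the roles, criteria, and prompt wording are illustrative, not the paper's exact setup:

```python
# Hypothetical role-based LLM judging; each role scores one facet.
ROLES = {
    "copy editor": "grammar and factual correctness",
    "general reader": "informativeness, succinctness, and appeal",
}

def score_summary(document: str, summary: str, chat) -> dict:
    """Ask an LLM to rate a summary from several personas (1-10 each)."""
    scores = {}
    for role, criteria in ROLES.items():
        prompt = (
            f"You are a {role}. Rate the summary below on {criteria}, "
            f"from 1 to 10. Reply with a single number.\n\n"
            f"Document:\n{document}\n\nSummary:\n{summary}"
        )
        scores[role] = float(chat(prompt).strip())
    return scores
```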
arXiv Detail & Related papers (2023-03-27T10:40:59Z)
- Exploring AI-Generated Text in Student Writing: How Does AI Help? [0.0]
It remains unclear to what extent AI-generated text in students' writing might lead to higher-quality writing.
We explored 23 Hong Kong secondary school students' attempts to write stories comprising their own words and AI-generated text.
arXiv Detail & Related papers (2023-03-10T14:36:47Z)
- Visualize Before You Write: Imagination-Guided Open-Ended Text Generation [68.96699389728964]
We propose iNLG that uses machine-generated images to guide language models in open-ended text generation.
Experiments and analyses demonstrate the effectiveness of iNLG on open-ended text generation tasks.
arXiv Detail & Related papers (2022-10-07T18:01:09Z)
- Enabling Language Models to Fill in the Blanks [81.59381915581892]
We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document.
We train (or fine-tune) off-the-shelf language models on sequences containing the concatenation of artificially-masked text and the text which was masked.
We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics.
arXiv Detail & Related papers (2020-05-11T18:00:03Z)
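A minimal sketch of how such training sequences might be constructed; the special tokens and span lengths below are assumptions for illustration, not the paper's exact format:

```python
import random

def make_infilling_example(words, n_blanks=2,
                           blank="[blank]", sep="[sep]", ans="[answer]"):
    """Mask random, non-overlapping word spans and append the masked
    spans after a separator, so an ordinary left-to-right LM can be
    trained to fill in the blanks."""
    n = len(words)
    starts = sorted(random.sample(range(n), min(n_blanks, n)))
    pieces, answers, prev = [], [], 0
    for i in starts:
        if i < prev:  # skip starts that overlap an earlier span
            continue
        j = min(n, i + random.randint(1, 3))  # span of 1-3 words
        pieces += words[prev:i] + [blank]
        answers.append(" ".join(words[i:j]))
        prev = j
    pieces += words[prev:]
    return " ".join(pieces) + f" {sep} " + " ".join(
        a + f" {ans}" for a in answers)
```

For example, `make_infilling_example("she ate cereal for breakfast".split())` might yield `she [blank] cereal [blank] breakfast [sep] ate [answer] for [answer]`.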