Related papers: SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison

SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison

URL: http://arxiv.org/abs/2407.12815v1
Date: Fri, 28 Jun 2024 22:19:01 GMT
Title: SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison
Authors: Anjali Rawal, Hui Wang, Youjia Zheng, Yu-Hsuan Lin, Shanu Sushmita,
Abstract summary: We compare the performance of machine learning algorithms on four datasets: (1) small (tweets from Election, FIFA, and Game of Thrones), (2) medium (Wikipedia introductions and PubMed abstracts), and (3) large (OpenAI web text dataset) Our results indicate that LLMs with very large parameters (such as the XL-1542 variant of GPT2 with 1542 million parameters) were harder to detect using traditional machine learning methods. We examine the characteristics of human and machine-generated texts across multiple dimensions, including linguistics, personality, sentiment, bias, and morality.
Score: 2.7147912878168303
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have gained significant attention due to their ability to mimic human language. Identifying texts generated by LLMs is crucial for understanding their capabilities and mitigating potential consequences. This paper analyzes datasets of varying text lengths: small, medium, and large. We compare the performance of machine learning algorithms on four datasets: (1) small (tweets from Election, FIFA, and Game of Thrones), (2) medium (Wikipedia introductions and PubMed abstracts), and (3) large (OpenAI web text dataset). Our results indicate that LLMs with very large parameters (such as the XL-1542 variant of GPT2 with 1542 million parameters) were harder (74%) to detect using traditional machine learning methods. However, detecting texts of varying lengths from LLMs with smaller parameters (762 million or less) can be done with high accuracy (96% and above). We examine the characteristics of human and machine-generated texts across multiple dimensions, including linguistics, personality, sentiment, bias, and morality. Our findings indicate that machine-generated texts generally have higher readability and closely mimic human moral judgments but differ in personality traits. SVM and Voting Classifier (VC) models consistently achieve high performance across most datasets, while Decision Tree (DT) models show the lowest performance. Model performance drops when dealing with rephrased texts, particularly shorter texts like tweets. This study underscores the challenges and importance of detecting LLM-generated texts and suggests directions for future research to improve detection methods and understand the nuanced capabilities of LLMs.

Related papers

Detecting LLM-Generated Text with Performance Guarantees [13.29284903739996]
Large language models (LLMs) such as GPT, Claude, Gemini, and Grok have been deeply integrated into our daily life.<n>They now support a wide range of tasks -- from dialogue and email drafting to assisting with teaching and coding.<n>Their ability to produce highly human-like text raises serious concerns, including the spread of fake news.
arXiv Detail & Related papers (2026-01-10T14:52:45Z)
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns [50.401907401444404]
Large language models (LLMs) are crucial for preventing misuse and building trustworthy AI systems.<n>We propose RepreGuard, an efficient statistics-based detection method.<n> Experimental results show that RepreGuard outperforms all baselines with average 94.92% AUROC on both in-distribution (ID) and OOD scenarios.
arXiv Detail & Related papers (2025-08-18T17:59:15Z)
mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection [0.0]
An automated detection is able to assist humans to indicate the machine-generated texts.<n>This notebook describes our mdok approach in robust detection, based on fine-tuning smaller LLMs for text classification.<n>It is applied to both subtasks of Voight-Kampff Generative AI Detection 2025.
arXiv Detail & Related papers (2025-06-02T14:07:32Z)
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders [20.557610461777344]
We use Sparse Autoencoders (SAE) to extract features from Gemma-2-2b residual stream. We identify both interpretable and efficient features, analyzing their semantics and relevance. Our methods offer valuable insights into how texts from various models differ from human-written content.
arXiv Detail & Related papers (2025-03-05T15:33:52Z)
Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text [1.1137087573421256]
This study aims to support efforts to detect and identify textual content generated using Generative AI Large Language Models. We leverage several machine learning algorithms such as Random Forest (RF), and Recurrent Neural Networks (RNN) to understand the important features in attribution. Our method is divided into 1) binary classification to differentiate between human-written and AI-text, and 2) multi classification, to differentiate between human-written text and the text generated by the five different LLM tools.
arXiv Detail & Related papers (2025-01-06T18:46:53Z)
Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models [0.0]
We identify significant differences between human-written texts and those generated by large language models (LLMs) Our findings indicate that humans write texts that are less cognitively demanding, with higher semantic content, and richer emotional content compared to texts generated by LLMs.
arXiv Detail & Related papers (2024-12-04T04:38:35Z)
GigaCheck: Detecting LLM-generated Content [72.27323884094953]
In this work, we investigate the task of generated text detection by proposing the GigaCheck. Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts. Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize AI-generated intervals within text.
arXiv Detail & Related papers (2024-10-31T08:30:55Z)
Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple yet effective black-box zero-shot detection approach based on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts. Experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods.
arXiv Detail & Related papers (2024-05-07T12:57:01Z)
Can Large Language Models Learn the Physics of Metamaterials? An Empirical Study with ChatGPT [9.177651206337005]
Large language models (LLMs) such as ChatGPT, Gemini, LlaMa, and Claude are trained on massive quantities of text parsed from the internet. We present a LLM fine-tuned on up to 40,000 data that can predict electromagnetic spectra over a range of frequencies given a text prompt.
arXiv Detail & Related papers (2024-04-23T19:05:42Z)
From Text to Source: Results in Detecting Large Language Model-Generated Content [17.306542392779445]
Large Language Models (LLMs) are celebrated for their ability to generate human-like text. This paper investigates "Cross-Model Detection," by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training. The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking detection.
arXiv Detail & Related papers (2023-09-23T09:51:37Z)
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations. We study the impact of labeled data through in-context learning and finetuning. We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres. We employ several machine learning models to classify the texts. Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z)
LLMDet: A Third Party Large Language Models Generated Text Detection Tool [119.0952092533317]
Large language models (LLMs) are remarkably close to high-quality human-authored text. Existing detection tools can only differentiate between machine-generated and human-authored text. We propose LLMDet, a model-specific, secure, efficient, and extendable detection tool.
arXiv Detail & Related papers (2023-05-24T10:45:16Z)
M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection [69.29017069438228]
Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries. This has also raised concerns about the potential misuse of such texts in journalism, education, and academia. In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse.
arXiv Detail & Related papers (2023-05-24T08:55:11Z)
MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection. We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
Smaller Language Models are Better Black-box Machine-Generated Text Detectors [56.36291277897995]
Small and partially-trained models are better universal text detectors. We find that whether the detector and generator were trained on the same data is not critically important to the detection success. For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has AUC of 0.45.
arXiv Detail & Related papers (2023-05-17T00:09:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.