Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection
- URL: http://arxiv.org/abs/2406.06558v1
- Date: Sat, 1 Jun 2024 10:21:54 GMT
- Title: Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection
- Authors: Ye Zhang, Qian Leng, Mengran Zhu, Rui Ding, Yue Wu, Jintong Song, Yulu Gong,
- Abstract summary: We propose a novel hybrid approach that combines TF-IDF techniques with advanced machine learning models.
Our approach achieves superior performance compared to existing methods.
- Score: 8.149808049643344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Large Language Models (LLMs) has ushered in an era where AI-generated text is increasingly indistinguishable from human-generated content. Detecting AI-generated text has become imperative to combat misinformation, ensure content authenticity, and safeguard against malicious uses of AI. In this paper, we propose a novel hybrid approach that combines traditional TF-IDF techniques with advanced machine learning models, including Bayesian classifiers, Stochastic Gradient Descent (SGD), Categorical Gradient Boosting (CatBoost), and 12 instances of Deberta-v3-large models. Our approach aims to address the challenges associated with detecting AI-generated text by leveraging the strengths of both traditional feature extraction methods and state-of-the-art deep learning models. Through extensive experiments on a comprehensive dataset, we demonstrate the effectiveness of our proposed method in accurately distinguishing between human and AI-generated text. Our approach achieves superior performance compared to existing methods. This research contributes to the advancement of AI-generated text detection techniques and lays the foundation for developing robust solutions to mitigate the challenges posed by AI-generated content.
Related papers
- DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning [24.99797253885887]
We argue that the key to accomplishing this task lies in distinguishing writing styles of different authors.
We propose DeTeCtive, a multi-task auxiliary, multi-level contrastive learning framework.
Our method is compatible with a range of text encoders.
arXiv Detail & Related papers (2024-10-28T12:34:49Z) - SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z) - Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text [4.902089836908786]
WhosAI is a triplet-network contrastive learning framework designed to predict whether a given input text has been generated by humans or AI.
We show that our proposed framework achieves outstanding results in both the Turing Test and Authorship tasks.
arXiv Detail & Related papers (2024-07-12T15:44:56Z) - Who Writes the Review, Human or AI? [0.36498648388765503]
This study proposes a methodology to accurately distinguish AI-generated and human-written book reviews.
Our approach utilizes transfer learning, enabling the model to identify generated text across different topics.
The experimental results demonstrate that it is feasible to detect the original source of text, achieving an accuracy rate of 96.86%.
arXiv Detail & Related papers (2024-05-30T17:38:44Z) - ToBlend: Token-Level Blending With an Ensemble of LLMs to Attack AI-Generated Text Detection [6.27025292177391]
ToBlend is a novel token-level ensemble text generation method to challenge the robustness of current AI-content detection approaches.
We find ToBlend significantly drops the performance of most mainstream AI-content detection methods.
arXiv Detail & Related papers (2024-02-17T02:25:57Z) - Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing
AI-Generated Text [0.0]
My research investigates the use of cutting-edge hybrid deep learning models to accurately differentiate between AI-generated text and human writing.
I applied a robust methodology, utilising a carefully selected dataset comprising AI and human texts from various sources, each tagged with instructions.
arXiv Detail & Related papers (2023-11-27T06:26:53Z) - Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z) - Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z) - Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z) - Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.