DEMASQ: Unmasking the ChatGPT Wordsmith
- URL: http://arxiv.org/abs/2311.05019v1
- Date: Wed, 8 Nov 2023 21:13:05 GMT
- Title: DEMASQ: Unmasking the ChatGPT Wordsmith
- Authors: Kavita Kumari and Alessandro Pegoraro and Hossein Fereidooni and
Ahmad-Reza Sadeghi
- Abstract summary: We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
- Score: 63.8746084667206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The potential misuse of ChatGPT and other Large Language Models (LLMs) has
raised concerns regarding the dissemination of false information, plagiarism,
academic dishonesty, and fraudulent activities. Consequently, distinguishing
between AI-generated and human-generated content has emerged as an intriguing
research topic. However, current text detection methods lack precision and are
often restricted to specific tasks or domains, making them inadequate for
identifying content generated by ChatGPT. In this paper, we propose an
effective ChatGPT detector named DEMASQ, which accurately identifies
ChatGPT-generated content. Our method addresses two critical factors: (i) the
distinct biases in text composition observed in human- and machine-generated
content and (ii) the alterations made by humans to evade previous detection
methods. DEMASQ is an energy-based detection model that incorporates novel
aspects, such as (i) optimization inspired by the Doppler effect to capture the
interdependence between input text embeddings and output labels, and (ii) the
use of explainable AI techniques to generate diverse perturbations. To evaluate
our detector, we create a benchmark dataset comprising a mixture of prompts
from both ChatGPT and humans, encompassing domains such as medical, open Q&A,
finance, wiki, and Reddit. Our evaluation demonstrates that DEMASQ achieves
high accuracy in identifying content generated by ChatGPT.
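The abstract describes DEMASQ as an energy-based detector over text embeddings. The paper's actual architecture, Doppler-inspired optimization, and training procedure are not given in this summary; the following is only a minimal sketch of the general energy-based classification idea, with a toy stand-in embedding and untrained weights (`embed`, `energy`, `predict`, and the weight vector `w` are all illustrative assumptions, not the authors' code).

```python
# Minimal sketch of an energy-based binary detector: a scalar energy
# E(x, y) is assigned to each (text embedding, label) pair, and the
# predicted label is the one with the LOWER energy.
import math

def embed(text: str, dim: int = 16) -> list[float]:
    """Toy bag-of-characters embedding (stand-in for a real text encoder)."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def energy(x: list[float], w: list[float], label: int) -> float:
    """Energy of an (embedding, label) pair; lower means more compatible."""
    score = sum(xi * wi for xi, wi in zip(x, w))
    return -score if label == 1 else score

def predict(text: str, w: list[float]) -> int:
    """Label 1 = 'machine-generated', 0 = 'human-written'."""
    x = embed(text)
    return min((0, 1), key=lambda y: energy(x, w, y))
```

In a real detector the weights would be learned from labeled human/machine text; DEMASQ additionally reports using explainable-AI techniques to generate diverse input perturbations during training, which this sketch omits.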
Related papers
- Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated
Text [1.9643748953805937]
Generative language models can potentially deceive by generating artificial text that appears to be human-generated.
This survey provides an overview of the current approaches employed to differentiate between texts generated by humans and ChatGPT.
arXiv Detail & Related papers (2023-09-14T13:05:20Z)
- HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus [22.302137281411646]
ChatGPT has garnered significant interest due to its impressive performance.
There is growing concern about its potential risks.
Current datasets used for detecting ChatGPT-generated text primarily focus on question-answering tasks.
arXiv Detail & Related papers (2023-09-06T05:33:57Z)
- Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z)
- Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect
ChatGPT-Generated Text [48.36706154871577]
We introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts).
It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts.
We also propose the "Polish Ratio" method, an innovative measure of the degree of modification made by ChatGPT compared to the original human-written text.
arXiv Detail & Related papers (2023-07-21T06:38:37Z)
- Differentiate ChatGPT-generated and Human-written Medical Texts [8.53416950968806]
This research is among the first studies on responsible and ethical AIGC (Artificial Intelligence Generated Content) in medicine.
We focus on analyzing the differences between medical texts written by human experts and generated by ChatGPT.
In the next step, we analyze the linguistic features of these two types of content and uncover differences in vocabulary, part-of-speech, dependency, sentiment, perplexity, etc.
arXiv Detail & Related papers (2023-04-23T07:38:07Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size required for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- Comparing Abstractive Summaries Generated by ChatGPT to Real Summaries
Through Blinded Reviewers and Text Classification Algorithms [0.8339831319589133]
ChatGPT, developed by OpenAI, is a recent addition to the family of language models.
We evaluate the performance of ChatGPT on Abstractive Summarization by the means of automated metrics and blinded human reviewers.
arXiv Detail & Related papers (2023-03-30T18:28:33Z)
- MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that a larger number of words generally leads to better detection performance, and that most detection methods can achieve similar performance with far fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information and is not responsible for any consequences of its use.