GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
- URL: http://arxiv.org/abs/2305.07969v2
- Date: Wed, 17 May 2023 18:21:03 GMT
- Title: GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
- Authors: Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha
Raj
- Abstract summary: We present a novel approach for detecting ChatGPT-generated vs. human-written text using language models.
Our models achieved remarkable results, with an accuracy of over 97% on the test dataset, as evaluated through various metrics.
- Score: 27.901155229342375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel approach for detecting ChatGPT-generated vs.
human-written text using language models. To this end, we first collected and
released a pre-processed dataset named OpenGPTText, which consists of rephrased
content generated using ChatGPT. We then designed, implemented, and trained two
different models for text classification, using Robustly Optimized BERT
Pretraining Approach (RoBERTa) and Text-to-Text Transfer Transformer (T5),
respectively. Our models achieved remarkable results, with an accuracy of over
97% on the test dataset, as evaluated through various metrics. Furthermore, we
conducted an interpretability study to showcase our model's ability to extract
and differentiate key features between human-written and ChatGPT-generated
text. Our findings provide important insights into the effective use of
language models to detect generated text.
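As a concrete illustration of the approach the abstract describes, below is a minimal sketch of fine-tuning RoBERTa as a binary human-vs-ChatGPT classifier with the HuggingFace transformers library; the toy data and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: fine-tune RoBERTa as a binary human-vs-ChatGPT classifier.
# The toy dataset and hyperparameters are illustrative, not the paper's setup.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

class PairDataset(Dataset):
    """Wraps (text, label) pairs; label 0 = human, 1 = ChatGPT."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=512)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base",
                                                           num_labels=2)

# Toy stand-ins for OpenGPTText-style human vs. ChatGPT-rephrased examples.
texts = ["An example human-written paragraph.", "A ChatGPT-rephrased paragraph."]
labels = [0, 1]
train_ds = PairDataset(texts, labels, tokenizer)

args = TrainingArguments(output_dir="gpt-sentinel-roberta",
                         per_device_train_batch_size=8,
                         num_train_epochs=1,
                         logging_steps=10)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```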
Related papers
- GPT-generated Text Detection: Benchmark Dataset and Tensor-based
Detection Method [4.802604527842989]
We present GPT Reddit dataset (GRiD), a novel Generative Pretrained Transformer (GPT)-generated text detection dataset.
The dataset consists of context-prompt pairs based on Reddit with human-generated and ChatGPT-generated responses.
To showcase the dataset's utility, we benchmark several detection methods on it, demonstrating their efficacy in distinguishing between human and ChatGPT-generated responses.
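To make the benchmarking step concrete, here is a minimal sketch of scoring a detector on GRiD-style labeled responses; the `detector` function and examples are placeholders, not a method from the paper.

```python
# Sketch of benchmarking a detector on GRiD-style labeled responses.
# `detector` is a placeholder; swap in any human-vs-ChatGPT classifier.
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def detector(text: str) -> float:
    """Hypothetical detector: returns P(ChatGPT-generated). Replace with a real model."""
    return 0.9 if "as an ai language model" in text.lower() else 0.1

examples = [
    ("Here's my honest take from personal experience...", 0),       # human reply
    ("As an AI language model, I can offer several points...", 1),  # ChatGPT reply
]
scores = [detector(t) for t, _ in examples]
labels = [y for _, y in examples]
preds = [int(s >= 0.5) for s in scores]

print("accuracy:", accuracy_score(labels, preds))
print("F1:", f1_score(labels, preds))
print("AUROC:", roc_auc_score(labels, scores))
```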
arXiv Detail & Related papers (2024-03-12T05:15:21Z)
- On the Generalization of Training-based ChatGPT Detection Methods [33.46128880100525]
ChatGPT is one of the most popular language models, achieving strong performance on various natural language tasks.
There is also an urgent need to distinguish texts generated by ChatGPT from human-written ones.
arXiv Detail & Related papers (2023-10-02T16:13:08Z)
- Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated Text [1.9643748953805937]
Generative language models can potentially deceive by generating artificial text that appears to be human-written.
This survey provides an overview of the current approaches employed to differentiate between texts generated by humans and ChatGPT.
arXiv Detail & Related papers (2023-09-14T13:05:20Z)
- Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text [48.36706154871577]
We introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts).
It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts.
We also propose the "Polish Ratio" method, an innovative measure of the degree of modification made by ChatGPT compared to the original human-written text.
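The paper's Polish Ratio is predicted by a learned regression model; as a rough illustrative proxy only, one can measure the normalized edit distance between the human-written original and the ChatGPT-polished version:

```python
# Illustrative proxy only: the paper trains a regression model to predict the
# Polish Ratio; a normalized character-level edit distance is a crude stand-in.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def polish_ratio_proxy(original: str, polished: str) -> float:
    """Fraction of the text changed, in [0, 1]; 0 = untouched, 1 = fully rewritten."""
    denom = max(len(original), len(polished)) or 1
    return levenshtein(original, polished) / denom

human = "We study how ChatGPT edits academic abstracts."
polished = "We investigate the manner in which ChatGPT revises academic abstracts."
print(f"proxy Polish Ratio: {polish_ratio_proxy(human, polished):.2f}")
```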
arXiv Detail & Related papers (2023-07-21T06:38:37Z)
- ChatGPT vs Human-authored Text: Insights into Controllable Text Summarization and Sentence Style Transfer [8.64514166615844]
We conduct a systematic inspection of ChatGPT's performance in two controllable generation tasks.
We evaluate the faithfulness of the generated text, and compare the model's performance with human-authored texts.
We observe that ChatGPT sometimes incorporates factual errors or hallucinations when adapting the text to suit a specific style.
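One common way to compare generated summaries against human-authored references is ROUGE, sketched below with the rouge-score package; this snippet is illustrative rather than the paper's evaluation protocol, and lexical-overlap scores cannot by themselves catch the factual errors the paper reports.

```python
# Minimal sketch: score a ChatGPT summary against a human-authored reference
# with ROUGE (pip install rouge-score). Note: overlap metrics like ROUGE do
# not detect the factual errors/hallucinations the paper observes.
from rouge_score import rouge_scorer

reference = "The committee approved the budget after a two-hour debate."
chatgpt_summary = "After two hours of debate, the committee passed the budget."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for name, score in scorer.score(reference, chatgpt_summary).items():
    print(f"{name}: P={score.precision:.2f} R={score.recall:.2f} "
          f"F1={score.fmeasure:.2f}")
```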
arXiv Detail & Related papers (2023-06-13T14:21:35Z)
- ChatGraph: Interpretable Text Classification by Converting ChatGPT Knowledge to Graphs [54.48467003509595]
ChatGPT has shown superior performance in various natural language processing (NLP) tasks.
We propose a novel framework that leverages the power of ChatGPT for specific tasks, such as text classification.
Our method provides a more transparent decision-making process compared with previous text classification methods.
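A minimal sketch of the graph-construction idea, assuming knowledge triples have already been extracted (e.g., by prompting ChatGPT); the triples and the networkx representation here are illustrative, not the paper's exact pipeline.

```python
# Sketch of the graph-building step: turn (subject, relation, object) triples
# -- e.g., extracted from text by prompting ChatGPT -- into a graph that a
# downstream classifier can consume. The triples are hand-written stand-ins.
import networkx as nx

triples = [
    ("ChatGPT", "is_a", "language model"),
    ("language model", "performs", "text classification"),
    ("text classification", "assigns", "labels"),
]

g = nx.DiGraph()
for subj, rel, obj in triples:
    g.add_edge(subj, obj, relation=rel)  # one labeled edge per knowledge triple

# The explicit graph structure is what makes downstream decisions inspectable.
for u, v, data in g.edges(data=True):
    print(f"{u} --{data['relation']}--> {v}")
```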
arXiv Detail & Related papers (2023-05-03T19:57:43Z)
- To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability [57.71052396828714]
This paper presents the first comprehensive analysis of ChatGPT's Text-to-SQL abilities.
We conducted experiments on 12 benchmark datasets with different languages, settings, or scenarios.
Although a gap remains from current state-of-the-art (SOTA) models, ChatGPT's performance is still impressive.
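For a sense of what zero-shot Text-to-SQL prompting looks like, here is an illustrative prompt builder; the template is an assumption, not the paper's exact prompt.

```python
# Illustrative zero-shot Text-to-SQL prompt; the template is an assumption,
# not the paper's exact prompt. Any chat-completion API could consume it.
def make_text_to_sql_prompt(schema: str, question: str) -> str:
    return (
        "Given the database schema below, write a single SQL query that "
        "answers the question. Return only the SQL.\n\n"
        f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
    )

schema = "singer(singer_id, name, country, age)"
question = "What is the average age of singers from France?"
print(make_text_to_sql_prompt(schema, question))
# Expected model output: SELECT AVG(age) FROM singer WHERE country = 'France';
```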
arXiv Detail & Related papers (2023-03-12T04:22:01Z)
- AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT).
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
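A minimal sketch of an AugGPT-style rephrasing loop using the OpenAI Python SDK; the prompt wording and model name are assumptions rather than the paper's configuration.

```python
# Minimal sketch of AugGPT-style augmentation: ask ChatGPT to rephrase each
# training sentence several ways. Prompt wording and model name are
# assumptions, not the paper's exact setup. Requires OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

def augment(sentence: str, n: int = 3) -> list[str]:
    """Return n rephrasings of `sentence` that preserve its label/meaning."""
    out = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": f"Rephrase, keeping the meaning: {sentence}"}],
        )
        out.append(resp.choices[0].message.content.strip())
    return out

for s in augment("The service was quick and the staff were friendly."):
    print(s)
```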
arXiv Detail & Related papers (2023-02-25T06:58:16Z)
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms plain-text pre-training while using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.