Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect
ChatGPT-Generated Text
- URL: http://arxiv.org/abs/2307.11380v2
- Date: Sat, 30 Dec 2023 13:17:52 GMT
- Title: Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect
ChatGPT-Generated Text
- Authors: Lingyi Yang, Feng Jiang, Haizhou Li
- Abstract summary: We introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts)
It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts.
We also propose the "Polish Ratio" method, an innovative measure of the degree of modification made by ChatGPT compared to the original human-written text.
- Score: 48.36706154871577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The remarkable capabilities of large-scale language models, such as ChatGPT,
in text generation have impressed readers and spurred researchers to devise
detectors to mitigate potential risks, including misinformation, phishing, and
academic dishonesty. Despite this, most previous studies have been
predominantly geared towards creating detectors that differentiate between
purely ChatGPT-generated texts and human-authored texts. This approach,
however, fails to work on discerning texts generated through human-machine
collaboration, such as ChatGPT-polished texts. Addressing this gap, we
introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts),
facilitating the construction of more robust detectors. It diverges from extant
corpora by comprising pairs of human-written and ChatGPT-polished abstracts
instead of purely ChatGPT-generated texts. Additionally, we propose the "Polish
Ratio" method, an innovative measure of the degree of modification made by
ChatGPT compared to the original human-written text. It provides a mechanism to
measure the degree of ChatGPT influence in the resulting text. Our experimental
results show our proposed model has better robustness on the HPPT dataset and
two existing datasets (HC3 and CDB). Furthermore, the "Polish Ratio" we
proposed offers a more comprehensive explanation by quantifying the degree of
ChatGPT involvement.
Related papers
- GPT-generated Text Detection: Benchmark Dataset and Tensor-based
Detection Method [4.802604527842989]
We present GPT Reddit dataset (GRiD), a novel Generative Pretrained Transformer (GPT)-generated text detection dataset.
The dataset consists of context-prompt pairs based on Reddit with human-generated and ChatGPT-generated responses.
To showcase the dataset's utility, we benchmark several detection methods on it, demonstrating their efficacy in distinguishing between human and ChatGPT-generated responses.
arXiv Detail & Related papers (2024-03-12T05:15:21Z) - DEMASQ: Unmasking the ChatGPT Wordsmith [63.8746084667206]
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z) - Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated
Text [1.9643748953805937]
generative language models can potentially deceive by generating artificial text that appears to be human-generated.
This survey provides an overview of the current approaches employed to differentiate between texts generated by humans and ChatGPT.
arXiv Detail & Related papers (2023-09-14T13:05:20Z) - ChatGPT vs Human-authored Text: Insights into Controllable Text
Summarization and Sentence Style Transfer [8.64514166615844]
We conduct a systematic inspection of ChatGPT's performance in two controllable generation tasks.
We evaluate the faithfulness of the generated text, and compare the model's performance with human-authored texts.
We observe that ChatGPT sometimes incorporates factual errors or hallucinations when adapting the text to suit a specific style.
arXiv Detail & Related papers (2023-06-13T14:21:35Z) - GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content [27.901155229342375]
We present a novel approach for detecting ChatGPT-generated vs. human-written text using language models.
Our models achieved remarkable results, with an accuracy of over 97% on the test dataset, as evaluated through various metrics.
arXiv Detail & Related papers (2023-05-13T17:12:11Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - Comparing Abstractive Summaries Generated by ChatGPT to Real Summaries
Through Blinded Reviewers and Text Classification Algorithms [0.8339831319589133]
ChatGPT, developed by OpenAI, is a recent addition to the family of language models.
We evaluate the performance of ChatGPT on Abstractive Summarization by the means of automated metrics and blinded human reviewers.
arXiv Detail & Related papers (2023-03-30T18:28:33Z) - Is ChatGPT A Good Keyphrase Generator? A Preliminary Study [51.863368917344864]
ChatGPT has recently garnered significant attention from the computational linguistics community.
We evaluate its performance in various aspects, including keyphrase generation prompts, keyphrase generation diversity, and long document understanding.
We find that ChatGPT performs exceptionally well on all six candidate prompts, with minor performance differences observed across the datasets.
arXiv Detail & Related papers (2023-03-23T02:50:38Z) - Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study could prompt the emergence of a general-purposed reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.