Comparing Abstractive Summaries Generated by ChatGPT to Real Summaries
Through Blinded Reviewers and Text Classification Algorithms
- URL: http://arxiv.org/abs/2303.17650v3
- Date: Mon, 28 Aug 2023 09:34:59 GMT
- Title: Comparing Abstractive Summaries Generated by ChatGPT to Real Summaries
Through Blinded Reviewers and Text Classification Algorithms
- Authors: Mayank Soni and Vincent Wade
- Abstract summary: ChatGPT, developed by OpenAI, is a recent addition to the family of language models.
We evaluate the performance of ChatGPT on abstractive summarization by means of automated metrics and blinded human reviewers.
- Score: 0.8339831319589133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have garnered significant attention due to their
impressive performance on a variety of tasks. ChatGPT, developed by OpenAI, is
a recent addition to the family of language models and is being called a
disruptive technology by a few, owing to its human-like text-generation
capabilities. Although many anecdotal examples across the internet have
evaluated ChatGPT's strengths and weaknesses, only a few systematic research
studies exist. To contribute to the body of systematic research on ChatGPT, we
evaluate its performance on abstractive summarization by means of automated
metrics and blinded human reviewers. We also build automatic text classifiers
to detect ChatGPT-generated summaries. We found that
while text classification algorithms can distinguish between real and generated
summaries, humans are unable to distinguish between real summaries and those
produced by ChatGPT.
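
As an illustration of the kind of detector the abstract refers to, below is a minimal sketch of a classifier that separates human-written from ChatGPT-generated summaries. The feature set (TF-IDF word n-grams), the model (logistic regression), and the example strings are assumptions for illustration only; the abstract does not specify the authors' actual classifier or data.

```python
# Minimal sketch of a "real vs. ChatGPT" summary classifier (assumed setup:
# TF-IDF word n-grams + logistic regression; the paper's actual features,
# model, and data are not detailed in this abstract).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical placeholder examples; in practice these would be the reference
# summaries and the ChatGPT-generated summaries collected for the study.
real = [
    "The senate passed the bill after a lengthy debate on Tuesday.",
    "Researchers found the vaccine effective in early trials.",
    "The company reported record profits for the third quarter.",
    "Floods displaced thousands of residents along the coast.",
]
generated = [
    "After extensive debate, lawmakers approved the new legislation.",
    "Early trial data suggest the vaccine works well.",
    "Quarterly earnings reached an all-time high for the firm.",
    "Coastal flooding forced thousands of people from their homes.",
]

texts = real + generated
labels = [0] * len(real) + [1] * len(generated)  # 0 = human-written, 1 = ChatGPT

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
# Tiny 2-fold cross-validation, only because the toy dataset is tiny.
scores = cross_val_score(clf, texts, labels, cv=2)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

A stronger detector would typically fine-tune a pretrained transformer rather than rely on bag-of-words features, but the pipeline above captures the basic setup: train on labeled real and generated summaries, then report held-out accuracy.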
Related papers
- DEMASQ: Unmasking the ChatGPT Wordsmith [63.8746084667206]
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z) - Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated
Text [1.9643748953805937]
Generative language models can potentially deceive readers by producing artificial text that appears to be human-written.
This survey provides an overview of the current approaches employed to differentiate between texts generated by humans and ChatGPT.
arXiv Detail & Related papers (2023-09-14T13:05:20Z) - Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect
ChatGPT-Generated Text [48.36706154871577]
We introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts).
It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts.
We also propose the "Polish Ratio" method, an innovative measure of the degree of modification made by ChatGPT compared to the original human-written text.
arXiv Detail & Related papers (2023-07-21T06:38:37Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - Comparative Analysis of CHATGPT and the evolution of language models [0.0]
This paper highlights the prevailing ideas in NLP, including machine translation, machine summarization, question-answering, and language generation.
A strategy for validating the arguments and results of ChatGPT is presented summarily as an example of safe, large-scale adoption of Large Language Models.
arXiv Detail & Related papers (2023-03-28T03:11:28Z) - Is ChatGPT a Good NLG Evaluator? A Preliminary Study [121.77986688862302]
We provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric.
Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments.
We hope our preliminary study could prompt the emergence of a general-purpose, reliable NLG metric.
arXiv Detail & Related papers (2023-03-07T16:57:20Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and
Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability on the most popular GLUE benchmark and compare it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z) - Exploring the Limits of ChatGPT for Query or Aspect-based Text
Summarization [28.104696513516117]
Large language models (LLMs) like GPT-3 and ChatGPT have recently created significant interest in their use for text summarization tasks.
Recent studies (Goyal et al., 2022; Zhang et al., 2023) have shown that LLM-generated news summaries are already on par with those written by humans.
Our experiments reveal that ChatGPT's performance is comparable to traditional fine-tuning methods in terms of ROUGE scores (a minimal ROUGE computation sketch appears after this list).
arXiv Detail & Related papers (2023-02-16T04:41:30Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
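
Several of the entries above, including the query/aspect-based summarization study, compare systems by ROUGE. As a point of reference, here is a minimal sketch of computing ROUGE between a reference summary and a candidate summary; the `rouge-score` package is an assumed choice rather than the toolkit those papers necessarily used, and the example strings are hypothetical.

```python
# Minimal sketch of computing ROUGE between a reference and a candidate summary.
# Assumes the `rouge-score` package (pip install rouge-score); the papers listed
# above may use different ROUGE implementations or variants.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "The senate passed the bill after a lengthy debate on Tuesday."    # hypothetical human summary
candidate = "After extensive debate, lawmakers approved the new legislation."  # hypothetical generated summary

scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.3f} recall={s.recall:.3f} f1={s.fmeasure:.3f}")
```

ROUGE-1 and ROUGE-2 count unigram and bigram overlap with the reference, while ROUGE-L measures the longest common subsequence, which is why heavily paraphrased but faithful summaries (like the candidate above) can still score low on these metrics.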
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.