AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays
- URL: http://arxiv.org/abs/2304.14276v1
- Date: Mon, 24 Apr 2023 12:58:28 GMT
- Title: AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays
- Authors: Steffen Herbold, Annette Hautli-Janisz, Ute Heuer, Zlata Kikteva, Alexander Trautsch
- Abstract summary: ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
- Score: 66.36541161082856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Recently, ChatGPT and similar generative AI models have attracted
hundreds of millions of users and become part of the public discourse. Many
believe that such models will disrupt society and will result in a significant
change in the education system and information generation in the future. So
far, this belief is based on either anecdotal evidence or benchmarks from the
owners of the models -- both of which lack scientific rigour.
Objective: Through a large-scale study comparing human-written versus
ChatGPT-generated argumentative student essays, we systematically assess the
quality of the AI-generated content.
Methods: A large corpus of essays was rated using standard criteria by a
large number of human experts (teachers). We augment the analysis with a
consideration of the linguistic characteristics of the generated essays.
Results: Our results demonstrate that ChatGPT generates essays that are rated
higher for quality than human-written essays. The writing style of the AI
models exhibits linguistic characteristics that are different from those of the
human-written essays, e.g., it is characterized by fewer discourse and
epistemic markers, but more nominalizations and greater lexical diversity.
Conclusions: Our results clearly demonstrate that models like ChatGPT
outperform humans in generating argumentative essays. Since the technology is
readily available for anyone to use, educators must act immediately. We must
reinvent homework and develop teaching concepts that utilize these AI models
in the same way that mathematics education integrated the calculator: teach the
general concepts first, then use AI tools to free up time for other learning objectives.
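To make the linguistic measures in the results concrete, here is a minimal Python sketch of two of them: lexical diversity (as a type-token ratio) and discourse-marker frequency. The regex tokenizer and the marker list are illustrative assumptions, not the instruments used in the study.

```python
# Minimal sketch of two linguistic measures discussed in the results:
# lexical diversity (type-token ratio) and discourse-marker frequency.
# The regex tokenizer and the marker list are illustrative assumptions.
import re

DISCOURSE_MARKERS = {"however", "therefore", "moreover", "furthermore",
                     "nevertheless", "consequently", "thus", "hence"}

def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; rough but sufficient for a comparison."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(tokens: list[str]) -> float:
    """Unique words over total words; higher means more lexical diversity.
    Note: sensitive to text length, so compare essays of similar size."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def marker_rate(tokens: list[str]) -> float:
    """Discourse markers per 1,000 tokens."""
    hits = sum(1 for t in tokens if t in DISCOURSE_MARKERS)
    return 1000 * hits / len(tokens) if tokens else 0.0

toks = tokenize("However, the evidence is clear. Therefore, we must act.")
print(f"TTR: {type_token_ratio(toks):.3f}, markers/1k: {marker_rate(toks):.1f}")
```

Computed over both corpora, lower marker rates and higher diversity for the generated essays would point in the direction the study reports.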
Related papers
- Hey AI Can You Grade My Essay?: Automatic Essay Grading [1.03590082373586]
We introduce a new model that outperforms the state-of-the-art models in the field of automatic essay grading (AEG).
The model combines collaborative and transfer learning: one network checks the grammatical and structural features of an essay's sentences, while another scores the overall idea the essay presents.
Our proposed model achieves the highest accuracy of 85.50%.
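The two-network design lends itself to a compact sketch. Below is a hedged PyTorch illustration of the general idea, not the authors' architecture: one branch consumes grammatical/structural features, the other an essay-level embedding, and a fused head produces the grade. All dimensions and the concatenation fusion are assumptions.

```python
# Sketch of the two-branch grading idea: one branch for grammar/structure
# features, one for the overall idea (an essay-level embedding), fused
# into a single score. Dimensions and fusion are illustrative choices.
import torch
import torch.nn as nn

class TwoBranchGrader(nn.Module):
    def __init__(self, struct_dim: int = 32, idea_dim: int = 768):
        super().__init__()
        self.struct_branch = nn.Sequential(nn.Linear(struct_dim, 64), nn.ReLU())
        self.idea_branch = nn.Sequential(nn.Linear(idea_dim, 64), nn.ReLU())
        self.head = nn.Linear(128, 1)  # fused score head

    def forward(self, struct_feats: torch.Tensor, idea_embed: torch.Tensor):
        fused = torch.cat([self.struct_branch(struct_feats),
                           self.idea_branch(idea_embed)], dim=-1)
        return self.head(fused).squeeze(-1)

model = TwoBranchGrader()
scores = model(torch.randn(4, 32), torch.randn(4, 768))  # batch of 4 essays
print(scores.shape)  # torch.Size([4])
```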
arXiv Detail & Related papers (2024-10-12T01:17:55Z)
- Human Bias in the Face of AI: The Role of Human Judgement in AI Generated Text Evaluation [48.70176791365903]
This study explores how bias shapes the perception of AI-generated versus human-generated content.
We investigate how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z)
- When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs [5.952677937197871]
We empirically assess the differences in how ML-based scoring models trained on human content assess the quality of content generated by humans versus GPTs.
Results of our benchmark analysis reveal that transformer-based pretrained language models (PLMs) score human essay quality more accurately than CNN/RNN and feature-based ML methods.
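The benchmark setup can be illustrated with a small stand-in: fit a scorer on rated human essays, then apply it to essays from each source. The TF-IDF + ridge scorer below plays the role of the feature-based baselines only; the PLM scorers from the paper are not reproduced, and all data is placeholder.

```python
# Stand-in for the benchmark setup: fit a simple scorer on rated
# human-written essays, then score generated essays with it. TF-IDF +
# ridge approximates the feature-based baselines; data is placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

human = ["school uniforms limit expression",
         "uniforms reduce peer pressure",
         "homework builds discipline",
         "homework causes stress and fatigue"]
expert_scores = [3.0, 4.0, 3.5, 2.5]
generated = ["uniforms arguably constrain individual expression",
             "homework can consolidate classroom learning"]

vec = TfidfVectorizer().fit(human + generated)
scorer = Ridge(alpha=1.0).fit(vec.transform(human), expert_scores)

# With real data, compare agreement (e.g., correlation with expert
# ratings) separately for human-written and GPT-generated essays.
print(dict(zip(generated, scorer.predict(vec.transform(generated)).round(2))))
```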
arXiv Detail & Related papers (2023-09-25T19:32:18Z)
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
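The "fitting to averages" point has a one-screen illustration: when annotators genuinely disagree about a value judgment, the mean label erases the conflict. The situations and labels below are hypothetical.

```python
# Toy illustration of averaging washing out value conflicts: a mean
# label near 0 cannot distinguish a contested judgment from a neutral
# one. Situations and labels are hypothetical.
judgments = {
    "lying to protect a friend": [+1, -1],   # annotators disagree
    "keeping a promise":         [+1, +1],   # annotators agree
}

for situation, labels in judgments.items():
    mean = sum(labels) / len(labels)
    contested = max(labels) != min(labels)
    print(f"{situation}: mean={mean:+.1f}, contested={contested}")
```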
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
- ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models [9.483206389157509]
We first present ArguGPT, a balanced corpus of 4,038 argumentative essays generated by 7 GPT models.
We then hire English instructors to distinguish machine essays from human ones.
Results show that when first exposed to machine-generated essays, the instructors detect them with only 61% accuracy.
arXiv Detail & Related papers (2023-04-16T01:50:26Z)
- Will ChatGPT get you caught? Rethinking of Plagiarism Detection [0.0]
The rise of Artificial Intelligence (AI) technology and its impact on education has been a topic of growing concern in recent years.
The use of chatbots, particularly ChatGPT, for generating academic essays has sparked fears among scholars.
This study aims to explore the originality of content produced by one of the most popular AI chatbots, ChatGPT.
arXiv Detail & Related papers (2023-02-08T20:59:18Z)
- ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text [2.0378492681344493]
We study whether a machine learning model can be effectively trained to accurately distinguish between original human and seemingly human (that is, ChatGPT-generated) text.
We employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text.
Our study focuses on short online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text.
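As a rough illustration of this kind of detector, the sketch below trains a linear classifier on a few placeholder reviews and lists the words that push predictions toward the ChatGPT class; inspecting coefficients is a crude stand-in for the paper's explainability framework, not the method itself.

```python
# Sketch of a human-vs-ChatGPT text detector with a simple explanation:
# top positive coefficients of a linear model as a crude stand-in for a
# full XAI framework. Training data below is placeholder.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

human = ["loved it, came back twice!!",
         "meh. fries were cold, won't return"]
chatgpt = ["The ambiance was delightful and the service was impeccable.",
           "Overall, a commendable experience that I would recommend."]
texts, labels = human + chatgpt, [0, 0, 1, 1]  # 1 = ChatGPT-generated

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

# Words whose presence pushes the prediction toward the ChatGPT class.
top = np.argsort(clf.coef_[0])[-5:]
print(vec.get_feature_names_out()[top])
```

In practice this setting (short online reviews) needs far more data, and attributions would come from a dedicated explainability method rather than raw coefficients.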
arXiv Detail & Related papers (2023-01-30T08:06:08Z)
- COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation [56.520470678876656]
Bias inherent in user-written text can associate different levels of linguistic quality with users' protected attributes.
We introduce a general framework to achieve measure-specific counterfactual fairness in explanation generation.
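One way to read "measure-specific counterfactual fairness" is: for a chosen quality measure, the explanation a user receives should not change when only their protected attribute is flipped. The sketch below computes that gap with hypothetical stand-ins for the generator, the measure, and the user record; it is not the COFFEE framework itself.

```python
# Sketch of a measure-specific counterfactual fairness check: compare a
# quality measure on explanations generated for a user and for the same
# user with only the protected attribute flipped. All components are
# hypothetical stand-ins.

def quality(text: str) -> float:
    """Placeholder linguistic-quality measure (here: lexical variety)."""
    words = text.split()
    return len(set(words)) / len(words) if words else 0.0

def explain(user: dict, item: str) -> str:
    """Stand-in generator that (unfairly) varies style by group."""
    style = "an eloquent, richly detailed" if user["group"] == "A" else "a plain"
    return f"{item} is {style} match for your listening history."

user = {"id": 7, "group": "A"}
counterfactual = {**user, "group": "B"}  # flip only the protected attribute

gap = abs(quality(explain(user, "This jazz album"))
          - quality(explain(counterfactual, "This jazz album")))
print(f"counterfactual fairness gap: {gap:.3f}")  # 0.0 would be fair
```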
arXiv Detail & Related papers (2022-10-14T02:29:10Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that identify, with high accuracy, the samples that trigger oversensitivity and overstability.
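The two failure modes have a compact operational reading, sketched below with a dummy scorer: oversensitivity means a tiny irrelevant edit moves the score a lot; overstability means destroying the meaning (here, shuffling all words) barely moves it. The scorer and thresholds are illustrative assumptions, not the paper's defenses.

```python
# Probe for the two failure modes with a dummy scorer: shuffling words
# (meaning destroyed) should change the score but does not (overstable);
# a tiny irrelevant edit should not change it but may (oversensitive).
import random

def score(essay: str) -> float:
    """Dummy length-based scorer standing in for a trained AES model."""
    return min(len(essay.split()) / 50, 1.0) * 10

essay = ("Calculators changed mathematics class, and AI tools "
         "may do the same for writing instruction. ") * 3

shuffled = essay.split()
random.shuffle(shuffled)

overstable = abs(score(essay) - score(" ".join(shuffled))) < 0.1
oversensitive = abs(score(essay) - score(essay + " whatsoever")) > 1.0
print(f"overstable: {overstable}, oversensitive: {oversensitive}")
```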
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded in world knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.