Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool
- URL: http://arxiv.org/abs/2407.03646v2
- Date: Thu, 11 Jul 2024 10:56:01 GMT
- Title: Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool
- Authors: Georgios P. Georgiou
- Abstract summary: This study aims to investigate how various linguistic components are represented in both types of texts, assessing the ability of AI to emulate human writing.
Despite AI-generated texts appearing to mimic human speech, the results revealed significant differences across multiple linguistic features.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While extensive research has focused on ChatGPT in recent years, very few studies have systematically quantified and compared linguistic features between human-written and Artificial Intelligence (AI)-generated language. This study aims to investigate how various linguistic components are represented in both types of texts, assessing the ability of AI to emulate human writing. Using human-authored essays as a benchmark, we prompted ChatGPT to generate essays of equivalent length. These texts were analyzed using Open Brain AI, an online computational tool, to extract measures of phonological, morphological, syntactic, and lexical constituents. Despite AI-generated texts appearing to mimic human speech, the results revealed significant differences across multiple linguistic features such as consonants, word stress, nouns, verbs, pronouns, direct objects, prepositional modifiers, and use of difficult words among others. These findings underscore the importance of integrating automated tools for efficient language assessment, reducing time and effort in data analysis. Moreover, they emphasize the necessity for enhanced training methodologies to improve the capacity of AI for producing more human-like text.
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Decoding AI and Human Authorship: Nuances Revealed Through NLP and Statistical Analysis [0.0]
This research explores the nuanced differences in texts produced by AI and those written by humans.
The study investigates various linguistic traits, patterns of creativity, and potential biases inherent in human-written and AI-generated texts.
arXiv Detail & Related papers (2024-07-15T18:09:03Z)
- Who Writes the Review, Human or AI? [0.36498648388765503]
This study proposes a methodology to accurately distinguish AI-generated and human-written book reviews.
Our approach utilizes transfer learning, enabling the model to identify generated text across different topics.
The experimental results demonstrate that it is feasible to detect the original source of text, achieving an accuracy rate of 96.86%.
arXiv Detail & Related papers (2024-05-30T17:38:44Z)
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD).
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z)
- ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning [57.29285473727107]
ChatHuman is a language-driven human understanding system.
It combines and integrates the skills of many different methods.
ChatHuman is a step towards consolidating diverse methods for human analysis into a single, powerful system for 3D human reasoning.
arXiv Detail & Related papers (2024-05-07T17:59:31Z)
- Is English the New Programming Language? How About Pseudo-code Engineering? [0.0]
This study investigates how different input forms impact ChatGPT, a leading language model by OpenAI.
It examines the model's proficiency across four categories: understanding of intentions, interpretability, completeness, and creativity.
arXiv Detail & Related papers (2024-04-08T16:28:52Z)
- Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text [0.0]
My research investigates the use of cutting-edge hybrid deep learning models to accurately differentiate between AI-generated text and human writing.
I applied a robust methodology, utilising a carefully selected dataset comprising AI and human texts from various sources, each tagged with instructions.
arXiv Detail & Related papers (2023-11-27T06:26:53Z)
- Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information [68.89000132126536]
This work proposes to use inter-utterance linguistic information to improve the performance of prosodic structure prediction (PSP).
Our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH).
arXiv Detail & Related papers (2023-08-31T09:19:15Z)
- The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres.
We employ several machine learning models to classify the texts.
Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z)
- AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
arXiv Detail & Related papers (2023-04-24T12:58:28Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.