Deep dive into language traits of AI-generated Abstracts
- URL: http://arxiv.org/abs/2312.10617v1
- Date: Sun, 17 Dec 2023 06:03:33 GMT
- Title: Deep dive into language traits of AI-generated Abstracts
- Authors: Vikas Kumar, Amisha Bharti, Devanshu Verma, Vasudha Bhatnagar
- Abstract summary: Generative language models, such as ChatGPT, have garnered attention for their ability to generate human-like writing.
In this work, we attempt to detect the Abstracts generated by ChatGPT, which are much shorter and bounded in length.
We extract the texts' semantic and lexical properties and observe that traditional machine learning models can confidently detect these Abstracts.
- Score: 5.209583971923267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative language models, such as ChatGPT, have garnered attention for
their ability to generate human-like writing in various fields, including
academic research. The rapid proliferation of generated texts has bolstered the
need for automatic identification to uphold transparency and trust in the
information. However, these generated texts closely resemble human writing and
often have subtle differences in the grammatical structure, tones, and
patterns, which makes systematic scrutiny challenging. In this work, we
attempt to detect the Abstracts generated by ChatGPT, which are much shorter
and bounded in length. We extract the texts' semantic and lexical properties
and observe that traditional machine learning models can confidently detect
these Abstracts.
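The abstract describes the pipeline only at a high level: derive semantic and lexical properties from each abstract and feed them to a traditional machine learning model. The sketch below illustrates that general idea with a handful of generic lexical statistics and a scikit-learn logistic-regression classifier; the specific features, the `lexical_features` and `evaluate` helpers, and the choice of model are illustrative assumptions rather than the authors' published pipeline.

```python
# Minimal sketch of the approach outlined in the abstract: hand-crafted
# lexical features plus a traditional classifier. The features and model
# below are illustrative assumptions, not the authors' exact pipeline.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def lexical_features(text: str) -> list[float]:
    """Compute a few simple lexical statistics for one abstract."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        len(set(words)) / n_words,                    # type-token ratio
        n_words / max(len(sentences), 1),             # avg sentence length
        sum(len(w) for w in words) / n_words,         # avg word length
        sum(w in {"the", "of", "and", "to"} for w in words) / n_words,  # function-word rate
    ]

def evaluate(abstracts: list[str], labels: list[int]) -> float:
    """Cross-validated accuracy of a traditional classifier on the features.
    abstracts: abstract texts; labels: 1 = ChatGPT-generated, 0 = human."""
    X = np.array([lexical_features(t) for t in abstracts])
    y = np.array(labels)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
```

In practice such hand-crafted features would be combined with richer semantic representations, but the sketch reflects the abstract's core claim: short, bounded texts like abstracts can be separated with shallow features and a conventional model.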
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z) - Analysis of Plan-based Retrieval for Grounded Text Generation [78.89478272104739]
Hallucinations occur when a language model is given a generation task outside its parametric knowledge.
A common strategy to address this limitation is to infuse the language models with retrieval mechanisms.
We analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations.
arXiv Detail & Related papers (2024-08-20T02:19:35Z) - Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD).
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z) - Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs [19.073560504913356]
The line between human-crafted and machine-generated texts has become increasingly blurred.
This paper delves into identifying discernible and unique linguistic properties in texts written by humans.
arXiv Detail & Related papers (2024-02-16T11:20:30Z) - AI-generated text boundary detection with RoFT [7.2286849324485445]
We study how to detect the boundary between human-written and machine-generated parts of texts.
In particular, we find that perplexity-based approaches to boundary detection tend to be more robust to peculiarities of domain-specific data than supervised fine-tuning of the RoBERTa model (a rough sketch of perplexity-based detection appears after this list).
arXiv Detail & Related papers (2023-11-14T17:48:19Z) - DetectGPT-SC: Improving Detection of Text Generated by Large Language Models through Self-Consistency with Masked Predictions [13.077729125193434]
Existing detectors are built on the assumption that there is a distribution gap between human-generated and AI-generated texts.
We find that large language models such as ChatGPT exhibit strong self-consistency in text generation and continuation.
We propose a new method for AI-generated texts detection based on self-consistency with masked predictions.
arXiv Detail & Related papers (2023-10-23T01:23:10Z) - Automatic and Human-AI Interactive Text Generation [27.05024520190722]
This tutorial aims to provide an overview of the state-of-the-art natural language generation research.
Text-to-text generation tasks are more constrained in terms of semantic consistency and targeted language styles.
arXiv Detail & Related papers (2023-10-05T20:26:15Z) - The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres.
We employ several machine learning models to classify the texts.
Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z) - MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z) - How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z) - Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches, relaying task impacts across various generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding, and the key challenges outstanding in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
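The RoFT entry above notes that perplexity-based boundary detection tends to be more robust to domain shift than fine-tuning RoBERTa. As a rough illustration of what a perplexity-style detector can look like (not that paper's actual method), the sketch below scores each token of a text with a causal language model from Hugging Face `transformers` and places the boundary where the mean token loss changes most sharply; the model choice, window size, and `find_boundary` helper are assumptions made for the example.

```python
# Hedged sketch of perplexity-based boundary detection: score tokens with a
# causal LM and flag the position where the average token loss shifts most.
# This illustrates the general idea, not the RoFT paper's exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def token_losses(text: str) -> torch.Tensor:
    """Per-token negative log-likelihoods of the text under the LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Position t is predicted from the logits at position t-1.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    return -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1).squeeze(0)

def find_boundary(text: str, window: int = 20) -> int:
    """Guess the human/machine boundary as the token index with the largest
    drop in mean loss between the preceding and following windows."""
    losses = token_losses(text)
    best_pos, best_gap = 0, float("-inf")
    for t in range(window, len(losses) - window):
        gap = (losses[t - window:t].mean() - losses[t:t + window].mean()).item()
        if gap > best_gap:
            best_gap, best_pos = gap, t
    return best_pos  # token index where machine-generated text may begin
```

The heuristic relies on machine continuations typically being more predictable (lower loss) under a language model than the human-written prefix; the RoFT benchmark probes how far such signals survive domain shift.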
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.