RFBES at SemEval-2024 Task 8: Investigating Syntactic and Semantic
Features for Distinguishing AI-Generated and Human-Written Texts
- URL: http://arxiv.org/abs/2402.14838v1
- Date: Mon, 19 Feb 2024 00:40:17 GMT
- Title: RFBES at SemEval-2024 Task 8: Investigating Syntactic and Semantic
Features for Distinguishing AI-Generated and Human-Written Texts
- Authors: Mohammad Heydari Rad, Farhan Farsi, Shayan Bali, Romina Etezadi,
Mehrnoush Shamsfard
- Abstract summary: This article investigates the problem of AI-generated text detection from two different aspects: semantics and syntax.
We present an AI model that can distinguish AI-generated texts from human-written ones with high accuracy on both multilingual and monolingual tasks.
- Score: 0.8437187555622164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, the usage of Large Language Models (LLMs) has increased, and LLMs
have been used to generate texts in different languages and for different
tasks. Additionally, due to the participation of remarkable companies such as
Google and OpenAI, LLMs are now more accessible, and people can easily use
them. However, an important issue is how we can distinguish AI-generated texts from
human-written ones. In this article, we have investigated the problem of
AI-generated text detection from two different aspects: semantics and syntax.
Finally, we presented an AI model that can distinguish AI-generated texts from
human-written ones with high accuracy on both multilingual and monolingual
tasks using the M4 dataset. According to our results, using a semantic approach
would be more helpful for detection. However, there is a lot of room for
improvement in the syntactic approach, and it would be a good approach for
future work.
Related papers
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD).
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
(2024-05-21T11:22:27Z)
- Raidar: geneRative AI Detection viA Rewriting [42.477151044325595]
Large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting.
We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and calculating the editing distance of the output.
Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.
(2024-01-23T18:57:53Z)
- HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis [14.467821652366574]
We introduce the largest benchmark for spoken texts - HANSEN (Human ANd ai Spoken tExt beNchmark).
HANSEN encompasses meticulous curation of existing speech datasets accompanied by transcripts, alongside the creation of novel AI-generated spoken text datasets.
To evaluate and demonstrate the utility of HANSEN, we perform Authorship Attribution (AA) and Author Verification (AV) on human-spoken datasets and conduct Human vs. AI spoken text detection using state-of-the-art (SOTA) models.
(2023-10-25T16:23:17Z)
- Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
(2023-10-23T18:11:32Z)
- SeqXGPT: Sentence-Level AI-Generated Text Detection [62.3792779440284]
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs).
We then propose Sequence X (Check) GPT, a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection.
(2023-10-13T07:18:53Z)
- Generative AI Text Classification using Ensemble LLM Approaches [0.12483023446237698]
Large Language Models (LLMs) have shown impressive performance across a variety of AI and natural language processing tasks.
We propose an ensemble neural model that generates probabilities from different pre-trained LLMs.
For the first task of distinguishing between AI and human generated text, our model ranked in fifth and thirteenth place.
(2023-09-14T14:41:46Z)
- The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres.
We employ several machine learning models to classify the texts.
Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
(2023-07-22T21:00:14Z)
- M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection [69.29017069438228]
Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries.
This has also raised concerns about the potential misuse of such texts in journalism, education, and academia.
In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse.
(2023-05-24T08:55:11Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
(2023-05-22T17:13:29Z)
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
(2023-03-17T17:53:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.