TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques
- URL: http://arxiv.org/abs/2403.16592v1
- Date: Mon, 25 Mar 2024 10:09:03 GMT
- Title: TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques
- Authors: Ashok Urlana, Aditya Saibewar, Bala Mallikarjunarao Garlapati, Charaka Vinayak Kumar, Ajeet Kumar Singh, Srinivasa Rao Chalamala
- Abstract summary: Large Language Models (LLMs) generate fluent content across a wide spectrum of user queries.
This capability has raised concerns regarding misinformation and personal information leakage.
We present our methods for the SemEval2024 Task8, aiming to detect machine-generated text across various domains.
- Score: 2.149586323955579
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) exhibit a remarkable ability to generate fluent content across a wide spectrum of user queries. However, this capability has raised concerns regarding misinformation and personal information leakage. In this paper, we present our methods for the SemEval2024 Task8, aiming to detect machine-generated text across various domains in both mono-lingual and multi-lingual contexts. Our study comprehensively analyzes various methods to detect machine-generated text, including statistical, neural, and pre-trained model approaches. We also detail our experimental setup and perform an in-depth error analysis to evaluate the effectiveness of these methods. Our methods obtain an accuracy of 86.9% on the test set of subtask-A mono and 83.7% for subtask-B. Furthermore, we also highlight the challenges and essential factors for consideration in future studies.
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text [2.2039952888743253]
MGT has emerged as a significant area of study within Natural Language Processing.
In this research, we explore the effectiveness of fine-tuning a RoBERTa-base transformer, a powerful neural architecture, to address MGT detection.
Our proposed system achieves an accuracy of 78.9% on the test dataset, positioning us at 57th among participants.
arXiv Detail & Related papers (2024-07-16T14:33:01Z)
- Mast Kalandar at SemEval-2024 Task 8: On the Trail of Textual Origins: RoBERTa-BiLSTM Approach to Detect AI-Generated Text [7.959800630494841]
SemEval 2024 introduces the task of Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection.
We propose a RoBERTa-BiLSTM based classifier designed to classify text into two categories: AI-generated or human.
Our architecture ranked 46th on the official leaderboard with an accuracy of 80.83 among 125.
arXiv Detail & Related papers (2024-07-03T10:22:23Z)
- SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection [68.858931667807]
Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine.
Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM.
Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine.
arXiv Detail & Related papers (2024-04-22T13:56:07Z)
- PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text? [4.463184061618504]
We present our submission to the SemEval-2024 Task 8 "Multigenerator, Multidomain, and Black-Box Machine-Generated Text Detection".
Our approach relies on combining embeddings from the RoBERTa-base with diversity features and uses a resampled training set.
Our results show that our approach is generalizable across unseen models and domains, achieving an accuracy of 0.91.
arXiv Detail & Related papers (2024-04-08T13:05:02Z)
- KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection [0.0]
SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection.
Our submitted method achieved competitive results, ranking at the fourth place, just under 1 percentage point behind the winner.
arXiv Detail & Related papers (2024-02-21T10:09:56Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models [57.80514758695275]
Using large language models (LLMs) for assessing the quality of machine translation (MT) achieves state-of-the-art performance at the system level.
We propose a new prompting method called Error Analysis Prompting (EAPrompt).
This technique emulates the commonly accepted human evaluation framework - Multidimensional Quality Metrics (MQM) and produces explainable and reliable MT evaluations at both the system and segment level.
arXiv Detail & Related papers (2023-03-24T05:05:03Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.