GenAI Content Detection Task 2: AI vs. Human -- Academic Essay Authenticity Challenge
- URL: http://arxiv.org/abs/2412.18274v1
- Date: Tue, 24 Dec 2024 08:33:44 GMT
- Title: GenAI Content Detection Task 2: AI vs. Human -- Academic Essay Authenticity Challenge
- Authors: Shammur Absar Chowdhury, Hind Almerekhi, Mucahid Kutlu, Kaan Efe Keles, Fatema Ahmad, Tasnim Mohiuddin, George Mikros, Firoj Alam
- Abstract summary: The Academic Essay Authenticity Challenge was organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025.
This challenge focuses on detecting machine-generated vs. human-authored essays for academic purposes.
The challenge involves two languages: English and Arabic.
This paper outlines the task formulation, details the dataset construction process, and explains the evaluation framework.
- Score: 12.076440946525434
- Abstract: This paper presents a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. This challenge focuses on detecting machine-generated vs. human-authored essays for academic purposes. The task is defined as follows: "Given an essay, identify whether it is generated by a machine or authored by a human." The challenge involves two languages: English and Arabic. During the evaluation phase, 25 teams submitted systems for English and 21 teams for Arabic, reflecting substantial interest in the task. Finally, seven teams submitted system description papers. The majority of submissions utilized fine-tuned transformer-based models, with one team employing Large Language Models (LLMs) such as Llama 2 and Llama 3. This paper outlines the task formulation, details the dataset construction process, and explains the evaluation framework. Additionally, we present a summary of the approaches adopted by participating teams. Nearly all submitted systems outperformed the n-gram-based baseline, with the top-performing systems achieving F1 scores exceeding 0.98 for both languages, indicating significant progress in the detection of machine-generated text.
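The abstract mentions an n-gram-based baseline that nearly all submitted systems outperformed, but does not specify how it was implemented. As a minimal illustrative sketch only, one common choice for such a baseline is a character n-gram Naive Bayes classifier; the class name, feature choice, and toy data below are assumptions, not the organizers' actual baseline:

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


class NgramNB:
    """Tiny character n-gram Naive Bayes classifier with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}   # label -> Counter of n-gram frequencies
        self.totals = {}   # label -> total n-gram count for that label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        vocab_size = len(self.vocab)
        best_label, best_logprob = None, float("-inf")
        for label, counts in self.counts.items():
            # Sum smoothed log-likelihoods of the essay's n-grams under this class.
            logprob = sum(
                freq * math.log((counts[g] + 1) / (self.totals[label] + vocab_size))
                for g, freq in grams.items()
            )
            if logprob > best_logprob:
                best_label, best_logprob = label, logprob
        return best_label
```

In practice such a baseline would be trained on the shared-task essays and scored with F1, which is why transformer-based systems exceeding 0.98 represent a large margin over it.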
Related papers
- GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human [71.42669028683741]
We present a shared task on binary machine generated text detection conducted as a part of the GenAI workshop at COLING 2025.
The task consists of two subtasks: Monolingual (English) and Multilingual.
We provide a comprehensive overview of the data, a summary of the results, detailed descriptions of the participating systems, and an in-depth analysis of submissions.
arXiv Detail & Related papers (2025-01-19T11:11:55Z) - IntegrityAI at GenAI Detection Task 2: Detecting Machine-Generated Academic Essays in English and Arabic Using ELECTRA and Stylometry [0.21756081703275998]
This research utilizes pre-trained, transformer-based models fine-tuned on Arabic and English academic essays with stylometric features.
The proposed models achieved excellent results, with an F1-score of 99.7% (ranking 2nd among 26 teams) in the English subtask and 98.4% (ranking 1st among 23 teams) in the Arabic subtask.
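The IntegrityAI entry pairs fine-tuned transformers with stylometric features. The summary does not list the exact features used, but as a hedged illustration, a handful of classic stylometric measures can be computed directly from raw text; the feature set below is an assumption for illustration, not the team's actual one:

```python
import re


def stylometric_features(text):
    """Compute a few classic stylometric features of an essay.

    Illustrative choice of features (sentence length, word length,
    lexical diversity, punctuation rate); real systems typically use
    many more, concatenated with transformer embeddings.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1
    return {
        "avg_sentence_len": n_words / (len(sentences) or 1),
        "avg_word_len": sum(map(len, words)) / n_words,
        "type_token_ratio": len(set(words)) / n_words,  # lexical diversity
        "comma_rate": text.count(",") / n_words,
    }
```

Features like these are cheap to compute and language-agnostic at the surface level, which may explain their appeal alongside models fine-tuned separately for English and Arabic.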
arXiv Detail & Related papers (2025-01-07T10:19:56Z) - ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content [9.287041393988485]
We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024.
In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion techniques identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes.
A total of 14 teams participated in the final evaluation phase, with 6 and 9 teams participating in Tasks 1 and 2, respectively.
arXiv Detail & Related papers (2024-07-05T04:28:46Z) - ArAIEval Shared Task: Persuasion Techniques and Disinformation Detection in Arabic Text [41.3267575540348]
We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023.
ArAIEval offers two tasks over Arabic text: (i) persuasion technique detection, focusing on identifying persuasion techniques in tweets and news articles, and (ii) disinformation detection in binary and multiclass setups over tweets.
A total of 20 teams participated in the final evaluation phase, with 14 and 16 teams participating in Tasks 1 and 2, respectively.
arXiv Detail & Related papers (2023-11-06T15:21:19Z) - SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
We propose a new task called Sentiment and Opinion Understanding of Language (SOUL).
SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG).
arXiv Detail & Related papers (2023-10-27T06:48:48Z) - Legend at ArAIEval Shared Task: Persuasion Technique Detection using a Language-Agnostic Text Representation Model [1.3506669466260708]
In this paper, we share our best performing submission to the Arabic AI Tasks Evaluation Challenge (ArAIEval) at ArabicNLP 2023.
Our focus was on Task 1, which involves identifying persuasion techniques in excerpts from tweets and news articles.
The persuasion technique in Arabic texts was detected using a training loop with XLM-RoBERTa, a language-agnostic text representation model.
arXiv Detail & Related papers (2023-10-14T20:27:04Z) - FacTool: Factuality Detection in Generative AI -- A Tool Augmented
Framework for Multi-Task and Multi-Domain Scenarios [87.12753459582116]
A wider range of tasks now faces an increasing risk of containing factual errors when handled by generative models.
We propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models.
arXiv Detail & Related papers (2023-07-25T14:20:51Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won the first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z) - Summarization with Graphical Elements [55.5913491389047]
We propose a new task: summarization with graphical elements.
We collect a high quality human labeled dataset to support research into the task.
arXiv Detail & Related papers (2022-04-15T17:16:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.