Overview of AuTexTification at IberLEF 2023: Detection and Attribution
of Machine-Generated Text in Multiple Domains
- URL: http://arxiv.org/abs/2309.11285v1
- Date: Wed, 20 Sep 2023 13:10:06 GMT
- Authors: Areg Mikael Sarvazyan, José Ángel González, Marc
Franco-Salvador, Francisco Rangel, Berta Chulvi, Paolo Rosso
- Abstract summary: This paper presents an overview of the AuTexTification shared task, part of the IberLEF 2023 Workshop (Iberian Languages Evaluation Forum).
The AuTexTification dataset contains more than 160,000 texts across two languages (English and Spanish) and five domains (tweets, reviews, news, legal, and how-to articles).
A total of 114 teams signed up to participate; 36 of them submitted 175 runs, and 20 submitted working notes.
- Score: 6.44756483013808
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents the overview of the AuTexTification shared task as part
of the IberLEF 2023 Workshop in Iberian Languages Evaluation Forum, within the
framework of the SEPLN 2023 conference. AuTexTification consists of two
subtasks: for Subtask 1, participants had to determine whether a text is
human-authored or has been generated by a large language model. For Subtask 2,
participants had to attribute a machine-generated text to one of six different
text generation models. Our AuTexTification 2023 dataset contains more than
160,000 texts across two languages (English and Spanish) and five domains
(tweets, reviews, news, legal, and how-to articles). A total of 114 teams
signed up to participate, of which 36 sent 175 runs, and 20 of them sent their
working notes. In this overview, we present the AuTexTification dataset and
task, the submitted participating systems, and the results.
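The two subtasks described above can be read as a binary and a six-way classification problem. The following is a minimal sketch of how predictions for either subtask could be scored with macro F1. It makes several assumptions: the label names are placeholders (the abstract does not name the six generation models), the toy gold and predicted labels are invented for illustration, and macro F1 is used here only because one of the related entries below reports it; this listing does not state the official metric.

```python
from collections import Counter

# Hypothetical label sets; the abstract gives no official label names.
SUBTASK1_LABELS = ["human", "generated"]
SUBTASK2_LABELS = ["A", "B", "C", "D", "E", "F"]  # six generators, placeholder names


def macro_f1(gold, pred, labels):
    """Return the unweighted mean of per-class F1 over the given labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no gold and no predicted items contributes F1 = 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy Subtask 1 example (invented data, not taken from the dataset).
gold = ["human", "generated", "generated", "human"]
pred = ["human", "generated", "human", "human"]
score = macro_f1(gold, pred, SUBTASK1_LABELS)
print(f"macro F1: {score:.4f}")
```

The same function applies to Subtask 2 by passing SUBTASK2_LABELS; averaging per-class F1 without weighting keeps a rarely predicted generator from being hidden by the more frequent ones.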
Related papers
- SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection [68.858931667807]
Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine.
Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM.
Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine.
arXiv Detail & Related papers (2024-04-22T13:56:07Z)
- MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages [71.50809576484288]
Text detoxification is a task where a text is paraphrased from a toxic surface form, e.g. featuring rude words, to a neutral register.
Recent approaches to collecting parallel text detoxification corpora, ParaDetox and APPADIA, were explored only in a monolingual setup.
In this work, we aim to extend the ParaDetox pipeline to multiple languages, presenting MultiParaDetox to automate parallel detoxification corpus collection for potentially any language.
arXiv Detail & Related papers (2024-04-02T15:32:32Z)
- Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
arXiv Detail & Related papers (2024-03-19T21:45:29Z)
- ArAIEval Shared Task: Persuasion Techniques and Disinformation Detection in Arabic Text [41.3267575540348]
We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023.
ArAIEval offers two tasks over Arabic text: (i) persuasion technique detection, focusing on identifying persuasion techniques in tweets and news articles, and (ii) disinformation detection in binary and multiclass setups over tweets.
A total of 20 teams participated in the final evaluation phase, with 14 and 16 teams participating in Tasks 1 and 2, respectively.
arXiv Detail & Related papers (2023-11-06T15:21:19Z)
- Legend at ArAIEval Shared Task: Persuasion Technique Detection using a Language-Agnostic Text Representation Model [1.3506669466260708]
In this paper, we share our best performing submission to the Arabic AI Tasks Evaluation Challenge (ArAIEval) at ArabicNLP 2023.
Our focus was on Task 1, which involves identifying persuasion techniques in excerpts from tweets and news articles.
The persuasion technique in Arabic texts was detected using a training loop with XLM-RoBERTa, a language-agnostic text representation model.
arXiv Detail & Related papers (2023-10-14T20:27:04Z)
- UPB at IberLEF-2023 AuTexTification: Detection of Machine-Generated Text using Transformer Ensembles [0.5324802812881543]
This paper describes the solutions submitted by the UPB team to the AuTexTification shared task, featured as part of IberLEF-2023.
Our best-performing model achieved macro F1-scores of 66.63% on the English dataset and 67.10% on the Spanish dataset.
arXiv Detail & Related papers (2023-08-02T20:08:59Z)
- ICDAR 2023 Video Text Reading Competition for Dense and Small Text [61.138557702185274]
We establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks: video text tracking (Task 1) and end-to-end video text spotting (Task 2).
arXiv Detail & Related papers (2023-04-10T04:20:34Z)
- RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z)
- Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian [6.9244605050142995]
We present the shared task on artificial text detection in Russian, which is organized as a part of the Dialogue Evaluation initiative, held in 2022.
The dataset includes texts from 14 text generators, i.e., one human writer and 13 text generative models fine-tuned for one or more of the following generation tasks.
The human-written texts are collected from publicly available resources across multiple domains.
arXiv Detail & Related papers (2022-06-03T14:12:33Z)
- SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media [50.29389719723529]
We present the main findings and compare the results of SemEval-2020 Task 10, Emphasis Selection for Written Text in Visual Media.
The goal of this shared task is to design automatic methods for emphasis selection.
The analysis of systems submitted to the task indicates that BERT and RoBERTa were the most common choice of pre-trained models used.
arXiv Detail & Related papers (2020-08-07T17:24:53Z)