MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark
- URL: http://arxiv.org/abs/2310.13606v1
- Date: Fri, 20 Oct 2023 15:57:17 GMT
- Title: MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark
- Authors: Dominik Macko, Robert Moro, Adaku Uchendu, Jason Samuel Lucas, Michiharu Yamashita, Matúš Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova
- Abstract summary: MULTITuDE is a novel benchmarking dataset for multilingual machine-generated text detection.
It consists of 74,081 authentic and machine-generated texts in 11 languages.
We compare the performance of zero-shot (statistical and black-box) and fine-tuned detectors.
- Score: 10.92793962395538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a lack of research into the capabilities of recent LLMs to
generate convincing text in languages other than English and into the
performance of machine-generated text detectors in multilingual settings. This is also
reflected in the available benchmarks which lack authentic texts in languages
other than English and predominantly cover older generators. To fill this gap,
we introduce MULTITuDE, a novel benchmarking dataset for multilingual
machine-generated text detection comprising 74,081 authentic and
machine-generated texts in 11 languages (ar, ca, cs, de, en, es, nl, pt, ru,
uk, and zh) generated by 8 multilingual LLMs. Using this benchmark, we compare
the performance of zero-shot (statistical and black-box) and fine-tuned
detectors. Considering the multilinguality, we evaluate 1) how these detectors
generalize to unseen languages (linguistically similar as well as dissimilar)
and unseen LLMs and 2) whether the detectors improve their performance when
trained on multiple languages.
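The abstract contrasts zero-shot statistical detectors with fine-tuned ones. As a minimal illustrative sketch (not code from the paper; the model name, threshold, and example text are assumptions), a zero-shot statistical detector of the kind benchmarked here can score a text by its mean token log-likelihood under a pretrained multilingual causal LM and flag unusually probable texts as machine-generated:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "ai-forever/mGPT"  # assumption: any multilingual causal LM could stand in here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def mean_log_likelihood(text: str) -> float:
    # Average per-token log-likelihood of the text under the LM.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()  # out.loss is the mean negative log-likelihood per token

def looks_machine_generated(text: str, threshold: float = -3.0) -> bool:
    # Machine-generated text tends to be more probable under an LM than
    # human-written text; the threshold is a placeholder that would be
    # tuned on a labelled validation split of the benchmark.
    return mean_log_likelihood(text) > threshold

A fine-tuned detector in the same setting would instead train a multilingual classifier (e.g., XLM-RoBERTa) on texts from a subset of the 11 languages and evaluate it on held-out languages and held-out generators, which is how the generalization questions above can be probed.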
Related papers
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts [0.6053347262128919]
MultiSocial dataset contains 472,097 texts, of which about 58k are human-written.
We use this benchmark to compare existing detection methods in zero-shot as well as fine-tuned form.
Our results indicate that fine-tuned detectors have no difficulty being trained on social-media texts.
arXiv Detail & Related papers (2024-06-18T12:26:09Z)
- CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts [10.027843402296678]
This paper constructs a comprehensive benchmark in both Chinese and English to evaluate mainstream AI-generated text detectors.
We categorize text generation into five distinct operations: Create, Update, Delete, Rewrite, and Translate.
For each CUDRT category, we have developed extensive datasets to thoroughly assess detector performance.
arXiv Detail & Related papers (2024-06-13T12:43:40Z)
- KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection [0.0]
SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection.
Our submitted method achieved competitive results, ranking fourth, just under 1 percentage point behind the winner.
arXiv Detail & Related papers (2024-02-21T10:09:56Z)
- Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z)
- Revisiting non-English Text Simplification: A Unified Multilingual Benchmark [14.891068432456262]
This paper introduces the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs.
Our experiments using MultiSim with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings.
arXiv Detail & Related papers (2023-05-25T03:03:29Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages [94.75849612191546]
AfroMT is a standardized, clean, and reproducible machine translation benchmark for eight widely spoken African languages.
We develop a suite of analysis tools for system diagnosis taking into account the unique properties of these languages.
We demonstrate significant improvements when pretraining on 11 languages, with gains of up to 2 BLEU points over strong baselines.
arXiv Detail & Related papers (2021-09-10T07:45:21Z)
- Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)
- Language Detection Engine for Multilingual Texting on Mobile Devices [0.415623340386296]
More than 2 billion mobile users worldwide type in multiple languages in the soft keyboard.
On a monolingual keyboard, 38% of falsely auto-corrected words are valid in another language.
We present a fast, light-weight and accurate Language Detection Engine (LDE) for multilingual typing.
arXiv Detail & Related papers (2021-01-07T16:49:47Z)