OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution
- URL: http://arxiv.org/abs/2504.11369v1
- Date: Tue, 15 Apr 2025 16:36:14 GMT
- Title: OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution
- Authors: Lucio La Cava, Andrea Tagarelli
- Abstract summary: Open Large Language Models (OLLMs) are increasingly leveraged in generative AI applications. We propose OpenTuringBench, a new benchmark based on OLLMs to train and evaluate machine-generated text detectors.
- Score: 4.742123770879715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open Large Language Models (OLLMs) are increasingly leveraged in generative AI applications, posing new challenges for detecting their outputs. We propose OpenTuringBench, a new benchmark based on OLLMs, designed to train and evaluate machine-generated text detectors on the Turing Test and Authorship Attribution problems. OpenTuringBench focuses on a representative set of OLLMs, and features a number of challenging evaluation tasks, including human/machine-manipulated texts, out-of-domain texts, and texts from previously unseen models. We also provide OTBDetector, a contrastive learning framework to detect and attribute OLLM-based machine-generated texts. Results highlight the relevance and varying degrees of difficulty of the OpenTuringBench tasks, with our detector achieving remarkable capabilities across the various tasks and outperforming most existing detectors. Resources are available on the OpenTuringBench Hugging Face repository at https://huggingface.co/datasets/MLNTeam-Unical/OpenTuringBench
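For readers who want to experiment with the benchmark, a minimal loading sketch is shown below, using the Hugging Face repository cited in the abstract. The configuration, split names, and column names are assumptions not confirmed by the abstract; consult the dataset card for the actual schema.

```python
# Minimal sketch: loading OpenTuringBench from the Hugging Face Hub.
# Assumptions (not confirmed by the abstract): a "train" split and the
# record layout -- check the dataset card for the real schema.
from datasets import load_dataset

ds = load_dataset("MLNTeam-Unical/OpenTuringBench")  # repo from the abstract
print(ds)                 # inspect the available splits and features

train = ds["train"]       # assumed split name
example = train[0]        # first record
print(example.keys())     # discover the actual column names
```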
Related papers
- GigaCheck: Detecting LLM-generated Content [72.27323884094953]
In this work, we investigate the task of generated-text detection by proposing GigaCheck.
Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts.
Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize AI-generated intervals within text.
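The paper's DETR-like interval detector is not reproduced here; as a deliberately simpler stand-in, the sketch below frames interval localization as per-token binary tagging with a Hugging Face token-classification head, then merges contiguous "machine" tokens into character spans. The checkpoint is an untrained placeholder, so the outputs are illustrative only.

```python
# Simplified stand-in for interval localization: per-token binary tagging
# (0=human, 1=machine) instead of the paper's DETR-like set prediction.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "bert-base-uncased"  # placeholder checkpoint; the head is untrained
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=2)

text = "A human-written opening sentence. Then a suspiciously generic continuation."
enc = tok(text, return_offsets_mapping=True, return_tensors="pt")
offsets = enc.pop("offset_mapping")[0]

with torch.no_grad():
    labels = model(**enc).logits.argmax(-1)[0]  # per-token 0/1 predictions

# Merge contiguous "machine" tokens into character-level intervals.
spans, start, prev_end = [], None, 0
for (s, e), lab in zip(offsets.tolist(), labels.tolist()):
    if s == e:                      # special tokens have empty offsets
        continue
    if lab == 1 and start is None:
        start = s                   # open a machine-generated span
    elif lab != 1 and start is not None:
        spans.append((start, prev_end))  # close the span
        start = None
    prev_end = e
if start is not None:
    spans.append((start, prev_end))
print(spans)  # predicted machine-generated character intervals
```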
arXiv Detail & Related papers (2024-10-31T08:30:55Z)
- RKadiyala at SemEval-2024 Task 8: Black-Box Word-Level Text Boundary Detection in Partially Machine Generated Texts [0.0]
This paper introduces a few reliable approaches for identifying, at the word level, which parts of a given text are machine-generated.
We present a comparison with proprietary systems and report our model's performance on texts from unseen domains and generators.
The findings reveal significant improvements in detection accuracy, along with comparisons on other aspects of detection capability.
arXiv Detail & Related papers (2024-10-22T03:21:59Z)
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
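The abstract contrasts its latent-space approach with token-level zero-shot detectors. For context, here is a minimal sketch of the token-level baseline it improves on: score a text by its mean per-token negative log-likelihood under a small causal LM and threshold it. The scoring model and cutoff are assumptions chosen for illustration.

```python
# Minimal token-level zero-shot baseline: unusually low perplexity under a
# causal LM is (weak) evidence that a text is machine-generated.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "gpt2"  # small scorer, chosen only for illustration
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(name).eval()

def avg_nll(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)   # HF shifts labels internally
    return out.loss.item()          # mean per-token negative log-likelihood

THRESHOLD = 3.0  # assumed cutoff; in practice tuned on held-out data
text = "The quick brown fox jumps over the lazy dog."
print("machine?", avg_nll(text) < THRESHOLD)
```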
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Making Text Embedders Few-Shot Learners [33.50993377494602]
We introduce a novel model bge-en-icl, which employs few-shot examples to produce high-quality text embeddings.
Our approach integrates task-related examples directly into the query side, resulting in significant improvements across various tasks.
Experimental results on the MTEB and AIR-Bench benchmarks demonstrate that our approach sets new state-of-the-art (SOTA) performance.
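The query-side few-shot mechanism can be approximated as prompt construction: fold a task instruction and an in-context example into the query text before encoding. The sketch below uses a generic sentence-transformers model and an invented template; bge-en-icl's actual prompt format and interface differ, so treat this purely as a shape of the idea and consult the model card for the real usage.

```python
# Sketch of query-side in-context examples for embeddings. The template and
# the stand-in model are illustrative assumptions, not bge-en-icl's real API.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic stand-in model

task = "Given a web search query, retrieve relevant passages."
demo_q, demo_p = "what is a black hole", "A black hole is a region of spacetime..."
query = "how do stars collapse"

# The few-shot example is folded into the query side only;
# documents are encoded as-is.
icl_query = (f"Task: {task}\nExample query: {demo_q}\n"
             f"Example passage: {demo_p}\nQuery: {query}")
q_emb = model.encode(icl_query, normalize_embeddings=True)
d_emb = model.encode(["Stellar collapse occurs when fusion pressure fails."],
                     normalize_embeddings=True)
print(q_emb @ d_emb.T)  # cosine similarity (embeddings are normalized)
```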
arXiv Detail & Related papers (2024-09-24T03:30:19Z)
- Marking: Visual Grading with Highlighting Errors and Annotating Missing Bits [23.71250100390303]
"Marking" is a novel grading task that enhances automated grading systems by performing an in-depth analysis of student responses.
We introduce a new dataset meticulously curated by Subject Matter Experts specifically for this task.
arXiv Detail & Related papers (2024-04-22T16:00:46Z)
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios [87.12753459582116]
A wider range of tasks now faces an increasing risk of containing factual errors when handled by generative models.
We propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models.
arXiv Detail & Related papers (2023-07-25T14:20:51Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility in real-world application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding [70.16678926775475]
MMOCR is an open-source toolbox for text detection and recognition.
It implements 14 state-of-the-art algorithms, more than any other open-source OCR project we are aware of to date.
arXiv Detail & Related papers (2021-08-14T14:10:23Z)
- GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation [83.10599735938618]
Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository.
This work introduces GENIE, a human-evaluation leaderboard that brings the ease of leaderboards to text generation tasks.
arXiv Detail & Related papers (2021-01-17T00:40:47Z)
- RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text [25.80571756447762]
We present Real or Fake Text (RoFT), a website that invites users to try their hand at detecting machine-generated text.
We show preliminary results of using RoFT to evaluate detection of machine-generated news articles.
arXiv Detail & Related papers (2020-10-06T22:47:43Z)