Related papers: Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education

Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education

URL: http://arxiv.org/abs/2307.12267v6
Date: Mon, 25 Dec 2023 06:36:30 GMT
Title: Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education
Authors: Zijie Zeng, Lele Sha, Yuheng Li, Kaixun Yang, Dragan Ga\v{s}evi\'c, Guanliang Chen
Abstract summary: This study investigates AI content detection in a rarely explored yet realistic setting. We first formalized the detection task as identifying the transition points between human-written content and AI-generated content. We then proposed a two-step approach where we separated AI-generated content from human-written content during the encoder training process.
Score: 10.606131520965604
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent large language models (LLMs), e.g., ChatGPT, have been able to generate human-like and fluent responses when provided with specific instructions. While admitting the convenience brought by technological advancement, educators also have concerns that students might leverage LLMs to complete their writing assignments and pass them off as their original work. Although many AI content detection studies have been conducted as a result of such concerns, most of these prior studies modeled AI content detection as a classification problem, assuming that a text is either entirely human-written or entirely AI-generated. In this study, we investigated AI content detection in a rarely explored yet realistic setting where the text to be detected is collaboratively written by human and generative LLMs (i.e., hybrid text). We first formalized the detection task as identifying the transition points between human-written content and AI-generated content from a given hybrid text (boundary detection). Then we proposed a two-step approach where we (1) separated AI-generated content from human-written content during the encoder training process; and (2) calculated the distances between every two adjacent prototypes and assumed that the boundaries exist between the two adjacent prototypes that have the furthest distance from each other. Through extensive experiments, we observed the following main findings: (1) the proposed approach consistently outperformed the baseline methods across different experiment settings; (2) the encoder training process can significantly boost the performance of the proposed approach; (3) when detecting boundaries for single-boundary hybrid essays, the proposed approach could be enhanced by adopting a relatively large prototype size, leading to a 22% improvement in the In-Domain evaluation and an 18% improvement in the Out-of-Domain evaluation.

Related papers

HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring [14.887491317701997]
This paper explores the possibility of fine-grained MGT detection under human-AI coauthoring.<n>We suggest fine-grained detectors can pave pathways toward coauthored text detection with a numeric AI ratio.<n> Empirical results show that metric-based methods struggle to conduct fine-grained detection with a 0.462 average F1 score.
arXiv Detail & Related papers (2025-06-03T14:52:44Z)
ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions. We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process. We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z)
GigaCheck: Detecting LLM-generated Content [72.27323884094953]
In this work, we investigate the task of generated text detection by proposing the GigaCheck. Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts. Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize AI-generated intervals within text.
arXiv Detail & Related papers (2024-10-31T08:30:55Z)
DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning [24.99797253885887]
We argue that the key to accomplishing this task lies in distinguishing writing styles of different authors. We propose DeTeCtive, a multi-task auxiliary, multi-level contrastive learning framework. Our method is compatible with a range of text encoders.
arXiv Detail & Related papers (2024-10-28T12:34:49Z)
Automating Intervention Discovery from Scientific Literature: A Progressive Ontology Prompting and Dual-LLM Framework [56.858564736806414]
This paper proposes a novel framework leveraging large language models (LLMs) to identify interventions in scientific literature.<n>Our approach successfully identified 2,421 interventions from a corpus of 64,177 research articles in the speech-language pathology domain.
arXiv Detail & Related papers (2024-08-20T16:42:23Z)
Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text [4.902089836908786]
WhosAI is a triplet-network contrastive learning framework designed to predict whether a given input text has been generated by humans or AI. We show that our proposed framework achieves outstanding results in both the Turing Test and Authorship tasks.
arXiv Detail & Related papers (2024-07-12T15:44:56Z)
Who Writes the Review, Human or AI? [0.36498648388765503]
This study proposes a methodology to accurately distinguish AI-generated and human-written book reviews. Our approach utilizes transfer learning, enabling the model to identify generated text across different topics. The experimental results demonstrate that it is feasible to detect the original source of text, achieving an accuracy rate of 96.86%.
arXiv Detail & Related papers (2024-05-30T17:38:44Z)
Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights [18.30412155877708]
This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. The CoAuthor dataset includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system.
arXiv Detail & Related papers (2024-03-06T07:25:46Z)
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery. Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data. In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
DEMASQ: Unmasking the ChatGPT Wordsmith [63.8746084667206]
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content. Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z)
Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs. To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases. We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in computer vision field. This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation. Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.