Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid
Essay in Education
- URL: http://arxiv.org/abs/2307.12267v6
- Date: Mon, 25 Dec 2023 06:36:30 GMT
- Title: Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid
Essay in Education
- Authors: Zijie Zeng, Lele Sha, Yuheng Li, Kaixun Yang, Dragan Gašević,
Guanliang Chen
- Abstract summary: This study investigates AI content detection in a rarely explored yet realistic setting.
We first formalized the detection task as identifying the transition points between human-written content and AI-generated content.
We then proposed a two-step approach where we separated AI-generated content from human-written content during the encoder training process.
- Score: 10.606131520965604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent large language models (LLMs), e.g., ChatGPT, have been able to
generate human-like and fluent responses when provided with specific
instructions. While admitting the convenience brought by technological
advancement, educators also have concerns that students might leverage LLMs to
complete their writing assignments and pass them off as their original work.
Although many AI content detection studies have been conducted as a result of
such concerns, most of these prior studies modeled AI content detection as a
classification problem, assuming that a text is either entirely human-written
or entirely AI-generated. In this study, we investigated AI content detection
in a rarely explored yet realistic setting where the text to be detected is
collaboratively written by humans and generative LLMs (i.e., hybrid text). We
first formalized the detection task as identifying the transition points
between human-written content and AI-generated content from a given hybrid text
(boundary detection). Then we proposed a two-step approach where we (1)
separated AI-generated content from human-written content during the encoder
training process; and (2) calculated the distances between every two adjacent
prototypes and assumed that the boundaries exist between the two adjacent
prototypes that have the furthest distance from each other. Through extensive
experiments, we observed the following main findings: (1) the proposed approach
consistently outperformed the baseline methods across different experiment
settings; (2) the encoder training process can significantly boost the
performance of the proposed approach; (3) when detecting boundaries for
single-boundary hybrid essays, the proposed approach could be enhanced by
adopting a relatively large prototype size, leading to a 22% improvement in the
In-Domain evaluation and an 18% improvement in the Out-of-Domain evaluation.
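
The second step of the approach described above (comparing adjacent prototypes and placing boundaries where they are furthest apart) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on assumptions the abstract does not confirm: sentence embeddings are supplied by some already-trained encoder, a prototype is the mean embedding of up to `prototype_size` consecutive sentences, and the distance is Euclidean. The function name `detect_boundaries` and its parameters are hypothetical.

```python
import numpy as np


def detect_boundaries(embeddings, prototype_size=2, num_boundaries=1):
    """Return sentence indices where authorship is predicted to change.

    A returned index i means the boundary lies between sentence i-1 and
    sentence i. All modelling choices here are illustrative assumptions.
    """
    emb = np.asarray(embeddings, dtype=float)
    n = emb.shape[0]
    if n < 2:
        return []  # a boundary needs at least two sentences

    scores = []
    for i in range(1, n):
        # Prototype of the sentences just before the candidate boundary.
        left = emb[max(0, i - prototype_size):i].mean(axis=0)
        # Prototype of the sentences just after the candidate boundary.
        right = emb[i:min(n, i + prototype_size)].mean(axis=0)
        # Assumed metric: Euclidean distance between adjacent prototypes.
        scores.append(np.linalg.norm(left - right))

    # Keep the candidates whose adjacent prototypes are furthest apart.
    order = np.argsort(-np.array(scores), kind="stable")[:num_boundaries]
    return sorted(int(j) + 1 for j in order)


# Toy "essay": three sentences near (0, 0) followed by three near (1, 1).
toy_embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [1.0, 1.0], [0.9, 1.0], [1.0, 0.9],
]
print(detect_boundaries(toy_embeddings, prototype_size=2, num_boundaries=1))
```

On this toy input the largest prototype distance falls between the third and fourth sentences, so the sketch reports index 3. A larger `prototype_size` averages over more sentences on each side, which is the knob the abstract reports as helpful for single-boundary essays.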
Related papers
- GigaCheck: Detecting LLM-generated Content [72.27323884094953]
In this work, we investigate the task of generated text detection by proposing GigaCheck.
Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts.
Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize artificially generated intervals within text.
arXiv Detail & Related papers (2024-10-31T08:30:55Z)
- DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning [24.99797253885887]
We argue that the key to accomplishing this task lies in distinguishing writing styles of different authors.
We propose DeTeCtive, a multi-task auxiliary, multi-level contrastive learning framework.
Our method is compatible with a range of text encoders.
arXiv Detail & Related papers (2024-10-28T12:34:49Z)
- Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text [4.902089836908786]
WhosAI is a triplet-network contrastive learning framework designed to predict whether a given input text has been generated by humans or AI.
We show that our proposed framework achieves outstanding results in both the Turing Test and Authorship tasks.
arXiv Detail & Related papers (2024-07-12T15:44:56Z)
- Who Writes the Review, Human or AI? [0.36498648388765503]
This study proposes a methodology to accurately distinguish AI-generated and human-written book reviews.
Our approach utilizes transfer learning, enabling the model to identify generated text across different topics.
The experimental results demonstrate that it is feasible to detect the original source of text, achieving an accuracy rate of 96.86%.
arXiv Detail & Related papers (2024-05-30T17:38:44Z)
- Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights [18.30412155877708]
This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts.
The CoAuthor dataset includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system.
arXiv Detail & Related papers (2024-03-06T07:25:46Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- DEMASQ: Unmasking the ChatGPT Wordsmith [63.8746084667206]
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z)
- Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.