DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning
- URL: http://arxiv.org/abs/2510.17489v1
- Date: Mon, 20 Oct 2025 12:41:44 GMT
- Title: DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning
- Authors: Yongxin He, Shan Zhang, Yixuan Cao, Lei Ma, Ping Luo,
- Abstract summary: Detecting AI-involved text is essential for combating misinformation, plagiarism, and academic misconduct.<n>Current methods model these processes rather crudely, primarily employing binary classification.<n>We propose DETree, a novel approach that models the relationships among different processes as a Hierarchical Affinity Tree structure.
- Score: 31.444137536002955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting AI-involved text is essential for combating misinformation, plagiarism, and academic misconduct. However, AI text generation includes diverse collaborative processes (AI-written text edited by humans, human-written text edited by AI, and AI-generated text refined by other AI), where various or even new LLMs could be involved. Texts generated through these varied processes exhibit complex characteristics, presenting significant challenges for detection. Current methods model these processes rather crudely, primarily employing binary classification (purely human vs. AI-involved) or multi-classification (treating human-AI collaboration as a new class). We observe that representations of texts generated through different processes exhibit inherent clustering relationships. Therefore, we propose DETree, a novel approach that models the relationships among different processes as a Hierarchical Affinity Tree structure, and introduces a specialized loss function that aligns text representations with this tree. To facilitate this learning, we developed RealBench, a comprehensive benchmark dataset that automatically incorporates a wide spectrum of hybrid texts produced through various human-AI collaboration processes. Our method improves performance in hybrid text detection tasks and significantly enhances robustness and generalization in out-of-distribution scenarios, particularly in few-shot learning conditions, further demonstrating the promise of training-based approaches in OOD settings. Our code and dataset are available at https://github.com/heyongxin233/DETree.
Related papers
- ChatGpt Content detection: A new approach using xlm-roberta alignment [0.0]
We present a comprehensive methodology to detect AI-generated text using XLM-RoBERTa, a state-of-the-art multilingual transformer model.<n>We fine-tuned the model on a balanced dataset of human and AI-generated texts and evaluated its performance.<n>Our findings offer a valuable tool for maintaining academic integrity and contribute to the broader field of AI ethics.
arXiv Detail & Related papers (2025-11-26T03:16:57Z) - Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text.<n>Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on DeepFake dataset.<n>Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z) - A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models [71.66119575697458]
parallel text generation techniques aimed at breaking the token-by-token generation bottleneck and improving inference efficiency.<n>We categorize existing approaches into AR-based and Non-AR-based paradigms, and provide a detailed examination of the core techniques within each category.<n>We highlight recent advancements, identify open challenges, and outline promising directions for future research in parallel text generation.
arXiv Detail & Related papers (2025-08-12T07:56:04Z) - DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning [24.99797253885887]
We argue that the key to accomplishing this task lies in distinguishing writing styles of different authors.
We propose DeTeCtive, a multi-task auxiliary, multi-level contrastive learning framework.
Our method is compatible with a range of text encoders.
arXiv Detail & Related papers (2024-10-28T12:34:49Z) - Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text [4.902089836908786]
WhosAI is a triplet-network contrastive learning framework designed to predict whether a given input text has been generated by humans or AI.<n>We show that our proposed framework achieves outstanding results in both the Turing Test and Authorship tasks.
arXiv Detail & Related papers (2024-07-12T15:44:56Z) - Text2Data: Low-Resource Data Generation with Textual Control [100.5970757736845]
Text2Data is a novel approach that utilizes unlabeled data to understand the underlying data distribution.<n>It undergoes finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z) - Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing
AI-Generated Text [0.0]
My research investigates the use of cutting-edge hybrid deep learning models to accurately differentiate between AI-generated text and human writing.
I applied a robust methodology, utilising a carefully selected dataset comprising AI and human texts from various sources, each tagged with instructions.
arXiv Detail & Related papers (2023-11-27T06:26:53Z) - The Imitation Game: Detecting Human and AI-Generated Texts in the Era of
ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres.
We employ several machine learning models to classify the texts.
Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - A Vector Quantized Approach for Text to Speech Synthesis on Real-World
Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
Recent Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show thatRecent Text-to-Speech architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z) - Robust Conversational AI with Grounded Text Generation [77.56950706340767]
GTG is a hybrid model which uses a large-scale Transformer neural network as its backbone.
It generates responses grounded in dialog belief state and real-world knowledge for task completion.
arXiv Detail & Related papers (2020-09-07T23:49:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.