EMMM, Explain Me My Model! Explainable Machine Generated Text Detection in Dialogues
- URL: http://arxiv.org/abs/2508.18715v1
- Date: Tue, 26 Aug 2025 06:27:10 GMT
- Title: EMMM, Explain Me My Model! Explainable Machine Generated Text Detection in Dialogues
- Authors: Angela Yifei Yuan, Haoyi Li, Soyeon Caren Han, Christopher Leckie,
- Abstract summary: Current machine-generated text (MGT) detection methods struggle in online conversational settings. In customer service scenarios where operators are typically non-expert users, explanations become crucial for trustworthy MGT detection. We propose EMMM, an explanation-then-detection framework that balances latency, accuracy, and non-expert-oriented interpretability.
- Score: 18.279628627710107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid adoption of large language models (LLMs) in customer service introduces new risks, as malicious actors can exploit them to conduct large-scale user impersonation through machine-generated text (MGT). Current MGT detection methods often struggle in online conversational settings, reducing the reliability and interpretability essential for trustworthy AI deployment. In customer service scenarios where operators are typically non-expert users, explanations become crucial for trustworthy MGT detection. In this paper, we propose EMMM, an explanation-then-detection framework that balances latency, accuracy, and non-expert-oriented interpretability. Experimental results demonstrate that EMMM provides explanations accessible to non-expert users, with 70% of human evaluators preferring its outputs, while achieving competitive accuracy compared to state-of-the-art models and maintaining low latency, generating outputs within 1 second. Our code and dataset are open-sourced at https://github.com/AngieYYF/EMMM-explainable-chatbot-detection.
Related papers
- IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation [49.796717294455796]
We present IMMACULATE, a practical auditing framework that detects economically motivated deviations. IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead.
arXiv Detail & Related papers (2026-02-26T07:21:02Z)
- Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection [105.14032334647932]
Machine-generated texts (MGTs) pose risks such as disinformation and phishing, highlighting the need for reliable detection. Metric-based methods, which extract statistically distinguishable features of MGTs, are often more practical than complex model-based methods that are prone to overfitting. We propose a Markov-informed score calibration strategy that models two relationships of context detection scores that may aid calibration.
arXiv Detail & Related papers (2026-02-08T16:06:12Z)
- From Transcripts to AI Agents: Knowledge Extraction, RAG Integration, and Robust Evaluation of Conversational AI Assistants [0.0]
Building reliable conversational AI assistants for customer-facing industries remains challenging due to noisy conversational data, fragmented knowledge, and the requirement for accurate human hand-off. This paper presents an end-to-end framework for constructing and evaluating a conversational AI assistant directly from historical call transcripts.
arXiv Detail & Related papers (2026-01-26T07:44:47Z)
- Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability [10.607081850023286]
We introduce the Input Scrutiny Ability Evaluation Framework (ISEval), which encompasses seven categories of flawed premises and three evaluation metrics. Most models struggle to actively detect flawed textual premises without guidance. These insights underscore the urgent need to enhance LMMs' proactive verification of input validity.
arXiv Detail & Related papers (2025-08-06T02:13:46Z)
- Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors [4.7713095161046555]
We present a pipeline to test the resilience of state-of-the-art MGT detectors to linguistically informed adversarial attacks. We fine-tune language models to shift the MGT style toward human-written text (HWT). This exploits the detectors' reliance on stylistic clues, making new generations more challenging to detect.
arXiv Detail & Related papers (2025-05-30T12:33:30Z)
- MAD-Sherlock: Multi-Agent Debate for Visual Misinformation Detection [36.12673167913763]
We introduce MAD-Sherlock, a multi-agent debate system for out-of-context misinformation detection. MAD-Sherlock frames detection as a multi-agent debate, reflecting the diverse and conflicting discourse found online. Our framework is domain- and time-agnostic, requiring no finetuning, yet achieves state-of-the-art accuracy with in-depth explanations.
arXiv Detail & Related papers (2024-10-26T10:34:22Z)
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments. We collect a dataset with 57 pairs consisting of commit messages generated by GPT-4 and their counterparts edited by human experts. Our results indicate that edit distance exhibits the highest correlation with the online metric, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
- LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection [87.43727192273772]
It is often hard to tell whether a piece of text was human-written or machine-generated. We present LLM-DetectAIve, designed for fine-grained detection. It supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished.
arXiv Detail & Related papers (2024-08-08T07:43:17Z)
- F-LMM: Grounding Frozen Large Multimodal Models [53.8059045627934]
We present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations. It is based on the fact that word-pixel correspondences conducive to visual grounding inherently exist in the attention mechanism of well-trained LMMs. It achieves competitive performance on referring expression segmentation and panoptic narrative grounding benchmarks.
arXiv Detail & Related papers (2024-06-09T15:14:26Z)
- M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection [69.41274756177336]
Large Language Models (LLMs) have brought an unprecedented surge in machine-generated text (MGT) across diverse channels.
This raises legitimate concerns about its potential misuse and societal implications.
We introduce a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs -- M4GT-Bench.
arXiv Detail & Related papers (2024-02-17T02:50:33Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that a larger number of words in general leads to better performance and most detection methods can achieve similar performance with much fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.