Fugu-MT Paper Translation (Summary): Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy

Paper summary: Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy

arXiv URL: http://arxiv.org/abs/2603.23146v1
Date: Tue, 24 Mar 2026 12:46:15 GMT
Status: Translation complete
Last updated in system: 2026-03-25 19:53:37.479992
Title: Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy
Authors: Shushanta Pudasaini, Luis Miralles-Pechuán, David Lillis, Marisa Llorens Salvador,
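Abstract summary: This paper proposes a framework that integrates linguistic feature engineering, machine learning, and explainable AI techniques. Using SHAP-based explanations, it shows that the most influential features differ markedly between datasets. The authors argue that this knowledge can help build AI detectors that are robust across different settings.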
Reference score (attention score computed in-house by Fugu-MT): 0.9169660430821364
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The widespread adoption of Large Language Models (LLMs) has made the detection of AI-generated text a pressing and complex challenge. Although many detection systems report high benchmark accuracy, their reliability in real-world settings remains uncertain, and their interpretability is often unexplored. In this work, we investigate whether contemporary detectors genuinely identify machine authorship or merely exploit dataset-specific artefacts. We propose an interpretable detection framework that integrates linguistic feature engineering, machine learning, and explainable AI techniques. When evaluated on two prominent benchmark corpora, namely PAN CLEF 2025 and COLING 2025, our model trained on 30 linguistic features achieves leaderboard-competitive performance, attaining an F1 score of 0.9734. However, systematic cross-domain and cross-generator evaluation reveals substantial generalisation failure: classifiers that excel in-domain degrade significantly under distribution shift. Using SHAP-based explanations, we show that the most influential features differ markedly between datasets, indicating that detectors often rely on dataset-specific stylistic cues rather than stable signals of machine authorship. Further investigation with in-depth error analysis exposes a fundamental tension in linguistic-feature-based AI text detection: the features that are most discriminative on in-domain data are also the features most susceptible to domain shift, formatting variation, and text-length effects. We believe that this knowledge helps build AI detectors that are robust across different settings. To support replication and practical use, we release an open-source Python package that returns both predictions and instance-level explanations for individual texts.
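
The abstract combines three ingredients: hand-engineered linguistic features, a classifier scored with F1, and SHAP explanations compared across datasets. The following minimal Python sketch shows how those pieces could fit together, under stated assumptions. The four features, the hand-set weights, and the toy texts are hypothetical; they are not the paper's 30 features, model, or data, and the sketch does not use the authors' released package, whose API is not described here. The detector is assumed to be linear, so exact SHAP values of its margin have the closed form phi_i(x) = w_i * (x_i - E[x_i]) when features are treated as independent and the expectation is taken over a background set.

```python
# Illustrative sketch only: the features, weights, and texts below are
# hypothetical and are NOT the paper's 30 features, model, or data.
import re
from statistics import mean


def linguistic_features(text: str) -> dict[str, float]:
    """A few simple stylometric features (hypothetical stand-ins)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "comma_rate": text.count(",") / n_words,
    }


def margin(weights: dict[str, float], bias: float, x: dict[str, float]) -> float:
    """Raw score (logit) of a linear detector; > 0 means 'AI-generated'."""
    return bias + sum(w * x[k] for k, w in weights.items())


def predict(weights: dict[str, float], bias: float,
            rows: list[dict[str, float]]) -> list[int]:
    """Hard labels: 1 = AI-generated, 0 = human-written."""
    return [1 if margin(weights, bias, x) > 0 else 0 for x in rows]


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 of the positive class (1 = AI-generated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def linear_shap(weights, rows, background):
    """Exact SHAP values of the margin of a linear model, treating features as
    independent: phi_i(x) = w_i * (x_i - mean of feature i over background)."""
    means = {k: mean(b[k] for b in background) for k in weights}
    return [{k: w * (x[k] - means[k]) for k, w in weights.items()} for x in rows]


def feature_ranking(shap_rows: list[dict[str, float]]) -> list[str]:
    """Features sorted by mean |SHAP| (global importance), largest first."""
    importance = {k: mean(abs(r[k]) for r in shap_rows) for k in shap_rows[0]}
    return sorted(importance, key=importance.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical hand-set weights; a real detector would learn them from data.
    weights = {"avg_word_len": 0.8, "type_token_ratio": -2.0,
               "avg_sentence_len": 0.1, "comma_rate": 3.0}
    bias = -4.0
    # Two tiny toy "corpora" (label 1 = AI-generated, 0 = human-written).
    corpora = {
        "domain_a": (["Furthermore, the results demonstrate considerable improvements.",
                      "lol ok see u tmrw"], [1, 0]),
        "domain_b": (["In conclusion, it is important to note several considerations.",
                      "We went to the beach, ate ice cream, and walked home."], [1, 0]),
    }
    for name, (texts, labels) in corpora.items():
        rows = [linguistic_features(t) for t in texts]
        f1 = f1_score(labels, predict(weights, bias, rows))
        # Each corpus serves as its own SHAP background distribution.
        ranking = feature_ranking(linear_shap(weights, rows, background=rows))
        print(f"{name}: F1={f1:.2f}, features by mean |SHAP|: {ranking}")
```

Computing `feature_ranking` separately on each corpus, with that corpus as the SHAP background, is one way to probe the abstract's claim that the most influential features differ between datasets: if the rankings disagree, the detector is leaning on different cues in each domain. A linear model was chosen here only because its SHAP values are exact and cheap; tree or neural detectors would need an approximate explainer instead.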
