Fugu-MT Paper Translation (Abstract): Inverse Turing Bench: Evaluating Language Models as Judges of Human vs. AI Dialogue

Paper overview: Inverse Turing Bench: Evaluating Language Models as Judges of Human vs. AI Dialogue

arxiv url: http://arxiv.org/abs/2606.21844v1
Date: Sat, 20 Jun 2026 02:47:56 GMT
Status: Translation complete
Last updated in system: 2026-06-26 02:48:23.743694
Title: Inverse Turing Bench: Evaluating Language Models as Judges of Human vs. AI Dialogue
Authors: William Hager, Ishika Rathi, Masum Hasan, Cameron Jones,
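Abstract summary: Inverse Turing Bench is a benchmark that evaluates models on their ability to distinguish humans from AI in multi-turn text. It provides a collection of paired dialogue transcripts, in which one dialogue is between two humans and the other is between a human and an AI. Among the models evaluated, GPTZero, Claude Opus-4.6, and GPT-5.5 achieve the highest accuracy.
Reference score (attention score computed independently by this site): 1.1305136905804842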
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As AI systems integrate into online spaces, differentiating them from humans in conversations is increasingly important. We present Inverse Turing Bench, a benchmark that evaluates LLMs and other models on their ability to differentiate humans and AI in multi-turn text. The benchmark provides a collection of paired dialogue transcripts, wherein one dialogue is between two humans and the other is between a human and an AI. The task is to correctly identify which dialogue is human-only vs. human-AI. We evaluated a preliminary set of models against this benchmark, and found that GPTZero, Claude Opus-4.6, and GPT-5.5 achieve the highest accuracy: 89.41%, 77.92%, and 75.94% respectively. Our results suggest that statistical approaches to detection have semantic blind spots, but semantic approaches are susceptible to persona-prompting. Our work speaks to the Inverse Turing Test as a probe of LLM theory of mind, and motivates human-AI differentiation as a critical capability for AI systems. Our live benchmark can be found at https://huggingface.co/spaces/roc-hci/Inverse-Turing-Bench-Leaderboard (anonymity preserved).
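The paired task in the abstract amounts to a two-way forced choice, so accuracy can be read as the fraction of pairs in which a judge picks the human-only dialogue. The following is a minimal sketch under that reading; `DialoguePair`, `Judge`, `paired_accuracy`, and `random_judge` are hypothetical names introduced for illustration, and the benchmark's actual data format and scoring procedure may differ.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class DialoguePair:
    """Two transcripts, exactly one of which is human-only (assumed schema)."""
    dialogue_a: str
    dialogue_b: str
    human_only: str  # "a" or "b": the transcript between two humans


# A judge receives both transcripts and returns "a" or "b" for the one
# it believes is human-only. This interface is an assumption.
Judge = Callable[[str, str], str]


def paired_accuracy(judge: Judge, pairs: Sequence[DialoguePair]) -> float:
    """Fraction of pairs where the judge identifies the human-only dialogue."""
    if not pairs:
        raise ValueError("need at least one dialogue pair")
    correct = sum(
        judge(p.dialogue_a, p.dialogue_b) == p.human_only for p in pairs
    )
    return correct / len(pairs)


def random_judge(dialogue_a: str, dialogue_b: str) -> str:
    """Chance baseline: guesses uniformly, so expected accuracy is 0.5."""
    return random.choice(["a", "b"])
```

Under this framing, a judge that guesses at random is expected to score about 50%, which gives a reference point for the accuracies reported in the abstract.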

Related papers