Fugu-MT Paper Translation (Abstract): Teaching People LLM's Errors and Getting it Right

Paper abstract: Teaching People LLM's Errors and Getting it Right

arxiv url: http://arxiv.org/abs/2512.21422v1
Date: Wed, 24 Dec 2025 20:53:07 GMT
Status: Translation complete
Last updated in system: 2025-12-29 20:48:41.793008
Title: Teaching People LLM's Errors and Getting it Right
Title (reference translation): Teaching LLM Errors and Getting Them Right
Authors: Nathan Stringham, Fateme Hashemi Chaleshtori, Xinyuan Yan, Zhichao Xu, Bei Wang, Ana Marasović
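Abstract summary: People use large language models (LLMs) when they should not. Prior work has tried to address this by clustering instance embeddings into regions where an LLM is likely to fail. The failure patterns found are taught to users to mitigate their overreliance.
Reference score (independently computed attention measure): 5.213248158569623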
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: People use large language models (LLMs) when they should not. This is partly because they see LLMs compose poems and answer intricate questions, so they understandably, but incorrectly, assume LLMs won't stumble on basic tasks like simple arithmetic. Prior work has tried to address this by clustering instance embeddings into regions where an LLM is likely to fail and automatically describing patterns in these regions. The found failure patterns are taught to users to mitigate their overreliance. Yet, this approach has not fully succeeded. In this analysis paper, we aim to understand why. We first examine whether the negative result stems from the absence of failure patterns. We group instances in two datasets by their meta-labels and evaluate an LLM's predictions on these groups. We then define criteria to flag groups that are sizable and where the LLM is error-prone, and find meta-label groups that meet these criteria. Their meta-labels are the LLM's failure patterns that could be taught to users, so they do exist. We next test whether prompting and embedding-based approaches can surface these known failures. Without this, users cannot be taught about them to reduce their overreliance. We find mixed results across methods, which could explain the negative result. Finally, we revisit the final metric that measures teaching effectiveness. We propose to assess a user's ability to effectively use the given failure patterns to anticipate when an LLM is error-prone. A user study shows a positive effect from teaching with this metric, unlike the human-AI team accuracy. Our findings show that teaching failure patterns could be a viable approach to mitigating overreliance, but success depends on better automated failure-discovery methods and using metrics like ours.
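Abstract (reference translation): People use large language models (LLMs) when they should not. This is because they see LLMs compose poems and answer intricate questions. Prior work has tried to address this by clustering instance embeddings into regions where an LLM is likely to fail and automatically describing the patterns in these regions. The failure patterns found are taught to users to mitigate their overreliance. However, this approach has not fully succeeded. In this paper, we aim to understand why. We first examine whether the negative result stems from the absence of failure patterns. We group instances in two datasets by their meta-labels and evaluate an LLM's predictions on these groups. We then define criteria to flag groups where the LLM is error-prone, and find meta-label groups that meet these criteria. Their meta-labels are the LLM's failure patterns that could be taught to users, so they do exist. We next test whether prompting and embedding-based approaches can surface these known failures. Without this, users cannot be taught about them to reduce their overreliance. We find mixed results across methods, which could explain the negative result. Finally, we revisit the final metric that measures teaching effectiveness. We propose to assess a user's ability to effectively use the given failure patterns to anticipate when an LLM is error-prone. A user study shows a positive effect from teaching with this metric, unlike human-AI team accuracy. Our findings show that teaching failure patterns could be a viable approach to mitigating overreliance, but success depends on better automated failure-discovery methods and on using metrics like ours.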
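The grouping-and-flagging step described in the abstract (group instances by meta-label, score the LLM on each group, flag groups that are both sizable and error-prone) can be illustrated with a minimal sketch. The abstract does not state the actual criteria, so the function name `flag_failure_groups` and the thresholds `min_size` and `min_error_rate` are hypothetical placeholders, not the authors' definitions.

```python
from collections import defaultdict


def flag_failure_groups(instances, min_size=30, min_error_rate=0.5):
    """Return meta-label groups that are sizable and error-prone.

    instances: iterable of (meta_label, is_correct) pairs, where
        is_correct says whether the LLM's prediction was right.
    min_size, min_error_rate: hypothetical thresholds chosen for
        illustration; the paper's real criteria are not given here.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for meta_label, is_correct in instances:
        totals[meta_label] += 1
        if not is_correct:
            errors[meta_label] += 1

    flagged = {}
    for label, n in totals.items():
        error_rate = errors[label] / n
        # A group counts as a candidate failure pattern only if it is
        # both large enough and wrong often enough.
        if n >= min_size and error_rate >= min_error_rate:
            flagged[label] = (n, error_rate)
    return flagged


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy = (
        [("arithmetic", False)] * 3
        + [("arithmetic", True)]
        + [("poetry", True)] * 4
        + [("rare", False)]
    )
    print(flag_failure_groups(toy, min_size=3, min_error_rate=0.5))
```

In this toy run only the "arithmetic" group is flagged: "poetry" has no errors, and "rare" is error-prone but too small to count as sizable.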

Related papers