Fugu-MT Paper Translation (Summary): Are Humans as Brittle as Large Language Models?

Paper summary: Are Humans as Brittle as Large Language Models?

arxiv url: http://arxiv.org/abs/2509.07869v1
Date: Tue, 09 Sep 2025 15:56:51 GMT
Status: Translation complete
Last updated in system: 2025-09-10 14:38:27.387162
Title: Are Humans as Brittle as Large Language Models?
Authors: Jiahui Li, Sean Papay, Roman Klinger
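Abstract summary: The study compares the effects of prompt modifications on large language models (LLMs) with those of identical instruction modifications on human annotators. The results suggest that both humans and LLMs become more brittle in response to specific types of prompt modifications.
Reference score (site-computed attention score): 9.467418013202282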
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The output of large language models (LLMs) is unstable, due both to the non-determinism of the decoding process and to prompt brittleness. While the intrinsic non-determinism of LLM generation may mimic existing uncertainty in human annotations through distributional shifts in outputs, it is largely assumed, yet unexplored, that the prompt brittleness effect is unique to LLMs. This raises the question: do human annotators show similar sensitivity to instruction changes? If so, should prompt brittleness in LLMs be considered problematic? One may alternatively hypothesize that prompt brittleness correctly reflects human annotation variance. To fill this research gap, we systematically compare the effects of prompt modifications on LLMs with those of identical instruction modifications on human annotators, focusing on the question of whether humans are similarly sensitive to prompt perturbations. To study this, we prompt both humans and LLMs for a set of text classification tasks conditioned on prompt variations. Our findings indicate that both humans and LLMs exhibit increased brittleness in response to specific types of prompt modifications, particularly those involving the substitution of alternative label sets or label formats. However, the distribution of human judgments is less affected by typographical errors and reversed label order than that of LLMs.


Related papers