Fugu-MT Paper Translation (Abstract): Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification

Paper overview: Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification

arxiv url: http://arxiv.org/abs/2602.04297v1
Date: Wed, 04 Feb 2026 07:59:28 GMT
Status: Translation complete
Last updated in system: 2026-02-05 19:45:11.429263
Title: Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification
Authors: Branislav Pecher, Michal Spiegel, Robert Belanec, Jan Cegin,
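Abstract summary: Large language models (LLMs) are widely used as zero-shot and few-shot classifiers. The paper studies and compares the sensitivity of underspecified prompts and prompts that provide specific instructions. Underspecified prompts exhibit higher performance variance and lower logit values for relevant tokens, while instruction prompts suffer less from such problems.
Reference score (independently computed attention score): 3.2059646106414967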
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are widely used as zero-shot and few-shot classifiers, where task behaviour is largely controlled through prompting. A growing number of works have observed that LLMs are sensitive to prompt variations, with small changes leading to large changes in performance. However, in many cases, the investigation of sensitivity is performed using underspecified prompts that provide minimal task instructions and weakly constrain the model's output space. In this work, we argue that a significant portion of the observed prompt sensitivity can be attributed to prompt underspecification. We systematically study and compare the sensitivity of underspecified prompts and prompts that provide specific instructions. Utilising performance analysis, logit analysis, and linear probing, we find that underspecified prompts exhibit higher performance variance and lower logit values for relevant tokens, while instruction-prompts suffer less from such problems. However, linear probing analysis suggests that the effects of prompt underspecification have only a marginal impact on the internal LLM representations, instead emerging in the final layers. Overall, our findings highlight the need for more rigour when investigating and mitigating prompt sensitivity.
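The performance-variance comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the prompt-variant keys, and the toy predictions below are all hypothetical, and the spread of accuracy across prompt variants (population standard deviation) is only one plausible way to quantify prompt sensitivity.

```python
from statistics import mean, pstdev


def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def prompt_sensitivity(predictions_by_prompt, labels):
    """Return (mean accuracy, std of accuracy) across prompt variants.

    predictions_by_prompt maps a prompt-variant name to the list of labels
    the model predicted under that variant. A larger standard deviation
    means performance depends more strongly on the prompt wording.
    """
    scores = [accuracy(preds, labels) for preds in predictions_by_prompt.values()]
    return mean(scores), pstdev(scores)


# Toy, made-up data for illustration only (not results from the paper).
labels = ["pos", "neg", "pos", "neg"]
underspecified = {
    "variant_a": ["pos", "neg", "pos", "neg"],
    "variant_b": ["pos", "pos", "pos", "pos"],
}
instructed = {
    "variant_a": ["pos", "neg", "pos", "neg"],
    "variant_b": ["pos", "neg", "pos", "pos"],
}

under_mean, under_std = prompt_sensitivity(underspecified, labels)
instr_mean, instr_std = prompt_sensitivity(instructed, labels)
print(f"underspecified: mean={under_mean:.3f} std={under_std:.3f}")
print(f"instructed:     mean={instr_mean:.3f} std={instr_std:.3f}")
```

In a real evaluation, each list of predictions would come from running the classifier with one prompt variant over the same test set, typically with many more variants and examples than in this toy setup.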
