Fugu-MT Paper Translation (Abstract): Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs

Paper summary: Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs

arxiv url: http://arxiv.org/abs/2510.14242v1
Date: Thu, 16 Oct 2025 02:54:01 GMT
Status: Translation complete
Last updated in system: 2025-10-17 21:15:14.685868
Title: Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs
Title (reference translation): Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs
Authors: Parsa Hejabi, Elnaz Rahmati, Alireza S. Ziabari, Morteza Dehghani
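Abstract summary: Large language models (LLMs) often produce inconsistent answers when faced with different phrasings of the same prompt. The paper proposes Flip-Flop Consistency ($F^2C$). The method is evaluated on 11 datasets spanning four NLP tasks, with 4-15 prompt variations per dataset.
Reference score (independently computed attention score): 2.125148574616104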
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) often produce inconsistent answers when faced with different phrasings of the same prompt. In this paper, we propose Flip-Flop Consistency ($F^2C$), an unsupervised training method that improves robustness to such perturbations. $F^2C$ is composed of two key components. The first, Consensus Cross-Entropy (CCE), uses a majority vote across prompt variations to create a hard pseudo-label. The second is a representation alignment loss that pulls lower-confidence and non-majority predictors toward the consensus established by high-confidence, majority-voting variations. We evaluate our method on 11 datasets spanning four NLP tasks, with 4-15 prompt variations per dataset. On average, $F^2C$ raises observed agreement by 11.62%, improves mean $F_1$ by 8.94%, and reduces performance variance across formats by 3.29%. In out-of-domain evaluations, $F^2C$ generalizes effectively, increasing $\overline{F_1}$ and agreement while decreasing variance across most source-target pairs. Finally, when trained on only a subset of prompt perturbations and evaluated on held-out formats, $F^2C$ consistently improves both performance and agreement while reducing variance. These findings highlight $F^2C$ as an effective unsupervised method for enhancing LLM consistency, performance, and generalization under prompt perturbations. Code is available at https://github.com/ParsaHejabi/Flip-Flop-Consistency-Unsupervised-Training-for-Robustness-to-Prompt-Perturbations-in-LLMs.
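The Consensus Cross-Entropy idea in the abstract (majority vote across prompt variations, then a hard pseudo-label) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the assumption of classification-style logits per variation, the tie-breaking rule, and the plain unweighted average are all assumptions made for illustration. The representation alignment loss is not covered here.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def majority_pseudo_label(predictions):
    """Most common predicted class across prompt variations.

    Ties go to the class seen first (an assumption of this sketch).
    """
    return Counter(predictions).most_common(1)[0][0]


def consensus_cross_entropy(logits_per_variation):
    """Mean cross-entropy of every variation against the majority-vote label.

    logits_per_variation: one list of class logits per prompt variation,
    all for the same underlying input. No gold label is used.
    """
    probs = [softmax(logits) for logits in logits_per_variation]
    # Each variation votes for its arg-max class.
    predictions = [max(range(len(p)), key=p.__getitem__) for p in probs]
    label = majority_pseudo_label(predictions)
    # Hard pseudo-label: penalize every variation for low probability on it.
    loss = -sum(math.log(p[label]) for p in probs) / len(probs)
    return label, loss


# Three phrasings of one prompt; two vote for class 0, one for class 1.
label, loss = consensus_cross_entropy([[2.0, 0.0], [1.5, 0.5], [0.0, 1.0]])
```

In this toy case the dissenting third variation contributes most of the loss, which is the pressure that would push it toward the consensus during training.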
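Abstract (reference translation): Large language models (LLMs) often produce inconsistent answers when faced with different phrasings of the same prompt. This paper proposes Flip-Flop Consistency ($F^2C$), an unsupervised training method that improves robustness to such perturbations. $F^2C$ consists of two key components. The first, Consensus Cross-Entropy (CCE), takes a majority vote across prompt variations to create a hard pseudo-label. The second is a representation alignment loss that pulls lower-confidence and non-majority predictors toward the consensus established by the high-confidence, majority-voting variations. The method is evaluated on 11 datasets spanning four NLP tasks, with 4-15 prompt variations per dataset. On average, $F^2C$ raises observed agreement by 11.62%, improves mean $F_1$ by 8.94%, and reduces performance variance across formats by 3.29%. In out-of-domain evaluations, $F^2C$ generalizes effectively, increasing $\overline{F_1}$ and agreement while decreasing variance across most source-target pairs. Finally, when trained on only a subset of prompt perturbations and evaluated on held-out formats, $F^2C$ consistently improves both performance and agreement while reducing variance. These results highlight $F^2C$ as an effective unsupervised method for improving LLM consistency, performance, and generalization under prompt perturbations. Code is available at https://github.com/ParsaHejabi/Flip-Flop-Consistency-Unsupervised-Training-for-Robustness-to-Prompt-Perturbations-in-LLMs.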
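The abstract reports observed agreement and variance across prompt formats without defining them. As a rough sketch, one common reading of observed agreement is the fraction of variation pairs that give the same answer, averaged over inputs. The definitions and function names below are assumptions for illustration and may differ from the paper's exact metrics.

```python
from itertools import combinations
from statistics import mean, pvariance


def pairwise_agreement(answers):
    """Fraction of variation pairs with identical answers for one input.

    Requires at least two answers.
    """
    pairs = list(combinations(answers, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def observed_agreement(answers_per_input):
    """Mean pairwise agreement over all inputs."""
    return mean(pairwise_agreement(a) for a in answers_per_input)


def score_spread(score_per_format):
    """Mean and population variance of a per-format score such as F1."""
    return mean(score_per_format), pvariance(score_per_format)


# Two inputs, three prompt formats each (hypothetical answers).
agreement = observed_agreement([["A", "A", "B"], ["B", "B", "B"]])
mean_f1, var_f1 = score_spread([0.5, 0.7])
```

Under this reading, a method that improves consistency would raise `agreement` and the mean score while lowering the variance across formats.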

Related papers