Fugu-MT Paper Translation (Summary): Harnessing non-adversarial robustness in large language models

Paper overview: Harnessing non-adversarial robustness in large language models

arxiv url: http://arxiv.org/abs/2605.29816v1
Date: Thu, 28 May 2026 12:00:00 GMT
Status: Translation complete
Last updated (in system): 2026-05-30 05:02:24.581954
Title: Harnessing non-adversarial robustness in large language models
Title (reference translation): Harnessing non-adversarial robustness in large language models
Authors: Qinghua Zhou, Ellina Aleshina, Andrey Lovyagin, Oleg Somov, Mikhail Seleznyov, Alexander Panchenko, Ivan Oseledets, Elena Tutubalina, Ivan Y. Tyukin
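Abstract summary: We show that robustness can be achieved through a simple fine-tuning process: debiasing for robustness. We identify conditions under which debiasing does not help and, through theory and extensive experiments, demonstrate that debiasing may be a quick and efficient tool for enhancing robustness.
Reference score (in-house attention score): 53.703320836018605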
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The work presents an approach for addressing the challenge of robustness in Large Language Models (LLMs) to alterations and potential errors caused by semantically similar but textually different prompts. Recent works have shown that these kinds of prompt variations can significantly impact the performance of LLMs on tasks. The central question is: can LLMs' robustness to semantically-neutral prompt alterations be acquired without expensive retraining of the entire model? We address this question both theoretically and through experiments. Our theoretical analysis reveals a crucial factor impacting model robustness: a systematic expected shift or perturbation-induced bias in neural network module outputs. Motivated by this analysis, we show that robustness can be achieved via a simple fine-tuning process: debiasing for robustness. We identify conditions when debiasing helps and when it does not, and demonstrate, through both theory and extensive experiments, that debiasing for robustness may indeed be a quick and efficient tool to enhance robustness and provide certification against random prompt perturbations.
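Abstract (reference translation): This work presents an approach to the challenge of robustness in Large Language Models (LLMs) against alterations and potential errors caused by prompts that are semantically similar but textually different. Recent studies have shown that such prompt variations can significantly affect LLM task performance. The central question is: can LLMs' robustness to semantically neutral prompt alterations be acquired without expensive retraining of the entire model? We address this question both theoretically and experimentally. Our theoretical analysis identifies a decisive factor affecting model robustness: a systematic expected shift, or perturbation-induced bias, in the outputs of neural network modules. Motivated by this analysis, we show that robustness can be achieved through a simple fine-tuning process: debiasing for robustness. We identify situations in which debiasing does not help and, through theory and extensive experiments, demonstrate that debiasing for robustness may be a quick and efficient tool for enhancing robustness and providing certification against random prompt perturbations.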
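The abstract attributes the loss of robustness to a systematic expected shift (a perturbation-induced bias) in module outputs. The Python sketch below illustrates that idea under strong simplifying assumptions, and is not the paper's method: the "module" is a toy affine map y = W x + b, the perturbation is hypothetical Gaussian noise with a nonzero mean, and the shift is removed by a closed-form adjustment of the bias term rather than by the fine-tuning procedure the authors describe. The names `affine`, `estimate_shift`, `debias`, `mean_gap`, and `noisy_offset` are all illustrative.

```python
import random

# Toy stand-in for a "neural network module": the affine map y = W x + b.
# This is an illustrative assumption, not the architecture studied in the paper.
def affine(W, b, x):
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def estimate_shift(W, b, inputs, perturb, n_samples=200, seed=0):
    """Monte Carlo estimate of the systematic shift E[f(perturb(x))] - f(x),
    averaged over the given clean inputs."""
    rng = random.Random(seed)
    total = [0.0] * len(b)
    count = 0
    for x in inputs:
        clean = affine(W, b, x)
        for _ in range(n_samples):
            noisy = affine(W, b, perturb(x, rng))
            for i, (n, c) in enumerate(zip(noisy, clean)):
                total[i] += n - c
            count += 1
    return [t / count for t in total]


def debias(b, shift):
    """Absorb the estimated shift into the bias term (a closed-form stand-in,
    assumed here, for the paper's fine-tuning-based debiasing)."""
    return [bi - si for bi, si in zip(b, shift)]


def mean_gap(W, b_new, b_ref, x, perturb, n_samples=2000, seed=1):
    """Largest coordinate gap between the mean output of (W, b_new) on
    perturbed copies of x and the clean output of the reference (W, b_ref)."""
    rng = random.Random(seed)
    target = affine(W, b_ref, x)
    mean = [0.0] * len(b_new)
    for _ in range(n_samples):
        for i, v in enumerate(affine(W, b_new, perturb(x, rng))):
            mean[i] += v / n_samples
    return max(abs(m - t) for m, t in zip(mean, target))


# Hypothetical perturbation with a nonzero mean, which is what creates the bias.
def noisy_offset(x, rng):
    return [xi + rng.gauss(0.3, 0.1) for xi in x]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    b = [0.0, 0.0]
    inputs = [[1.0, 2.0], [-0.5, 0.5]]
    shift = estimate_shift(W, b, inputs, noisy_offset)
    b_debiased = debias(b, shift)
    print("estimated shift:", shift)
    print("gap before:", mean_gap(W, b, b, inputs[0], noisy_offset))
    print("gap after: ", mean_gap(W, b_debiased, b, inputs[0], noisy_offset))
```

For an affine module the expected shift equals W times the mean perturbation, so absorbing its estimate into the bias removes the average drift almost entirely. The correction targets only the mean: the spread of outputs around that mean is unchanged, and for nonlinear modules a constant bias correction is only approximate, which is one plausible reason a debiasing step need not always help.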

Related papers