Fugu-MT Paper Translation (Summary): Safe and Efficient In-Context Learning via Risk Control

Paper summary: Safe and Efficient In-Context Learning via Risk Control

arxiv url: http://arxiv.org/abs/2510.02480v1
Date: Thu, 02 Oct 2025 18:36:10 GMT
Status: Translation complete
Last updated in system: 2025-10-06 16:35:52.131162
Title: Safe and Efficient In-Context Learning via Risk Control
Title (reference translation): Safe and efficient in-context learning through risk control
Authors: Andrea Wynn, Metod Jazbec, Charith Peris, Rinat Khaziev, Anqi Liu, Daniel Khashabi, Eric Nalisnick
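Abstract summary: Large language models (LLMs) learn new tasks from a few in-context examples. However, LLMs are susceptible to incorrect or malicious demonstrations. This paper proposes a new approach for limiting the degree to which harmful demonstrations can degrade model performance.
Reference score (independently computed attention metric): 34.917821132391374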
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) demonstrate a remarkable ability to learn new tasks from a few in-context examples. However, this flexibility introduces safety concerns: LLMs can be influenced by incorrect or malicious demonstrations, for example if an adversary tampers with or injects harmful examples without a human supervisor noticing. This motivates principled designs in which the system itself includes built-in mechanisms to guard against such attacks. We propose a novel approach to limit the degree to which harmful demonstrations can degrade model performance. First, we define a baseline "safe" behavior for the model: the model's performance given no in-context demonstrations (zero-shot). Next, we apply distribution-free risk control (DFRC) to control the extent to which in-context samples can degrade performance below the zero-shot baseline. We achieve this by leveraging dynamic early-exit prediction, ignoring later attention heads that attend the most to the unsafe inputs. Finally, we propose modifications to DFRC that allow it to both control risk for harmful inputs and leverage performance and efficiency gains on helpful inputs. We present both theoretical and empirical results showing that our approach can effectively control risk for harmful in-context demonstrations while simultaneously achieving substantial computational efficiency gains with helpful demonstrations.
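Abstract (reference translation): Large language models (LLMs) show a remarkable ability to learn new tasks from a few in-context samples. However, this flexibility raises safety concerns: LLMs can be influenced by incorrect or malicious demonstrations. This motivates principled designs in which the system itself includes built-in mechanisms to guard against such attacks. This paper proposes a new method for limiting the degree to which harmful demonstrations degrade model performance. First, a baseline "safe" behavior is defined for the model. Next, distribution-free risk control (DFRC) is used to control the extent to which in-context samples degrade performance below zero-shot. This is achieved by leveraging dynamic early-exit prediction, ignoring later attention heads that attend the most to unsafe inputs. Finally, modifications to DFRC are proposed so that risk can be controlled for harmful inputs. The results show that the proposed approach effectively controls risk for harmful in-context demonstrations while also achieving computational efficiency gains with helpful demonstrations.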
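The calibration idea in the abstract (bound how far in-context predictions can fall below the zero-shot baseline, with early exit as the control knob) can be illustrated with a minimal sketch. This is not the authors' procedure. It assumes a confidence-threshold exit rule, per-example losses in [0, 1], and a generic Hoeffding bound with a Bonferroni correction over a finite threshold grid. The names `exit_loss` and `calibrate_threshold`, and the tuple layout of the calibration data, are hypothetical.

```python
import math


def exit_loss(layer_conf, layer_loss, threshold):
    """Loss at the first layer whose confidence reaches the threshold.

    Falls back to the final layer if no layer is confident enough.
    (Assumed exit rule; the paper's exit criterion may differ.)
    """
    for conf, loss in zip(layer_conf, layer_loss):
        if conf >= threshold:
            return loss
    return layer_loss[-1]


def calibrate_threshold(calib, thresholds, epsilon, delta):
    """Pick an exit threshold whose excess risk over zero-shot is certified.

    calib: list of (layer_conf, layer_loss, zero_shot_loss) tuples, with all
        losses in [0, 1] (hypothetical layout for illustration).
    epsilon: tolerated expected excess loss relative to zero-shot.
    delta: allowed probability that the certificate is wrong.

    Each candidate is tested with a Hoeffding upper bound at level
    delta / len(thresholds) (Bonferroni), so no monotonicity of the risk
    in the threshold is assumed. Returns the smallest certified threshold
    (earliest exit), or None if nothing can be certified.
    """
    if not calib or not thresholds:
        return None
    n = len(calib)
    slack = math.sqrt(math.log(len(thresholds) / delta) / (2.0 * n))
    passing = []
    for lam in thresholds:
        # Excess loss over the zero-shot baseline, clipped at zero so that
        # only degradation (not improvement) counts as risk.
        excess = [max(0.0, exit_loss(c, l, lam) - z) for c, l, z in calib]
        if sum(excess) / n + slack <= epsilon:
            passing.append(lam)
    return min(passing) if passing else None
```

In this sketch, a return value of `None` would mean that no exit threshold can be certified on the calibration data, in which case a caller could fall back to the zero-shot prediction, whose excess risk is zero by definition.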

