Fugu-MT paper translation (summary): Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs

Paper overview: Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs

arxiv url: http://arxiv.org/abs/2510.03567v1
Date: Fri, 03 Oct 2025 23:32:21 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.119222
Title: Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs
Authors: Fatmazohra Rezkellah, Ramzi Dakhmouche
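Abstract summary: We investigate various constrained optimization formulations that address unlearning of sensitive information and robustness to jailbreak attacks. The simplest point-wise constraint-based intervention we propose outperforms max-min interventions while having a lower computational cost.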
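To make the contrast between point-wise and max-min interventions concrete, the two kinds of constrained formulation can be written generically as below. The notation is assumed for illustration and is not taken from the paper: θ are the model weights, Δ the intervention, c a constraint function that is nonpositive when the output is acceptable (for example a forbidden token's logit minus a threshold), x_i the given inputs, and 𝒜(x_i) a set of adversarial variants of x_i. The paper's exact objectives and constraints may differ.

```latex
% Point-wise: one constraint per given input x_i
\min_{\Delta}\ \|\Delta\|_2^2
\quad \text{s.t.} \quad c(\theta + \Delta;\, x_i) \le 0, \qquad i = 1, \dots, n

% Worst-case variant: the constraint must hold for every adversarial variant of x_i
\min_{\Delta}\ \|\Delta\|_2^2
\quad \text{s.t.} \quad \max_{x' \in \mathcal{A}(x_i)} c(\theta + \Delta;\, x') \le 0, \qquad i = 1, \dots, n
```

Checking each constraint of the second form requires solving an inner maximization over 𝒜(x_i), which is one plausible reason a point-wise formulation is cheaper; that it also performs better is the paper's empirical finding, not something this generic formulation shows.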
Reference score (site-computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the increasing adoption of Large Language Models (LLMs), more customization is needed to ensure privacy-preserving and safe generation. We address this objective from two critical aspects: unlearning of sensitive information and robustness to jailbreaking attacks. We investigate various constrained optimization formulations that address both aspects in a unified manner, by finding the smallest possible interventions on LLM weights that either make a given vocabulary set unreachable or endow the LLM with robustness to tailored attacks by shifting part of the weights to a safer region. Beyond unifying two key properties, this approach contrasts with previous work in that it does not require an oracle classifier, which is typically unavailable or adds computational overhead. Surprisingly, we find that the simplest point-wise constraint-based intervention we propose leads to better performance than max-min interventions, while having a lower computational cost. Comparison against state-of-the-art defense methods demonstrates superior performance of the proposed approach.
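The unlearning case in the abstract (the smallest weight intervention that makes a given vocabulary set unreachable) can be illustrated on a toy linear output layer. This is a minimal sketch under stated assumptions, not the authors' method: it assumes only the unembedding matrix `W_U` is edited, approximates "unreachable" by a fixed logit ceiling `tau` on a given set of hidden states `H`, and uses Hildreth's method, a standard quadratic-programming algorithm, to find the minimum-norm edit. The function names `min_norm_row_edit` and `unlearn_vocabulary` are hypothetical.

```python
import numpy as np

# Hypothetical illustration, NOT the paper's method: a point-wise,
# minimum-norm edit of a toy unembedding matrix under a fixed logit ceiling.


def min_norm_row_edit(w, H, tau, n_sweeps=1000, tol=1e-12):
    """Smallest L2 change `delta` such that H @ (w + delta) <= tau elementwise.

    Computes the Euclidean projection of w onto {x : H x <= tau} with
    Hildreth's dual coordinate-ascent method and returns the difference.
    Assumes every row of H is nonzero and the constraint set is nonempty.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    x = w.copy()
    sq_norms = np.einsum("ij,ij->i", H, H)
    if np.any(sq_norms == 0.0):
        raise ValueError("hidden states must be nonzero")
    lam = np.zeros(H.shape[0])  # dual variables, one per constraint
    for _ in range(n_sweeps):
        max_change = 0.0
        for i in range(H.shape[0]):
            # Exact maximization of the dual along coordinate i, clipped to lam_i >= 0
            new_lam = max(0.0, lam[i] + (H[i] @ x - tau) / sq_norms[i])
            d = new_lam - lam[i]
            x -= d * H[i]  # maintains x = w - H.T @ lam
            lam[i] = new_lam
            max_change = max(max_change, abs(d))
        if max_change < tol:
            break
    return x - w


def unlearn_vocabulary(W_U, H, forbidden_ids, tau):
    """Return a copy of W_U (vocab x d) in which each forbidden token's logit
    is at most tau on every hidden state in H (n x d).

    The Frobenius-norm objective separates by row and each constraint touches
    a single row, so forbidden rows are edited independently; others are kept.
    """
    W_new = np.array(W_U, dtype=float)  # independent copy
    for v in forbidden_ids:
        W_new[v] += min_norm_row_edit(W_U[v], H, tau)
    return W_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_U = rng.normal(size=(50, 16))  # toy vocabulary of 50 tokens, width 16
    H = rng.normal(size=(8, 16))     # toy hidden states standing in for prompts
    forbidden, tau = [3, 7], -2.0
    W_new = unlearn_vocabulary(W_U, H, forbidden, tau)
    print("max forbidden logit:", (H @ W_new[forbidden].T).max())
    print("edit norm:", np.linalg.norm(W_new - W_U))
```

A fixed ceiling keeps each forbidden row's problem independent and cheap. Requiring a margin below the other tokens' logits would couple the rows, and guaranteeing the ceiling on unseen or adversarially crafted inputs would call for a worst-case (robust) formulation instead of this point-wise one.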


Related papers