Fugu-MT Paper Translation (Abstract): Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning

Paper abstract: Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning

arxiv url: http://arxiv.org/abs/2510.12939v1
Date: Tue, 14 Oct 2025 19:35:27 GMT
Status: Translation complete
Last updated in system: 2025-10-16 20:13:28.394687
Title: Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning
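Title (reference translation): Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning
Authors: James Pedley, Benjamin Etheridge, Stephen J. Roberts, Francesco Quinzan
Abstract summary: We develop the first theoretical framework for certified robustness under pruning in state-adversarial Markov decision processes. We derive a novel three-term regret decomposition that disentangles clean-task performance, pruning-induced performance loss, and robustness gains.
Reference score (independently computed attention score): 6.883578421923203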
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) policies deployed in real-world environments must remain reliable under adversarial perturbations. At the same time, modern deep RL agents are heavily over-parameterized, raising costs and fragility concerns. While pruning has been shown to improve robustness in supervised learning, its role in adversarial RL remains poorly understood. We develop the first theoretical framework for certified robustness under pruning in state-adversarial Markov decision processes (SA-MDPs). For Gaussian and categorical policies with Lipschitz networks, we prove that element-wise pruning can only tighten certified robustness bounds; pruning never makes the policy less robust. Building on this, we derive a novel three-term regret decomposition that disentangles clean-task performance, pruning-induced performance loss, and robustness gains, exposing a fundamental performance-robustness frontier. Empirically, we evaluate magnitude and micro-pruning schedules on continuous-control benchmarks with strong policy-aware adversaries. Across tasks, pruning consistently uncovers reproducible "sweet spots" at moderate sparsity levels, where robustness improves substantially without harming, and sometimes even enhancing, clean performance. These results position pruning not merely as a compression tool but as a structural intervention for robust RL.
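Abstract (reference translation): Reinforcement learning (RL) policies deployed in real-world environments must remain reliable under adversarial perturbations. At the same time, modern deep RL agents are heavily over-parameterized, which raises concerns about cost and fragility. Pruning has been shown to improve robustness in supervised learning, but its role in adversarial RL is still poorly understood. This work develops the first theoretical framework for certified robustness under pruning in SA-MDPs. For Gaussian and categorical policies with Lipschitz networks, element-wise pruning can only tighten the certified robustness bounds; pruning never makes the policy less robust. Building on this, the authors derive a novel three-term regret decomposition that disentangles clean-task performance, pruning-induced performance loss, and robustness gains, exposing a fundamental performance-robustness frontier. Magnitude and micro-pruning schedules are evaluated empirically on continuous-control benchmarks with strong policy-aware adversaries. Across tasks, pruning consistently uncovers reproducible "sweet spots" at moderate sparsity levels, where robustness improves substantially without harming clean performance. These results position pruning not merely as a compression tool but as a structural intervention for robust RL.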
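
The claim that element-wise pruning can only tighten a Lipschitz-based bound can be illustrated with a small numerical sketch. This is not the paper's construction: it assumes a feed-forward network with 1-Lipschitz activations (such as ReLU) and bounds its Lipschitz constant with respect to the infinity norm by the product of per-layer max absolute row sums. That particular norm is chosen here because zeroing entries can never increase it; the helper names `magnitude_prune` and `lipschitz_upper_bound`, the layer shapes, and the sparsity levels are all hypothetical.

```python
import numpy as np


def magnitude_prune(weight, sparsity):
    """Zero out the smallest-magnitude entries of a weight matrix."""
    k = int(sparsity * weight.size)  # number of entries to remove
    if k == 0:
        return weight.copy()
    flat = weight.copy().ravel()
    # Indices of the k entries with the smallest absolute value.
    drop = np.argpartition(np.abs(flat), k - 1)[:k]
    flat[drop] = 0.0
    return flat.reshape(weight.shape)


def lipschitz_upper_bound(weights):
    """Product of induced infinity-norms (max absolute row sums).

    Assumption: activations are 1-Lipschitz, so this product upper-bounds
    the network's Lipschitz constant in the infinity norm.
    """
    return float(np.prod([np.abs(w).sum(axis=1).max() for w in weights]))


# Hypothetical policy network: 17-dim observation, 6-dim action output.
rng = np.random.default_rng(0)
layers = [
    rng.normal(size=(64, 17)),
    rng.normal(size=(64, 64)),
    rng.normal(size=(6, 64)),
]

dense_bound = lipschitz_upper_bound(layers)
bounds = {
    s: lipschitz_upper_bound([magnitude_prune(w, s) for w in layers])
    for s in (0.0, 0.3, 0.6, 0.9)
}

for s, b in bounds.items():
    print(f"sparsity={s:.1f}  bound={b:.3e}  ratio={b / dense_bound:.3f}")
```

The bound shrinks (or stays equal) as sparsity grows because every row sum of absolute values can only lose terms. The same monotonicity does not hold for every norm: masking entries can increase a matrix's spectral norm, so which bound is being certified matters.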

Related papers

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms [79.61176746380718]
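Multi-Agent Reinforcement Learning (MARL) has shown promising results in several domains. MARL policies often lack robustness and are sensitive to small changes in the environment. We show that robustness can be obtained by controlling the Lipschitz constant of the policies. We propose ERNIE, a new robust MARL framework that promotes Lipschitz continuity of the policies.
Reference translation of paper (metadata) (2023-10-16T20:14:06Z)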
Robust Reinforcement Learning in Continuous Control Tasks with Uncertainty Set Regularization [17.322284328945194]
Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations. We propose a new regularizer named the Uncertainty Set Regularizer (USR).
Reference translation of paper (metadata) (2022-07-05T12:56:08Z)
COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks [49.15885037760725]
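This work focuses on certifying the robustness of offline reinforcement learning (RL) in the presence of poisoning attacks. We propose COPA, the first certification framework to certify the number of poisoned trajectories that can be tolerated. We show that some of the proposed certification methods are theoretically tight and that some are NP-complete problems.
Reference translation of paper (metadata) (2022-03-16T05:02:47Z)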
Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
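We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs. We generate certificates guaranteeing that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
Reference translation of paper (metadata) (2021-06-21T21:42:08Z)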
CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing [41.093241772796475]
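We present the first framework for certifying robust policies for reinforcement learning (CROP) against adversarial state perturbations. We propose two types of robustness certification criteria: robustness of per-state actions and a lower bound on cumulative rewards.
Reference translation of paper (metadata) (2021-06-17T07:58:32Z)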
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
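Offline reinforcement learning promises to learn effective policies from previously collected, static datasets without the need for exploration. Existing off-policy RL algorithms based on Q-learning and actor-critic fail when bootstrapping from out-of-distribution (OOD) actions or states. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution to the training objectives.
Reference translation of paper (metadata) (2021-05-17T20:16:46Z)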
Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
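Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations of the agent's inputs. We propose RADIAL-RL, a principled framework for training reinforcement learning agents with improved robustness against adversarial attacks.
Reference translation of paper (metadata) (2020-08-05T07:49:42Z)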
Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
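A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noise. Since the observations deviate from the true states, they can mislead the agent into taking suboptimal actions. We show that naively applying existing techniques for improving robustness in classification tasks, such as adversarial training, is ineffective for many RL tasks.
Reference translation of paper (metadata) (2020-03-19T17:59:59Z)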

The related papers list is generated automatically from the titles and abstracts of papers on this site.
