Fugu-MT Paper Translation (Abstract): Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning

Paper summary: Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning

arxiv url: http://arxiv.org/abs/2605.09638v1
Date: Sun, 10 May 2026 16:34:16 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.345936
Title: Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning
Title (reference translation): Plan2Cleanse: Test-Time Backdoor Defense via Monte Carlo Planning in Deep Reinforcement Learning
Authors: Sze-Ann Chen, Zhi-Yi Chin, Kui-Yuan Chen, Chi-Yu Li, Ping-Chun Hsieh
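Abstract summary: Plan2Cleanse is a framework for test-time detection and mitigation. It adapts Monte Carlo Tree Search to efficiently identify and neutralize RL backdoor attacks. Plan2Cleanse substantially improves trigger detection success rates.
Reference score (independently computed attention score): 12.26506262764069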
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ensuring the security of reinforcement learning (RL) models is critical, particularly when they are trained by third parties and deployed in real-world systems. Attackers can implant backdoors into these models, causing them to behave normally under typical conditions, but execute malicious behaviors when specific triggers are activated. In this work, we propose Plan2Cleanse, a test-time detection and mitigation framework that adapts Monte Carlo Tree Search to efficiently identify and neutralize RL backdoor attacks without requiring model retraining. Our approach recasts backdoor detection as a planning problem, enabling systematic exploration of temporally extended trigger sequences while maintaining black-box access to the target policy. By leveraging the detection results, Plan2Cleanse can further achieve efficient mitigation through tree-search preventive replanning. We evaluated our method in competitive MuJoCo environments, simulated O-RAN wireless networks, and Atari games. Plan2Cleanse achieves substantial improvements, increasing trigger detection success rates by more than 61.4 percentage points in stealthy O-RAN scenarios and improving win rates from 35% to 53% in competitive Humanoid environments. These results demonstrate the effectiveness of our test-time defense approach and highlight the importance of proactive defenses against backdoor threats in RL deployments. Our implementation is publicly available at https://github.com/rl-bandits-lab/RL-Backdoor.
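Abstract (reference translation): Ensuring the security of reinforcement learning (RL) models is important, especially when they are trained by third parties and deployed in real-world systems. Attackers can implant backdoors in these models so that they behave normally under ordinary conditions but carry out malicious behavior when a specific trigger is activated. In this work, we propose Plan2Cleanse, a test-time detection and mitigation framework that adapts Monte Carlo Tree Search to efficiently identify and neutralize RL backdoor attacks without requiring model retraining. The proposed method recasts backdoor detection as a planning problem, enabling systematic exploration of temporally extended trigger sequences while maintaining black-box access to the target policy. By leveraging the detection results, Plan2Cleanse can further achieve efficient mitigation through tree-search preventive replanning. We evaluated the method in competitive MuJoCo environments, simulated O-RAN wireless networks, and Atari games. Plan2Cleanse achieves substantial improvements, raising trigger detection success rates by more than 61.4 percentage points in stealthy O-RAN scenarios and improving win rates from 35% to 53% in competitive Humanoid environments. These results demonstrate the effectiveness of the test-time defense approach and highlight the importance of proactive defenses against backdoor threats in RL deployments. Our implementation is publicly available at https://github.com/rl-bandits-lab/RL-Backdoor.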
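
Illustrative sketch (not the authors' implementation): the abstract describes recasting backdoor detection as a planning problem solved with Monte Carlo Tree Search under black-box access. The Python below shows that general idea with a plain UCT search over adversary action sequences. Everything in it is an assumption made for illustration: the names `Node`, `toy_oracle`, and `uct_trigger_search`, the three-action toy setting, and the hidden trigger `(2, 0, 1)`. The oracle stands in for "run the suspect policy and report whether it failed"; the actual Plan2Cleanse reward and search design may differ.

```python
import math
import random


class Node:
    """One node of the search tree; children are indexed by adversary action."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}  # action -> Node
        self.visits = 0
        self.value = 0.0    # sum of rewards backed up through this node


def toy_oracle(seq, hidden_trigger=(2, 0, 1)):
    """Hypothetical black-box query: 1.0 if the victim 'fails' on this sequence.

    Here the victim fails exactly when the hidden trigger occurs contiguously.
    """
    k = len(hidden_trigger)
    hit = any(tuple(seq[i:i + k]) == hidden_trigger
              for i in range(len(seq) - k + 1))
    return 1.0 if hit else 0.0


def uct_trigger_search(oracle, actions, horizon, iterations, c=1.4, seed=0):
    """Plain UCT over action sequences; returns the best sequence and reward."""
    rng = random.Random(seed)
    root = Node()
    best_seq, best_reward = [], -1.0
    for _ in range(iterations):
        node, seq = root, []
        # Selection: follow the UCB1-maximizing child while fully expanded.
        while len(seq) < horizon and len(node.children) == len(actions):
            log_n = math.log(node.visits)
            action, node = max(
                node.children.items(),
                key=lambda kv: kv[1].value / kv[1].visits
                + c * math.sqrt(log_n / kv[1].visits),
            )
            seq.append(action)
        # Expansion: add one untried action unless the horizon is reached.
        if len(seq) < horizon:
            untried = [a for a in actions if a not in node.children]
            action = rng.choice(untried)
            child = Node(parent=node)
            node.children[action] = child
            node = child
            seq.append(action)
        # Rollout: complete the sequence with uniformly random actions.
        full = seq + [rng.choice(actions) for _ in range(horizon - len(seq))]
        reward = oracle(full)
        if reward > best_reward:
            best_seq, best_reward = full, reward
        # Backpropagation: update statistics from the leaf up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best_seq, best_reward


found_seq, found_reward = uct_trigger_search(
    toy_oracle, actions=[0, 1, 2], horizon=5, iterations=2000)
print(found_seq, found_reward)
```

The search only calls `oracle(full)`, so it never inspects the victim's parameters, which mirrors the black-box setting the abstract mentions. In a real environment the oracle would be replaced by an episode rollout against the suspect policy.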

Related papers