Fugu-MT paper translation (abstract): DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation

Paper overview: DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation

arxiv url: http://arxiv.org/abs/2509.04970v1
Date: Fri, 05 Sep 2025 09:52:08 GMT
Status: translation complete
Last updated in system: 2025-09-08 14:27:25.556438
Title: DeGuV: Depth-Guided Visual Reinforcement Learning for Generalization and Interpretability in Manipulation
Title (reference translation): DeGuV: Depth-guided visual reinforcement learning for generalization and interpretability in manipulation
Authors: Tien Pham, Xinyun Chi, Khang Nguyen, Manfred Huber, Angelo Cangelosi
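Abstract summary: This paper introduces DeGuV, an RL framework that improves both generalization and sample efficiency. We leverage a learnable masker network that produces a mask from the depth input, preserving only critical visual information and discarding irrelevant pixels. In addition, we incorporate contrastive learning and stabilize Q-value estimation under augmentation, further improving sample efficiency and training stability.
Reference score (attention score computed independently by this site): 3.694734526301468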
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning (RL) agents can learn to solve complex tasks from visual inputs, but generalizing these learned skills to new environments remains a major challenge in RL applications, especially in robotics. While data augmentation can improve generalization, it often compromises sample efficiency and training stability. This paper introduces DeGuV, an RL framework that enhances both generalization and sample efficiency. Specifically, we leverage a learnable masker network that produces a mask from the depth input, preserving only critical visual information while discarding irrelevant pixels. In this way, we ensure that our RL agents focus on essential features, improving robustness under data augmentation. In addition, we incorporate contrastive learning and stabilize Q-value estimation under augmentation to further enhance sample efficiency and training stability. We evaluate our proposed method on the RL-ViGen benchmark using the Franka Emika robot and demonstrate its effectiveness in zero-shot sim-to-real transfer. Our results show that DeGuV outperforms state-of-the-art methods in both generalization and sample efficiency while also improving interpretability by highlighting the most relevant regions in the visual input.
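Abstract (reference translation): Reinforcement learning (RL) agents can learn complex tasks from visual inputs, but generalizing these learned skills to new environments is a major challenge in RL applications, especially in robotics. Data augmentation can improve generalization, but it often harms sample efficiency and training stability. This paper introduces DeGuV, an RL framework that improves both generalization and sample efficiency. Specifically, we leverage a learnable masker network that generates a mask from the depth input, preserving only important visual information while discarding irrelevant pixels. This ensures that the RL agent focuses on essential features and improves robustness under data augmentation. In addition, we incorporate contrastive learning and stabilize Q-value estimation under augmentation, further improving sample efficiency and training stability. We evaluate the proposed method on the RL-ViGen benchmark using the Franka Emika robot and demonstrate its effectiveness in zero-shot sim-to-real transfer. The results show that DeGuV outperforms state-of-the-art methods in both generalization and sample efficiency, while also improving interpretability by highlighting the most relevant regions of the visual input.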
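
The depth-guided masking idea in the abstract can be illustrated with a very small sketch. This is not the authors' masker network: the abstract describes a learnable network, whereas the code below assumes a single sigmoid of depth with hand-picked `weight` and `bias` values standing in for learned parameters. The function names `depth_mask` and `apply_mask` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def depth_mask(depth, weight, bias):
    """Soft per-pixel mask in (0, 1) computed from depth alone.

    Assumption: a one-parameter-pair stand-in for a learnable masker.
    In a real system, weight and bias (or a full network) would be trained.
    """
    return sigmoid(weight * depth + bias)


def apply_mask(rgb, mask):
    """Scale every color channel of rgb (H, W, 3) by mask (H, W)."""
    return rgb * mask[..., None]


# Toy 2x2 scene: the left column is near (0.5 m), the right column is far (3.0 m).
depth = np.array([[0.5, 3.0],
                  [0.5, 3.0]])
rgb = np.ones((2, 2, 3))

# A negative weight keeps near pixels and suppresses far ones (illustrative values).
mask = depth_mask(depth, weight=-4.0, bias=6.0)
masked = apply_mask(rgb, mask)
```

In this toy setup the near pixels pass through almost unchanged while the far pixels are driven close to zero, which is the sense in which a depth-derived mask can discard irrelevant pixels before the image reaches the RL agent. The mask itself can also be inspected as an image, which is one plausible reading of the interpretability claim.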

Related papers