Fugu-MT Paper Translation (Summary): Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators

Paper summary: Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators

arxiv url: http://arxiv.org/abs/2605.02528v1
Date: Mon, 04 May 2026 12:28:16 GMT
Status: Translation complete
System update date: 2026-05-05 20:33:50.282789
Title: Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators
Authors: Christian Jestel, Nicolas Bach, Marvin Wiedemann, Jan Finke, Peter Detzner,
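Abstract summary: Four generators with guaranteed navigability are integrated into MuRoSim, a 2D simulator focused on training efficiency for LiDAR-based navigation. Five navigation policies are cross-evaluated on 1000 seeded maps per generator across three training seeds. A specialist trained on sparse layouts falls to 3.3% success on mazes, whereas a policy trained on the combined generator set achieves 91.5 +/- 1.1% mean success.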
Reference score (independently computed attention score): 1.8454901862917816
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep reinforcement learning (DRL) navigation policies often overfit to the structure of their training environments, as environmental diversity is typically constrained by the manual effort required to design diverse scenarios. While procedural map generation offers scalable diversity, no prior work systematically compares how different generator types affect policy generalization. We integrate four generators (sparse, maze, graph, and Wave Function Collapse) with guaranteed navigability into MuRoSim, a 2D simulator focusing on training efficiency for LiDAR-based navigation. We cross-evaluate five navigation policies on 1000 seeded maps per generator across three training seeds. Results show a strongly asymmetric cross-generator transfer: a specialist trained on sparse layouts falls to 3.3% success on mazes, whereas a policy trained on the combined generator set achieves 91.5 +/- 1.1% mean success. We further demonstrate that A* path-planner subgoal inputs are the dominant factor for robustness, raising success from the 90.2 +/- 1.4% feedforward baseline to 98.9 +/- 0.4% and outperforming GRU recurrence, which only improves the reactive baseline. The DRL policies outperform a classical Carrot+A* controller, which matches their success only at low speeds (1.0 m/s) but collapses to 24.9% at 2.0 m/s. This highlights learned speed adaptation as the decisive advantage of the learned approach. Real-world experiments on a RoboMaster confirm sim-to-real transfer in a cluttered arena, while a maze-like layout exposes remaining failure modes that recurrence helps mitigate.
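The abstract mentions seeded procedural maps with guaranteed navigability but does not say how the guarantee is enforced. One common way to obtain such a guarantee is rejection sampling with a reachability check, sketched below in Python. This is an illustrative assumption, not MuRoSim's implementation: the function names, the grid representation, the corner start and goal cells, and the parameters `size`, `obstacle_prob`, and `max_attempts` are all hypothetical.

```python
import random
from collections import deque


def sparse_map(seed, size=16, obstacle_prob=0.2):
    """Return a size x size grid (True = obstacle) drawn from a seeded RNG."""
    rng = random.Random(seed)
    grid = [[rng.random() < obstacle_prob for _ in range(size)] for _ in range(size)]
    # Keep the hypothetical start and goal corners free.
    grid[0][0] = False
    grid[size - 1][size - 1] = False
    return grid


def is_navigable(grid, start, goal):
    """Breadth-first search over 4-connected free cells."""
    size = len(grid)
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < size and 0 <= nc < size
            if inside and not grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def navigable_sparse_map(seed, size=16, obstacle_prob=0.2, max_attempts=1000):
    """Rejection-sample seeded maps until start and goal are connected."""
    start, goal = (0, 0), (size - 1, size - 1)
    for attempt in range(max_attempts):
        # Derive a distinct sub-seed per attempt so the result stays reproducible.
        grid = sparse_map(seed * max_attempts + attempt, size, obstacle_prob)
        if is_navigable(grid, start, goal):
            return grid
    raise RuntimeError("no navigable map found within max_attempts")
```

Because every random draw comes from a seed-derived generator, calling `navigable_sparse_map` twice with the same seed returns the same map. That reproducibility is what makes evaluation on a fixed set of seeded maps per generator repeatable.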


Related papers