Fugu-MT Paper Translation (Abstract): Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain

Paper overview: Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain

arxiv url: http://arxiv.org/abs/2605.09595v1
Date: Sun, 10 May 2026 15:16:07 GMT
Status: translation complete
Last updated in system: 2026-05-12 23:28:50.325775
Title: Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain
Title (reference translation): Neuromorphic reinforcement learning for quadruped walking control in uneven regions
Authors: Zhuangyu Han, Abhronil Sengupta
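Abstract summary: Local learning can replace global backpropagation graphs with updates driven by local neural states. This work proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for quadruped locomotion on uneven terrain.
Reference score (independently computed attention metric): 7.828170373014957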
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning (RL) has enabled robust quadruped locomotion over complex terrain, but most learned controllers are trained offline with backpropagation in massively parallel simulation and deployed as fixed policies, limiting adaptation to terrain variation, payload changes, actuator wear, and other real-world conditions under onboard power constraints. Local learning provides a potential path toward energy-aware on-robot adaptation by replacing global backpropagation graphs with updates driven by local neural states, making the learning rule more compatible with neuromorphic and in-memory computing substrates. This work proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for uneven-terrain quadruped locomotion. The controller combines a bio-inspired central pattern generator (CPG) policy with a residual postural adjustment policy, while replacing conventional backpropagation-trained policy and value networks with EP-enabled local learning. To train stochastic continuous-control policies with EP, we derive an EP-compatible PPO output-nudging signal and introduce a two-sided ratio clipping mechanism that stabilizes policy updates during relaxation. Experiments on a 12-DoF A1 quadruped show that the proposed controller achieves stable policy convergence in a two-stage uneven terrain locomotion task. Its locomotion performance is comparable to a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability, while improving GPU memory efficiency by 4.3\(\times\) compared with backpropagation through time (BPTT). These results suggest that local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.
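Abstract (reference translation): Reinforcement learning (RL) enables robust quadruped locomotion over complex terrain, but most learned controllers are trained offline with backpropagation in massively parallel simulation and deployed as fixed policies, which limits their adaptation to terrain variation, payload changes, actuator wear, and other real-world conditions under onboard power constraints. Local learning offers a potential path toward energy-aware on-robot adaptation by replacing global backpropagation graphs with updates driven by local neural states, which improves compatibility with neuromorphic and in-memory computing substrates. This work proposes an equilibrium-propagation (EP)-based PPO framework for quadruped locomotion on uneven terrain. The controller combines a bio-inspired central pattern generator (CPG) policy with a postural adjustment policy and replaces the conventional backpropagation-trained policy and value networks with EP-enabled local learning. To train stochastic continuous-control policies with EP, an EP-compatible PPO output-nudging signal is derived and a two-sided ratio clipping mechanism is introduced to stabilize policy updates during relaxation. Experiments on a 12-DoF A1 quadruped show that the proposed controller achieves stable policy convergence on a two-stage uneven-terrain locomotion task. Its locomotion performance is comparable to a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability, while GPU memory efficiency improves by 4.3\(\times\) compared with backpropagation through time (BPTT). These results suggest that local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.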
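As background for the ratio clipping mentioned in the abstract, the following is a minimal sketch of the standard PPO clipped surrogate for a single sample. It is not the paper's two-sided mechanism or its EP output-nudging signal, neither of which the abstract specifies; the function name `ppo_clipped_surrogate` and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clipped_surrogate(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sample (a quantity to maximize).

    Illustrative background only: this is the textbook PPO objective, not the
    EP-compatible variant described in the abstract.
    """
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = math.exp(log_prob_new - log_prob_old)
    # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Ratio 1.5 with a positive advantage: the clipped term is selected.
    print(ppo_clipped_surrogate(math.log(1.5), 0.0, 1.0))
    # Ratio 0.5 with a positive advantage: the unclipped term is selected.
    print(ppo_clipped_surrogate(math.log(0.5), 0.0, 1.0))
```

In this standard form the clip is effectively one-sided for a given advantage sign: with a positive advantage, a ratio below `1 - clip_eps` passes through unclipped. How the paper's two-sided clipping differs from this is not stated in the abstract.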


Related papers list