Fugu-MT Paper Translation (Abstract): Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem

Paper overview: Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem

arxiv url: http://arxiv.org/abs/2510.08768v1
Date: Thu, 09 Oct 2025 19:36:18 GMT
Status: Translation complete
Last updated in system: 2025-10-14 00:38:47.606924
Title: Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem
Authors: Francisco Pascoa, Ian Lalonde, Alexandre Girard
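Abstract summary: Reinforcement learning policies often fail to generalize to new robots, tasks, or environments with different physical parameters. This paper proposes a simple zero-shot transfer method based on Buckingham's Pi Theorem.
Reference score (independently computed attention score): 42.37643072381109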
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) policies often fail to generalize to new robots, tasks, or environments with different physical parameters, a challenge that limits their real-world applicability. This paper presents a simple, zero-shot transfer method based on Buckingham's Pi Theorem to address this limitation. The method adapts a pre-trained policy to new system contexts by scaling its inputs (observations) and outputs (actions) through a dimensionless space, requiring no retraining. The approach is evaluated against a naive transfer baseline across three environments of increasing complexity: a simulated pendulum, a physical pendulum for sim-to-real validation, and the high-dimensional HalfCheetah. Results demonstrate that the scaled transfer exhibits no loss of performance on dynamically similar contexts. Furthermore, on non-similar contexts, the scaled policy consistently outperforms the naive transfer, significantly expanding the volume of contexts where the original policy remains effective. These findings demonstrate that dimensional analysis provides a powerful and practical tool to enhance the robustness and generalization of RL policies.
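The scaling idea in the abstract can be illustrated with a minimal sketch for a torque-controlled pendulum. Everything below is an assumption made for illustration and is not taken from the paper: the context variables (mass, length, gravity), the characteristic scales `sqrt(length / gravity)` for time and `mass * gravity * length` for torque, and the names `PendulumContext`, `scaled_transfer`, and `toy_policy` are all hypothetical. The sketch converts an observation from the target context into dimensionless form, re-dimensionalizes it in the source context where the policy was trained, and maps the resulting action back the same way.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Observation = Tuple[float, float]        # (angle [rad], angular velocity [rad/s])
Policy = Callable[[Observation], float]  # returns a torque [N*m]


@dataclass(frozen=True)
class PendulumContext:
    """Hypothetical physical context of a pendulum (assumed variables)."""
    mass: float     # kg
    length: float   # m
    gravity: float  # m/s^2

    @property
    def time_scale(self) -> float:
        # Characteristic time sqrt(l / g), used to non-dimensionalize velocities.
        return math.sqrt(self.length / self.gravity)

    @property
    def torque_scale(self) -> float:
        # Characteristic torque m * g * l, used to non-dimensionalize actions.
        return self.mass * self.gravity * self.length


def scaled_transfer(policy: Policy,
                    source: PendulumContext,
                    target: PendulumContext) -> Policy:
    """Wrap a policy trained in `source` so it can act in `target`."""
    def wrapped(obs: Observation) -> float:
        theta, omega = obs
        # Target observation -> dimensionless (the angle is already dimensionless).
        omega_star = omega * target.time_scale
        # Dimensionless -> observation expressed in the source context.
        source_obs = (theta, omega_star / source.time_scale)
        # Query the unchanged, pre-trained policy.
        torque_source = policy(source_obs)
        # Source action -> dimensionless -> target action.
        torque_star = torque_source / source.torque_scale
        return torque_star * target.torque_scale
    return wrapped


def toy_policy(obs: Observation) -> float:
    # Stand-in for a trained policy: a simple linear feedback law.
    theta, omega = obs
    return -2.0 * theta - 0.5 * omega


source = PendulumContext(mass=1.0, length=1.0, gravity=10.0)
target = PendulumContext(mass=2.0, length=4.0, gravity=10.0)
transferred = scaled_transfer(toy_policy, source, target)
```

When the source and target contexts are identical, the wrapper reduces to the original policy. A naive transfer, by contrast, would call `toy_policy` directly on the target observation and apply its torque unchanged.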
