Fugu-MT Paper Translation (Abstract): Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets

Paper summary: Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets

arxiv url: http://arxiv.org/abs/2602.18025v1
Date: Fri, 20 Feb 2026 06:39:17 GMT
Status: Translation complete
Last updated in system: 2026-02-23 18:01:41.251971
Title: Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
Title (reference translation): Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
Authors: Haruki Abe, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
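Abstract summary: This work combines offline reinforcement learning (offline RL) with cross-embodiment learning. The authors construct a suite of locomotion datasets spanning 16 distinct robot platforms. Experiments confirm that the combined approach excels at pre-training on datasets rich in suboptimal trajectories and outperforms pure behavior cloning. The paper proposes an embodiment-based grouping strategy that clusters robots by morphological similarity and updates the model with a group gradient.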
Reference score (independently computed attention score): 47.55508376631633
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scalable robot policy pre-training has been hindered by the high cost of collecting high-quality demonstrations for each platform. In this study, we address this issue by uniting offline reinforcement learning (offline RL) with cross-embodiment learning. Offline RL leverages both expert and abundant suboptimal data, and cross-embodiment learning aggregates heterogeneous robot trajectories across diverse morphologies to acquire universal control priors. We perform a systematic analysis of this offline RL and cross-embodiment paradigm, providing a principled understanding of its strengths and limitations. To evaluate this offline RL and cross-embodiment paradigm, we construct a suite of locomotion datasets spanning 16 distinct robot platforms. Our experiments confirm that this combined approach excels at pre-training with datasets rich in suboptimal trajectories, outperforming pure behavior cloning. However, as the proportion of suboptimal data and the number of robot types increase, we observe that conflicting gradients across morphologies begin to impede learning. To mitigate this, we introduce an embodiment-based grouping strategy in which robots are clustered by morphological similarity and the model is updated with a group gradient. This simple, static grouping substantially reduces inter-robot conflicts and outperforms existing conflict-resolution methods.
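Abstract (reference translation): Scalable pre-training of robot policies has been hindered by the high cost of collecting high-quality demonstrations for each platform. This study addresses the problem by combining offline reinforcement learning (offline RL) with cross-embodiment learning. Offline RL makes use of both expert data and abundant suboptimal data, while cross-embodiment learning aggregates heterogeneous robot trajectories across diverse morphologies to acquire universal control priors. The authors systematically analyze this offline RL and cross-embodiment paradigm to give a principled account of its strengths and limitations. To evaluate the paradigm, they construct a suite of locomotion datasets spanning 16 distinct robot platforms. Experiments confirm that the combination excels at pre-training on datasets rich in suboptimal trajectories and outperforms pure behavior cloning. However, as the proportion of suboptimal data and the number of robot types increase, conflicting gradients across morphologies are observed to impede learning. To mitigate this, the authors introduce an embodiment-based grouping strategy that clusters robots by morphological similarity and updates the model with a group gradient. This simple, static grouping substantially reduces inter-robot conflicts and outperforms existing conflict-resolution methods.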
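The abstract does not spell out how gradient conflict is measured or how the group gradient is applied, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that conflict is judged by the cosine similarity between per-robot gradients (a negative value meaning the gradients point in opposing directions), that the static grouping is given by hand, and that each group's mean gradient is applied in turn. The robot names, the `GROUPS` and `TARGETS` tables, the toy quadratic losses, and all function names are hypothetical.

```python
import math

# Hypothetical static grouping by morphology (names are illustrative only).
GROUPS = {
    "quadruped": ["quad_a", "quad_b"],
    "biped": ["biped_a", "biped_b"],
}

# Toy per-robot objective: loss_r(theta) = 0.5 * ||theta - target_r||^2.
TARGETS = {
    "quad_a": (1.0, 0.0),
    "quad_b": (0.8, 0.2),
    "biped_a": (-1.0, 0.0),
    "biped_b": (-0.8, -0.2),
}


def robot_grad(theta, robot):
    """Gradient of the toy quadratic loss for a single robot."""
    return [t - g for t, g in zip(theta, TARGETS[robot])]


def cosine(a, b):
    """Cosine similarity; a negative value signals conflicting gradients."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def group_grad(theta, members):
    """Mean gradient over the robots in one group."""
    grads = [robot_grad(theta, r) for r in members]
    return [sum(col) / len(grads) for col in zip(*grads)]


def grouped_step(theta, groups, lr=0.1):
    """Apply each group's mean gradient in turn, recomputing it at the
    current parameters before every step (an assumed update rule)."""
    for members in groups.values():
        g = group_grad(theta, members)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


theta0 = [0.0, 0.0]
within = cosine(robot_grad(theta0, "quad_a"), robot_grad(theta0, "quad_b"))
across = cosine(robot_grad(theta0, "quad_a"), robot_grad(theta0, "biped_a"))
theta1 = grouped_step(theta0, GROUPS)
print(f"within-group cosine: {within:.3f}, cross-group cosine: {across:.3f}")
print("parameters after one grouped pass:", theta1)
```

In this toy setup the two quadrupeds have nearly aligned gradients while a quadruped and a biped pull in opposite directions, which is the kind of cross-morphology conflict the abstract describes. Averaging within a group keeps similar robots from cancelling each other out, though the real method may form and apply group gradients differently.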
