Fugu-MT Paper Translation (Abstract): Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models

Paper overview: Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models

arxiv url: http://arxiv.org/abs/2606.21672v1
Date: Fri, 19 Jun 2026 18:23:24 GMT
Status: Translation complete
Last updated in system: 2026-06-26 04:05:54.101477
Title: Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models
Authors: Tianyou Wang, Anson Lei, Joe Watson, Ingmar Posner
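Abstract summary: Imitation learning has emerged as a powerful paradigm for learning visuomotor policies, but its generalisation and stability are limited by the scale and quality of the demonstration data it needs. A promising direction is to leverage more abundant but heterogeneous data sources, which differ in action space and often lack action labels altogether. Existing co-training approaches that combine heterogeneous data sources rely on hand-engineered alignment techniques. The authors instead argue that action representations should be grounded in prediction, and instantiate this principle with GLAM (grounded latent-action world model), a pair of generative models with a latent action space shared across data sources.
Reference score (attention score computed independently by the site): 14.165510655766944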
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Imitation learning has emerged as a powerful paradigm for learning visuomotor policies, but its generalisation and stability are limited by the scale and quality of demonstration data needed. A promising direction is to leverage more abundant but heterogeneous data sources, which differ in action space and often lack action labels altogether. Existing co-training approaches that combine heterogeneous data sources rely on heuristic and hand-engineered alignment techniques. In contrast, we argue that action representations should be grounded in prediction: actions that produce the same effect on the environment should share the same representation, regardless of their sources. To this end, we instantiate this principle by using a grounded latent-action world model (GLAM), a pair of generative models with a shared latent action space across data sources that is grounded by predicting future observations consistently across sources. This latent action space is used to train downstream behavioural cloning (BC) policies which map observations to latent actions and decode them back to robot actions, providing a paradigm for learning from heterogeneous data. Empirically, we demonstrate that GLAM successfully learns an aligned latent action space that facilitates action transfer across data sources with and without action labels. Across five manipulation tasks in simulation and in the real world, GLAM-aligned policies significantly outperform BC baselines and prior latent-action methods, achieving an average of +48% improvement in task success rate with the same data-scarce setting. Videos and code are available at https://viccccciv.github.io/glam/.
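The grounding principle in the abstract (actions with the same effect on the environment share one representation, whatever their source) can be illustrated with a toy sketch. This is not the GLAM implementation: the grid world, the two action spaces, and every function name below are hypothetical, and the "latent action" is hand-coded as the observed state change rather than learned by a generative world model. It only shows the pipeline the abstract describes: infer latent actions from action-free observations, clone a policy over latents, then decode latents into robot actions.

```python
# Toy illustration only; NOT the authors' method. All names are hypothetical.
from typing import Dict, List, Tuple

State = Tuple[int, int]
Latent = Tuple[int, int]
RobotAction = Tuple[int, int]

# Source A: a "robot" whose commands are in half-cell units.
def step_robot(state: State, action: RobotAction) -> State:
    return (state[0] + action[0] // 2, state[1] + action[1] // 2)

# Source B: a different embodiment whose commands are compass strings.
COMPASS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def step_compass(state: State, action: str) -> State:
    dx, dy = COMPASS[action]
    return (state[0] + dx, state[1] + dy)

def ground(obs: State, next_obs: State) -> Latent:
    """Latent action = effect on the observation (hand-coded stand-in for a learned model)."""
    return (next_obs[0] - obs[0], next_obs[1] - obs[1])

def build_decoder(robot_actions: List[RobotAction]) -> Dict[Latent, RobotAction]:
    """Map each latent back to a robot action by probing the robot dynamics."""
    origin = (0, 0)
    return {ground(origin, step_robot(origin, a)): a for a in robot_actions}

def clone_policy(demos: List[Tuple[State, State]]) -> Dict[State, Latent]:
    """Tabular behavioural cloning over latents, using observation pairs only."""
    return {obs: ground(obs, nxt) for obs, nxt in demos}

# Record an action-free demonstration from source B: only observations are kept.
state: State = (0, 0)
demos: List[Tuple[State, State]] = []
for command in ["E", "E", "N"]:
    nxt = step_compass(state, command)
    demos.append((state, nxt))
    state = nxt

policy = clone_policy(demos)
decoder = build_decoder([(2, 0), (-2, 0), (0, 2), (0, -2)])

# Roll the cloned policy out on the robot (source A) via the decoder.
robot_state: State = (0, 0)
while robot_state in policy:
    robot_state = step_robot(robot_state, decoder[policy[robot_state]])

# Same effect, same latent, regardless of source.
same_latent = ground((0, 0), step_robot((0, 0), (2, 0))) == ground((0, 0), step_compass((0, 0), "E"))
```

In this sketch the robot reproduces the compass demonstration even though it never saw a compass label, because both sources are aligned through their predicted effect. A real system would replace `ground` and `build_decoder` with learned models over image observations.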

Related papers