Fugu-MT Paper Translation (Abstract): Mimic Intent, Not Just Trajectories

Paper overview: Mimic Intent, Not Just Trajectories

arxiv url: http://arxiv.org/abs/2602.08602v2
Date: Wed, 18 Mar 2026 08:05:39 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.287381
Title: Mimic Intent, Not Just Trajectories
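Title (reference translation): Mimic Intent, Not Just Trajectories
Authors: Renming Huang, Chendong Zeng, Wenjing Tang, Jintian Cai, Cewu Lu, Panpan Cai
Abstract summary: The authors argue that the difficulty current imitation learning approaches have with adaptation and skill transfer stems from mimicking raw trajectories without understanding the underlying intent. They propose MINT (Mimic Intent, Not just Trajectories) for end-to-end IL. Experiments on several manipulation benchmarks and on a real robot demonstrate state-of-the-art success rates, superior inference efficiency, robust generalization against disturbances, and effective one-shot transfer.
Reference score (independently computed attention score): 39.77112205526461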
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While imitation learning (IL) has achieved impressive success in dexterous manipulation through generative modeling and pretraining, state-of-the-art approaches like Vision-Language-Action (VLA) models still struggle with adaptation to environmental changes and skill transfer. We argue this stems from mimicking raw trajectories without understanding the underlying intent. To address this, we propose explicitly disentangling behavior intent from execution details in end-to-end IL: Mimic Intent, Not just Trajectories (MINT). We achieve this via multi-scale frequency-space tokenization, which enforces a spectral decomposition of the action chunk representation. We learn action tokens with a multi-scale coarse-to-fine structure, and force the coarsest token to capture low-frequency global structure and finer tokens to encode high-frequency details. This yields an abstract Intent token that facilitates planning and transfer, and multi-scale Execution tokens that enable precise adaptation to environmental dynamics. Building on this hierarchy, our policy generates trajectories through next-scale autoregression, performing progressive intent-to-execution reasoning, thus boosting learning efficiency and generalization. Crucially, this disentanglement enables one-shot transfer of skills, by simply injecting the Intent token from a demonstration into the autoregressive generation process. Experiments on several manipulation benchmarks and on a real robot demonstrate state-of-the-art success rates, superior inference efficiency, robust generalization against disturbances, and effective one-shot transfer.
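Abstract (reference translation): Imitation learning (IL) has achieved remarkable success in dexterous manipulation through generative modeling and pretraining, but state-of-the-art approaches such as Vision-Language-Action (VLA) models still struggle to adapt to environmental changes and to transfer skills. We argue that this stems from mimicking raw trajectories without understanding the underlying intent. To solve this, we propose explicitly separating behavior intent from execution details in end-to-end IL: Mimic Intent, Not just Trajectories (MINT). We achieve this with multi-scale frequency-space tokenization, which enforces a spectral decomposition of the action chunk representation. We learn action tokens with a multi-scale coarse-to-fine structure, making the coarsest token capture low-frequency global structure and the finer tokens encode high-frequency details. This yields an abstract Intent token that eases planning and transfer, and multi-scale Execution tokens that allow precise adaptation to environmental dynamics. Building on this hierarchy, our policy generates trajectories through next-scale autoregression, reasoning progressively from intent to execution, which improves learning efficiency and generalization. Importantly, injecting the Intent token from a demonstration into the autoregressive generation process is enough to enable one-shot skill transfer. Experiments on several manipulation benchmarks and on a real robot show state-of-the-art success rates, superior inference efficiency, robust generalization against disturbances, and effective one-shot transfer.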
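
Illustrative sketch (not from the paper): the abstract describes a spectral decomposition in which the coarsest token carries low-frequency global structure and finer tokens carry high-frequency detail. The Python below is a minimal stand-in for that idea, assuming a fixed discrete cosine transform (DCT) in place of the learned tokenizer, which the abstract does not specify. The names `dct`, `idct`, and `split_into_bands`, and the band edges, are hypothetical choices for illustration only.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def split_into_bands(chunk, edges):
    """Split one action dimension of a chunk into frequency bands.

    edges are increasing coefficient indices; [2, 4, 8] on an 8-step chunk
    gives bands [0, 2), [2, 4), [4, 8). Band 0 is the coarse, low-frequency
    part (an assumed analogue of "intent"); later bands add finer detail.
    """
    coeffs = dct(chunk)
    bands = []
    lo = 0
    for hi in edges:
        # Keep only the coefficients in [lo, hi) and map back to the time domain.
        masked = [c if lo <= k < hi else 0.0 for k, c in enumerate(coeffs)]
        bands.append(idct(masked))
        lo = hi
    return bands


# Hypothetical 8-step chunk for a single action dimension.
chunk = [0.0, 0.1, 0.25, 0.45, 0.6, 0.7, 0.75, 0.78]
bands = split_into_bands(chunk, [2, 4, 8])
# Because the transform is linear, the bands sum back to the original chunk.
reconstructed = [sum(values) for values in zip(*bands)]
```

Since the bands are additive, replacing only band 0 with the coarse band of another chunk changes the overall shape while keeping the fine detail, which is a loose analogue of the Intent-token injection the abstract mentions.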


Related papers