Fugu-MT paper translation (abstract): Training and Evaluating Diffusion Policies with Long Context Lengths

Paper summary: Training and Evaluating Diffusion Policies with Long Context Lengths

arxiv url: http://arxiv.org/abs/2606.16447v1
Date: Mon, 15 Jun 2026 09:19:34 GMT
Status: translation complete
Last updated in system: 2026-06-16 16:21:34.266373
Title: Training and Evaluating Diffusion Policies with Long Context Lengths
Authors: Abhinav Agarwal, Adam Wei, Taylan Kargin, Michael Zeng, Cole Becker, Arif Kerem Dayi, Pablo Parrilo, Asuman Ozdaglar, Russ Tedrake
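Abstract summary: We first benchmark policy performance as context length is incrementally increased from short to long. We propose a training algorithm to jointly train policies at multiple context lengths. We re-evaluate previously proposed solutions to long-context imitation learning.
Reference score (independently computed attention score): 19.39755600541638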
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Imitation learning has enabled highly dexterous robotic manipulation from RGB observations. Policies trained with these methods, however, typically condition robot actions on only a short history of observations. These policies cannot solve tasks that require memory and can get stuck repeatedly executing the same failing motions. In this work, we first benchmark policy performance as context length is incrementally increased from short to long, across a spectrum of tasks with varying local stability and memory requirements, and in multiple data regimes. To our knowledge, this is the first study to investigate context length in imitation learning at this level of detail. Our results challenge prior claims: naively scaling context length is not as brittle as advertised in the literature. With an appropriate conditioning method and denoising backbone (UNet+Cross-Attention), single-task policies achieve high success rates on many tasks in the usual data regime even with naive scaling. Next, we propose a training algorithm to jointly train policies at multiple context lengths, further reducing the sample complexity of long-context learning. Finally, we apply our findings to re-evaluate some previously proposed solutions to long-context imitation learning.
