Fugu-MT Paper Translation (Summary): Task Robustness via Re-Labelling Vision-Action Robot Data

Paper summary: Task Robustness via Re-Labelling Vision-Action Robot Data

arxiv url: http://arxiv.org/abs/2606.10918v1
Date: Tue, 09 Jun 2026 14:28:22 GMT
Status: Translation complete
Last updated in system: 2026-06-10 15:40:58.551084
Title: Task Robustness via Re-Labelling Vision-Action Robot Data
Title (reference translation): Task robustness through re-labelling of vision-action robot data
Authors: Artur Kuramshin, Özgür Aslan, Cyrus Neary, Glen Berseth
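Abstract summary: This paper introduces Task Robustness via Re-Labelling Vision-Action Robot Data (TREAD), a scalable framework for augmenting existing robotics datasets. The approach generates semantic sub-tasks from the original instruction labels and initial scenes, segments demonstration videos conditioned on these sub-tasks, and generates diverse instructions that incorporate object properties. The results show that TREAD improves both planning generalization through trajectory decomposition and language-conditioned policy generalization through increased linguistic diversity.
Reference score (attention score computed by the site): 15.985610886484226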
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The recent trend in scaling models for robot learning has resulted in impressive policies that can perform various manipulation tasks and generalize to novel scenarios. However, these policies continue to struggle with following instructions, likely due to the limited linguistic and action sequence diversity in existing robotics datasets. This paper introduces Task Robustness via Re-Labelling Vision-Action Robot Data (TREAD), a scalable framework that leverages large Vision-Language Models (VLMs) to augment existing robotics datasets without additional data collection, harnessing the transferable knowledge embedded in these models. Our approach leverages a pretrained VLM through three stages: generating semantic sub-tasks from original instruction labels and initial scenes, segmenting demonstration videos conditioned on these sub-tasks, and producing diverse instructions that incorporate object properties, effectively decomposing longer demonstrations into grounded language-action pairs. We further enhance robustness by augmenting the data with linguistically diverse versions of the text goals. Evaluations on LIBERO demonstrate that policies trained on our augmented datasets exhibit improved performance on novel, unseen tasks and goals. Our results show that TREAD enhances both planning generalization through trajectory decomposition and language-conditioned policy generalization through increased linguistic diversity.
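The three-stage relabelling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three VLM stages are replaced by hypothetical callables (`propose_subtasks`, `find_boundaries`, `paraphrase`), the data layout (`Demo`, `LabelledSegment`) is an assumption, and keeping the full demonstration under paraphrased versions of the original goal is our reading of "augmenting the data with linguistically diverse versions of the text goals". The sketch only shows how one demonstration, once sub-task boundaries are known, becomes several grounded language-action pairs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Action = Tuple[float, ...]


@dataclass
class Demo:
    """One demonstration: goal text plus aligned observations and actions (hypothetical layout)."""
    instruction: str
    frames: List[str]  # stand-in for image observations
    actions: List[Action]


@dataclass
class LabelledSegment:
    """A language-action pair produced by relabelling."""
    instruction: str
    frames: List[str]
    actions: List[Action]


def relabel(
    demo: Demo,
    propose_subtasks: Callable[[str, str], List[str]],             # hypothetical VLM stage 1
    find_boundaries: Callable[[List[str], List[str]], List[int]],  # hypothetical VLM stage 2
    paraphrase: Callable[[str], List[str]],                        # hypothetical VLM stage 3
) -> List[LabelledSegment]:
    if not demo.frames or len(demo.frames) != len(demo.actions):
        raise ValueError("frames and actions must be non-empty and aligned")
    out: List[LabelledSegment] = []
    # Assumption: the full demonstration is kept under the original goal and its paraphrases.
    for text in [demo.instruction, *paraphrase(demo.instruction)]:
        out.append(LabelledSegment(text, demo.frames, demo.actions))
    # Stage 1: sub-tasks from the original label and the initial scene.
    subtasks = propose_subtasks(demo.instruction, demo.frames[0])
    # Stage 2: exclusive end index of each sub-task within the video.
    ends = find_boundaries(demo.frames, subtasks)
    if len(ends) != len(subtasks):
        raise ValueError("need exactly one boundary per sub-task")
    start = 0
    for subtask, end in zip(subtasks, ends):
        if not start < end <= len(demo.actions):
            raise ValueError(f"invalid boundary {end} after {start}")
        # Stage 3: pair the segment with its sub-task text and paraphrases of it.
        for text in [subtask, *paraphrase(subtask)]:
            out.append(LabelledSegment(text, demo.frames[start:end], demo.actions[start:end]))
        start = end
    return out


# Toy stand-ins for the VLM (hypothetical, for illustration only).
def toy_propose(instruction: str, first_frame: str) -> List[str]:
    return ["pick up the bowl", "place the bowl on the plate"]


def toy_boundaries(frames: List[str], subtasks: List[str]) -> List[int]:
    return [2, 4]


def toy_paraphrase(text: str) -> List[str]:
    return [f"please {text}"]


demo = Demo(
    "put the bowl on the plate",
    ["f0", "f1", "f2", "f3"],
    [(0.0,), (0.1,), (0.2,), (0.3,)],
)
pairs = relabel(demo, toy_propose, toy_boundaries, toy_paraphrase)
for p in pairs:
    print(p.instruction, p.frames)
```

With these toy stubs, the four-frame demonstration yields six pairs: the full trajectory under the original goal and one paraphrase, plus two two-frame segments, each under its sub-task text and one paraphrase. In a real pipeline the stubs would be prompts to a pretrained VLM; the abstract does not specify how TREAD prompts the model or locates segment boundaries.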
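Abstract (reference translation): The recent trend of scaling models for robot learning has produced impressive policies that can perform a variety of manipulation tasks and generalize to new scenarios. However, these policies still struggle to follow instructions, likely because the linguistic and action-sequence diversity of existing robotics datasets is limited. This paper introduces Task Robustness via Re-Labelling Vision-Action Robot Data (TREAD), a scalable framework that leverages large Vision-Language Models (VLMs). The proposed method generates semantic sub-tasks from the original instruction labels and initial scenes, segments demonstration videos conditioned on these sub-tasks, and generates diverse instructions that incorporate object properties, effectively decomposing longer demonstrations into grounded language-action pairs. Robustness is further strengthened by augmenting the data with linguistically diverse text goals. Evaluations on LIBERO show that policies trained on the augmented datasets achieve improved performance on novel, unseen tasks and goals. The results show that TREAD improves both planning generalization through trajectory decomposition and language-conditioned policy generalization through increased linguistic diversity.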

List of related papers