Fugu-MT Paper Translation (Abstract): TGM-VLA: Task-Guided Mixup for Sampling-Efficient and Robust Robotic Manipulation

Paper abstract: TGM-VLA: Task-Guided Mixup for Sampling-Efficient and Robust Robotic Manipulation

arxiv url: http://arxiv.org/abs/2603.00615v1
Date: Sat, 28 Feb 2026 12:16:20 GMT
Status: Translation complete
System update date: 2026-03-23 08:17:41.783541
Title: TGM-VLA: Task-Guided Mixup for Sampling-Efficient and Robust Robotic Manipulation
Title (reference translation): TGM-VLA: Task-Guided Mixup for Sampling-Efficient, Robust Robotic Manipulation
Authors: Fanqi Pu, Lei Jiang, Wenming Yang
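Abstract summary: This paper proposes a novel holistic framework that significantly improves both model performance and training efficiency. First, the sampling strategy is redesigned and optimized, reducing memory consumption by 80% and speeding up training by 5x. Second, the model is augmented with a color inversion projection branch, a simple yet effective module that resolves the ambiguity of dark objects.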
Reference score (independently computed attention score): 42.52624620346963
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The performance of robotic imitation learning is fundamentally limited by data quality and training strategies. Prevalent sampling strategies on RLBench suffer from severe keyframe redundancy and imbalanced temporal distribution, leading to inefficient memory usage and unstable optimization. Moreover, reprojecting point clouds onto multi-view images with a black background--while more efficient than voxel-based methods--often causes dark objects to be indistinguishable and hard to manipulate. In this work, we propose a novel holistic framework that significantly improves both model performance and training efficiency. First, we redesign and optimize the keyframe sampling strategy, reducing memory consumption by 80% and accelerating training speed by 5x. Second, we augment the model with a color inversion projection branch--a simple yet effective module that resolves the ambiguity of dark objects. Finally, we propose a task-guided mixup technique that dynamically fuses point clouds and action heatmaps according to task instructions, greatly improving robustness to distractors and performance in multi-goal scenarios. Extensive experiments demonstrate that our method achieves state-of-the-art performance with a 90.5% success rate on RLBench and 68.8% on the COLOSSEUM benchmark under challenging interference conditions. Our code and checkpoints are available at https://github.com/PuFanqi23/TGM-VLA.
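Abstract (reference translation): The performance of robotic imitation learning is fundamentally limited by data quality and training strategies. Common sampling strategies on RLBench suffer from keyframe redundancy and an imbalanced temporal distribution, leading to inefficient memory usage and unstable optimization. Furthermore, projecting point clouds onto multi-view images with a black background, although more efficient than voxel-based methods, often makes dark objects hard to distinguish. This work proposes a framework that improves both model performance and training efficiency. First, the keyframe sampling strategy is redesigned and optimized, reducing memory consumption by 80% and speeding up training by 5x. Second, the model is extended with a color inversion projection branch, a simple yet effective module that resolves the ambiguity of dark objects. Finally, a task-guided mixup technique is proposed that dynamically fuses point clouds and action heatmaps according to task instructions. The method achieves a 90.5% success rate on RLBench and 68.8% on the COLOSSEUM benchmark. The code and checkpoints are available at https://github.com/PuFanqi23/TGM-VLA.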
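
Illustrative sketch (not the authors' implementation): the abstract says a color inversion projection branch helps with dark objects rendered on a black background, but does not describe the mechanics. The Python below shows only the underlying idea, under the assumption that inversion means taking 1 - c per RGB channel of each point color before projection. The names `invert_colors` and `contrast_to_background` are hypothetical and do not come from the paper or its repository.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]  # RGB, each channel in [0, 1]

# Assumption: rendered views use a pure black background, as the abstract states.
BACKGROUND: Color = (0.0, 0.0, 0.0)


def invert_colors(colors: List[Color]) -> List[Color]:
    """Return a color-inverted copy of per-point RGB values (1 - c per channel)."""
    return [(1.0 - r, 1.0 - g, 1.0 - b) for r, g, b in colors]


def contrast_to_background(color: Color) -> float:
    """Mean absolute per-channel difference between a color and the background."""
    return sum(abs(c - b) for c, b in zip(color, BACKGROUND)) / 3.0


# Hypothetical point colors: a near-black object and a bright red object.
point_colors: List[Color] = [(0.02, 0.02, 0.03), (0.9, 0.1, 0.1)]
inverted = invert_colors(point_colors)

for original, flipped in zip(point_colors, inverted):
    print(
        f"original contrast={contrast_to_background(original):.3f}, "
        f"inverted contrast={contrast_to_background(flipped):.3f}"
    )
```

In this toy setting the near-black point gains contrast against the black background after inversion, which is the ambiguity the abstract refers to. How the actual model combines the original and inverted projections is not stated in the abstract.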

Related papers