Fugu-MT paper translation (summary): Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation

Paper summary: Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation

arxiv url: http://arxiv.org/abs/2603.05185v1
Date: Thu, 05 Mar 2026 13:55:33 GMT
Status: translation complete
Last updated in system: 2026-03-23 08:17:41.938439
Title: Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation
Authors: Pengfei Yi, Yingjie Ma, Wenjiang Xu, Yanan Hao, Shuai Gan, Wanting Li, Shanlin Zhong
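Abstract summary: Critic in the Loop is an adaptive hierarchical framework driven by dynamic VLM-Expert scheduling. At its core is a bionic Tri-System architecture comprising a VLM brain for global reasoning, a VLA cerebellum for reactive execution, and a lightweight visual Critic. The architecture seamlessly integrates human-inspired rules to intuitively break infinite retry loops.
Reference score (independently computed attention score): 5.339854280045898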
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Balancing high-level semantic reasoning with low-level reactive control remains a core challenge in visual robotic manipulation. While Vision-Language Models (VLMs) excel at cognitive planning, their inference latency precludes real-time execution. Conversely, fast Vision-Language-Action (VLA) models often lack the semantic depth required for complex, long-horizon tasks. To bridge this gap, we introduce Critic in the Loop, an adaptive hierarchical framework driven by dynamic VLM-Expert scheduling. At its core is a bionic Tri-System architecture comprising a VLM brain for global reasoning, a VLA cerebellum for reactive execution, and a lightweight visual Critic. By continuously monitoring the workspace, the Critic dynamically routes control authority. It sustains rapid closed-loop execution via the VLA for routine subtasks, and adaptively triggers the VLM for replanning upon detecting execution anomalies such as task stagnation or failures. Furthermore, our architecture seamlessly integrates human-inspired rules to intuitively break infinite retry loops. This visually-grounded scheduling minimizes expensive VLM queries, while substantially enhancing system robustness and autonomy in out-of-distribution (OOD) scenarios. Comprehensive experiments on challenging, long-horizon manipulation benchmarks reveal that our approach achieves state-of-the-art performance.
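The routing described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the `Scheduler` class, the `Verdict` enum, the `plan`/`act`/`critic` callables, and the `max_retries` cap are all hypothetical stand-ins, assuming only what the abstract states (a fast VLA loop, a Critic that watches each step, a VLM called on anomalies, and a rule that stops endless retries).

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List


class Verdict(Enum):
    OK = auto()       # subtask still progressing normally
    DONE = auto()     # subtask finished
    ANOMALY = auto()  # stagnation or failure detected


@dataclass
class Scheduler:
    # All three callables are hypothetical stand-ins for the real models.
    plan: Callable[[str], List[str]]   # slow VLM "brain": prompt -> subtasks
    act: Callable[[str], None]         # fast VLA "cerebellum": one control step
    critic: Callable[[str], Verdict]   # lightweight visual Critic
    max_retries: int = 2               # assumed rule that breaks retry loops
    vlm_calls: int = 0
    log: List[str] = field(default_factory=list)

    def run(self, task: str, max_steps: int = 100) -> bool:
        self.vlm_calls += 1
        queue = list(self.plan(task))
        retries = 0
        for _ in range(max_steps):
            if not queue:
                return True  # every subtask completed
            subtask = queue[0]
            self.act(subtask)              # routine case: VLA keeps control
            verdict = self.critic(subtask)
            if verdict is Verdict.DONE:
                queue.pop(0)
                retries = 0
            elif verdict is Verdict.ANOMALY:
                if retries >= self.max_retries:
                    self.log.append(f"gave up on {subtask}")
                    return False           # stop instead of retrying forever
                retries += 1
                self.vlm_calls += 1        # only now pay for a VLM query
                recovery = list(self.plan(f"recover: {subtask}"))
                queue = recovery + queue[1:]
        return False  # step budget exhausted
```

In this sketch the VLM is queried once up front and then only when the Critic reports an anomaly, which mirrors the abstract's point that visually grounded scheduling keeps expensive VLM queries to a minimum.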
