Fugu-MT Paper Translation (Summary): MALLVI: a multi agent framework for integrated generalized robotics manipulation

Paper summary: MALLVI: a multi agent framework for integrated generalized robotics manipulation

arxiv url: http://arxiv.org/abs/2602.16898v1
Date: Wed, 18 Feb 2026 21:28:56 GMT
Status: Translation complete
Last updated in system: 2026-02-20 15:21:28.415486
Title: MALLVI: a multi agent framework for integrated generalized robotics manipulation
Title (reference translation): MALLVI, a multi-agent framework for general-purpose robot manipulation
Authors: Iman Ahmadi, Mehrshad Taji, Arad Mahdinezhad Kashani, AmirHossein Jadidi, Saina Kashani, Babak Khalaj
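Abstract summary: We present MALLVi, a framework that enables closed-loop, feedback-driven robotic manipulation. Rather than using a single model, MALLVi coordinates specialized agents to manage perception, localization, reasoning, and high-level planning. Experiments show that iterative closed-loop multi-agent coordination improves generalization and increases success rates in zero-shot manipulation tasks.
Reference score (independently computed attention score): 0.0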
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine tuning, or prompt tuning, and often operate in an open loop manner without robust environmental feedback, making them fragile in dynamic settings. We present MALLVi, a Multi Agent Large Language and Vision framework that enables closed loop feedback driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVi generates executable atomic actions for a robot manipulator. After action execution, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVi coordinates specialized agents, Decomposer, Localizer, Thinker, and Reflector, to manage perception, localization, reasoning, and high level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only relevant agents, avoiding full replanning. Experiments in simulation and real world settings show that iterative closed loop multi agent coordination improves generalization and increases success rates in zero shot manipulation tasks. Code available at https://github.com/iman1234ahmadi/MALLVI.
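Abstract (reference translation): Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, which makes them fragile in dynamic settings. MALLVi, a Multi Agent Large Language and Vision framework, is proposed to enable closed-loop, feedback-driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVi generates executable atomic actions for a robot manipulator. After an action is executed, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. MALLVi coordinates specialized agents (Decomposer, Localizer, Thinker, and Reflector) to manage perception, localization, reasoning, and high-level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only the relevant agents, avoiding full replanning. Experiments in simulation and real-world settings show that iterative closed-loop multi-agent coordination improves generalization and increases success rates in zero-shot manipulation tasks.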
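
The closed-loop coordination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the repository linked in the abstract). Every name here is hypothetical: `run_closed_loop`, `StepResult`, the verdict strings "ok", "relocalize", and "rethink", and the `max_attempts` retry limit. The agents are plain callables standing in for LLM and VLM calls, and the image input is omitted for brevity. The sketch only shows the control flow: decompose the instruction, localize and plan each subtask, execute, then let a reflector decide whether to proceed or to re-run only the agent it blames.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Target = Tuple[float, float]  # hypothetical 2D target position


@dataclass
class StepResult:
    subtask: str
    attempts: int
    succeeded: bool


def run_closed_loop(
    instruction: str,
    decompose: Callable[[str], List[str]],
    localize: Callable[[str], Target],
    think: Callable[[str, Target], str],
    execute: Callable[[str], None],
    reflect: Callable[[str], str],
    max_attempts: int = 3,  # assumed retry budget, not from the paper
) -> List[StepResult]:
    results: List[StepResult] = []
    for subtask in decompose(instruction):
        target = localize(subtask)
        action = think(subtask, target)
        attempts, succeeded = 0, False
        while attempts < max_attempts:
            attempts += 1
            execute(action)
            verdict = reflect(subtask)  # assumed: "ok", "relocalize", "rethink"
            if verdict == "ok":
                succeeded = True
                break
            if verdict == "relocalize":
                # Re-run localization, then rebuild the action for the new target.
                target = localize(subtask)
                action = think(subtask, target)
            elif verdict == "rethink":
                # Keep the target, regenerate only the action.
                action = think(subtask, target)
        results.append(StepResult(subtask, attempts, succeeded))
        if not succeeded:
            break  # stop once a subtask exhausts its retry budget
    return results


def demo():
    """Run the loop with deterministic stub agents."""
    calls = {"localize": 0, "execute": 0}
    verdicts = iter(["relocalize", "ok", "ok"])

    def decompose(instr: str) -> List[str]:
        return [s.strip() for s in instr.split(" then ")]

    def localize(subtask: str) -> Target:
        calls["localize"] += 1
        return (0.1 * calls["localize"], 0.2)

    def think(subtask: str, target: Target) -> str:
        return f"{subtask} @ {target}"

    def execute(action: str) -> None:
        calls["execute"] += 1

    def reflect(subtask: str) -> str:
        return next(verdicts)

    results = run_closed_loop(
        "pick the cube then place it in the bin",
        decompose, localize, think, execute, reflect,
    )
    return results, calls
```

In the stub run, the first subtask fails once with a "relocalize" verdict, so only localization and action generation are repeated for that subtask. The instruction is not decomposed again, which mirrors the idea of avoiding full replanning.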

Related papers