Fugu-MT Paper Translation (Abstract): CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation

Paper overview: CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation

arxiv url: http://arxiv.org/abs/2605.02600v1
Date: Mon, 04 May 2026 13:49:19 GMT
Status: Translation complete
System update date: 2026-05-05 20:33:50.313163
Title: CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation
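Title (reference translation): CoRAL: Contact-Rich Adaptive LLM Control for Robotic Manipulation
Authors: Berk Çiçek, Mert K. Er, Özgür S. Öğüz
Abstract summary: Large Language Models (LLMs) and Vision-Language Models (VLMs) demonstrate remarkable capabilities in high-level reasoning and semantic understanding. We propose CoRAL, a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control.
Reference score (independently computed attention score): 0.0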
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Large Language Models (LLMs) and Vision-Language Models (VLMs) demonstrate remarkable capabilities in high-level reasoning and semantic understanding, applying them directly to contact-rich manipulation remains a challenge due to their lack of explicit physical grounding and inability to perform adaptive control. To bridge this gap, we propose CoRAL (Contact-Rich Adaptive LLM-based control), a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control. Unlike black-box policies, CoRAL uses LLMs not as direct controllers, but as cost designers that synthesize context-aware objective functions for a sampling-based motion planner (MPPI). To address the ambiguity of physical parameters in visual data, we introduce a neuro-symbolic adaptation loop: a VLM provides semantic priors for environmental dynamics, such as mass and friction estimates, which are then explicitly refined in real time via online system identification, while the LLM iteratively modulates the cost-function structure to correct strategic errors based on interaction feedback. Furthermore, a retrieval-based memory unit allows the system to reuse successful strategies across recurrent tasks. This hierarchical architecture ensures real-time control stability by decoupling high-level semantic reasoning from reactive execution, effectively bridging the gap between slow LLM inference and dynamic contact requirements. We validate CoRAL on both simulation and real-world hardware across challenging and novel tasks, such as flipping objects against walls by leveraging extrinsic contacts. Experiments demonstrate that CoRAL outperforms state-of-the-art VLA and foundation-model-based planner baselines by boosting success rates over 50% on average in unseen contact-rich scenarios, effectively handling sim-to-real gaps through its adaptive physical understanding.
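Abstract (reference translation): Large Language Models (LLMs) and Vision-Language Models (VLMs) show remarkable capabilities in high-level reasoning and semantic understanding, but applying them directly to contact-rich manipulation remains a challenge because they lack explicit physical grounding and cannot perform adaptive control. To bridge this gap, we propose CoRAL (Contact-Rich Adaptive LLM-based Control), a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control. Unlike black-box policies, CoRAL uses LLMs not as direct controllers but as cost designers that synthesize context-aware objective functions for a sampling-based motion planner (MPPI). To address the ambiguity of physical parameters in visual data, we introduce a neuro-symbolic adaptation loop: a VLM provides semantic priors for environmental dynamics, such as mass and friction estimates, which are explicitly refined in real time via online system identification, while the LLM iteratively modifies the cost-function structure to correct strategic errors based on interaction feedback. In addition, a retrieval-based memory unit lets the system reuse successful strategies across recurring tasks. This hierarchical architecture ensures real-time control stability by decoupling high-level semantic reasoning from reactive execution, effectively bridging the gap between slow LLM inference and dynamic contact requirements. We validate CoRAL both in simulation and on real-world hardware across challenging and novel tasks, such as flipping objects against walls by exploiting extrinsic contacts. Experiments show that CoRAL outperforms state-of-the-art VLA and foundation-model-based planner baselines, raising success rates by more than 50% on average in unseen contact-rich scenarios and effectively handling sim-to-real gaps through its adaptive physical understanding.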
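
The abstract describes the LLM as a designer of cost functions that a sampling-based planner (MPPI) then minimizes. The sketch below illustrates only that division of labor on a toy problem and is not the authors' implementation. The 1-D pushed-block model, the viscous friction term, the cost weights, and all function names (`rollout`, `cost`, `mppi_step`, `run_controller`) are assumptions made for illustration. The `cost` function stands in for an objective that an LLM might synthesize, and `mass` and `friction` stand in for parameters that a VLM prior and online system identification might supply.

```python
import math
import random

def rollout(x0, v0, controls, mass, friction, dt):
    """Simulate a 1-D block pushed by the forces in `controls`."""
    x, v = x0, v0
    states = []
    for u in controls:
        # Viscous friction is a simplifying assumption of this toy model.
        a = (u - friction * v) / mass
        v += a * dt
        x += v * dt
        states.append((x, v))
    return states

def cost(states, controls, target, w_pos=10.0, w_vel=1.0, w_u=0.01):
    """Hypothetical stand-in for an LLM-designed objective (weighted terms)."""
    _, v_end = states[-1]
    tracking = sum(w_pos * (x - target) ** 2 for x, _ in states)
    effort = sum(w_u * u * u for u in controls)
    return tracking + w_vel * v_end ** 2 + effort

def mppi_step(x0, v0, nominal, target, mass, friction,
              n_samples=256, sigma=1.0, lam=1.0, dt=0.05, rng=random):
    """One MPPI update: perturb the nominal plan, weight by exp(-cost / lam)."""
    horizon = len(nominal)
    noises, costs = [], []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        u = [n + e for n, e in zip(nominal, eps)]
        costs.append(cost(rollout(x0, v0, u, mass, friction, dt), u, target))
        noises.append(eps)
    c_min = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    # New plan = nominal plan + cost-weighted average of the sampled noise.
    return [
        nominal[t] + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
        for t in range(horizon)
    ]

def run_controller(target=1.0, steps=60, horizon=15,
                   mass=1.0, friction=0.5, dt=0.05, seed=0):
    """Receding-horizon loop: plan, apply the first control, shift the plan."""
    rng = random.Random(seed)
    x, v = 0.0, 0.0
    nominal = [0.0] * horizon
    for _ in range(steps):
        nominal = mppi_step(x, v, nominal, target, mass, friction, dt=dt, rng=rng)
        x, v = rollout(x, v, nominal[:1], mass, friction, dt)[0]
        nominal = nominal[1:] + [0.0]
    return x
```

In this sketch, changing the strategy means swapping or re-weighting the terms inside `cost`, while the sampling loop in `mppi_step` stays untouched. That separation is what would let a slow, high-level reasoner adjust the objective without sitting inside the fast control loop.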

Related papers