Fugu-MT Paper Translation (Abstract): A Multimodal Framework for Human-Multi-Agent Interaction

Paper overview: A Multimodal Framework for Human-Multi-Agent Interaction

arxiv url: http://arxiv.org/abs/2603.23271v1
Date: Tue, 24 Mar 2026 14:35:40 GMT
Status: Translation complete
Last updated in system: 2026-03-25 19:53:37.536316
Title: A Multimodal Framework for Human-Multi-Agent Interaction
Title (reference translation): A Multimodal Framework for Human-Multi-Agent Interaction
Authors: Shaid Hasan, Breenice Lee, Sujan Sarker, Tariq Iqbal
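Abstract summary: This paper proposes a multimodal framework for human-multi-agent interaction in which each robot operates as an autonomous cognitive agent. At the team level, a centralized coordination mechanism regulates turn-taking and agent participation to prevent overlapping speech. Future work will focus on larger-scale user studies and deeper exploration of socially grounded multi-agent interaction dynamics.
Reference score (independently computed attention score): 5.410329948686681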
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human-robot interaction is increasingly moving toward multi-robot, socially grounded environments. Existing systems struggle to integrate multimodal perception, embodied expression, and coordinated decision-making in a unified framework. This limits natural and scalable interaction in shared physical spaces. We address this gap by introducing a multimodal framework for human-multi-agent interaction in which each robot operates as an autonomous cognitive agent with integrated multimodal perception and Large Language Model (LLM)-driven planning grounded in embodiment. At the team level, a centralized coordination mechanism regulates turn-taking and agent participation to prevent overlapping speech and conflicting actions. Implemented on two humanoid robots, our framework enables coherent multi-agent interaction through interaction policies that combine speech, gesture, gaze, and locomotion. Representative interaction runs demonstrate coordinated multimodal reasoning across agents and grounded embodied responses. Future work will focus on larger-scale user studies and deeper exploration of socially grounded multi-agent interaction dynamics.
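Abstract (reference translation): Human-robot interaction is increasingly moving toward multi-robot, socially grounded environments. Existing systems struggle to integrate multimodal perception, embodied expression, and coordinated decision-making within a unified framework. This limits natural and scalable interaction in shared physical spaces. To close this gap, we introduce a multimodal framework for human-multi-agent interaction in which each robot functions as an autonomous cognitive agent with integrated multimodal perception and embodiment-grounded Large Language Model (LLM)-driven planning. At the team level, a centralized coordination mechanism regulates turn-taking and agent participation to prevent overlapping speech and conflicting actions. Implemented on two humanoid robots, the framework achieves coherent multi-agent interaction through interaction policies that combine speech, gesture, gaze, and locomotion. Representative interaction runs show coordinated multimodal reasoning across agents and grounded embodied responses. Future work will focus on larger-scale user studies and deeper exploration of socially grounded multi-agent interaction dynamics.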
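
The abstract does not say how the centralized coordination mechanism is implemented. As an illustration only, the idea of regulating turn-taking so that agents never speak over one another can be sketched as a single arbiter that grants the floor to one agent at a time and queues the rest. The class name TurnCoordinator, its methods, and the agent names robot_a and robot_b are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnCoordinator:
    """Hypothetical central arbiter: at most one agent holds the floor."""

    speaker: Optional[str] = None
    waiting: List[str] = field(default_factory=list)

    def request_turn(self, agent: str) -> bool:
        """Grant the floor if it is free (or already held by this agent);
        otherwise queue the agent and deny the request."""
        if self.speaker is None or self.speaker == agent:
            self.speaker = agent
            return True
        if agent not in self.waiting:
            self.waiting.append(agent)
        return False

    def release_turn(self, agent: str) -> Optional[str]:
        """Release the floor and pass it to the next queued agent, if any.
        Requests from agents that do not hold the floor are ignored."""
        if agent != self.speaker:
            return self.speaker
        self.speaker = self.waiting.pop(0) if self.waiting else None
        return self.speaker


# Two agents (names are placeholders) compete for the floor.
coordinator = TurnCoordinator()
first = coordinator.request_turn("robot_a")    # floor is free, so granted
second = coordinator.request_turn("robot_b")   # floor is taken, so queued
next_speaker = coordinator.release_turn("robot_a")  # floor passes to robot_b
```

A first-come, first-served queue is the simplest policy that rules out overlapping speech; a real system would presumably also weigh who was addressed and which agent is best placed to respond, which this sketch does not attempt.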
