Fugu-MT Paper Translation (Overview): Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling

Paper overview: Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling

arxiv url: http://arxiv.org/abs/2605.05711v1
Date: Thu, 07 May 2026 05:55:50 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.547164
Title: Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling
Authors: Anh H. Vo, Sungyo Lee, Phil-Joong Kim, Soo-Mi Choi, Yong-Guk Kim
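Abstract summary: This paper presents a unified framework that closes the loop between language-driven 3D scene generation and immersive user interaction. By tightly coupling generation and interaction, the proposed framework enables more responsive, adaptive, and realistic multimedia experiences.
Reference score (attention score calculated in-house): 1.2722697496405462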
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large language models (LLMs) have significantly improved language-driven 3D content generation, but most existing approaches still treat scene generation and user interaction as separate processes, limiting the adaptability and immersive potential of interactive multimedia systems. This paper presents a unified framework that closes the loop between language-driven 3D scene generation and immersive user interaction. Given natural language instructions, the system first constructs structured scene representations using LLMs, and then optimizes spatial layouts via reinforcement learning under geometric and semantic constraints. The generated environments are deployed in a virtual reality setting to facilitate HRI-in-the-loop, where user interactions provide continuous feedback to align generated content with human perception and usability. By tightly coupling generation and interaction, the proposed framework enables more responsive, adaptive, and realistic multimedia experiences. Experiments on the ALFRED benchmark demonstrate state-of-the-art performance in task-based scene generation. Furthermore, qualitative results and user studies show consistent improvements in immersion, interaction quality, and task efficiency, highlighting the importance of closed-loop integration of generation and interaction for next-generation multimedia systems. Our project page can be found at https://proj-showcase.github.io/h3ds/.
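The abstract describes a two-stage pipeline: an LLM turns a natural-language instruction into a structured scene representation, and reinforcement learning then optimizes the spatial layout under geometric and semantic constraints. The minimal Python sketch below illustrates only the kind of constraint-based objective such a layout optimizer could maximize; it is not the authors' method. The object list, room size, penalty weights, the `near_pairs` relation format, and all names (`Obj`, `reward`, `optimize`, etc.) are hypothetical, and a greedy random-perturbation search stands in for the RL policy.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical structured scene representation, of the kind an LLM might emit
# for an instruction such as "put a chair and a lamp next to the desk".
@dataclass
class Obj:
    name: str
    w: float  # footprint width in metres (assumed)
    d: float  # footprint depth in metres (assumed)

ROOM_W, ROOM_D = 5.0, 4.0  # assumed room size in metres


def overlap_area(a, pa, b, pb):
    """Axis-aligned overlap area of two footprints centred at pa and pb."""
    ox = min(pa[0] + a.w / 2, pb[0] + b.w / 2) - max(pa[0] - a.w / 2, pb[0] - b.w / 2)
    oy = min(pa[1] + a.d / 2, pb[1] + b.d / 2) - max(pa[1] - a.d / 2, pb[1] - b.d / 2)
    return max(ox, 0.0) * max(oy, 0.0)


def out_of_bounds(o, p):
    """Total distance (metres) by which a footprint sticks out of the room."""
    return (max(0.0, o.w / 2 - p[0]) + max(0.0, p[0] + o.w / 2 - ROOM_W)
            + max(0.0, o.d / 2 - p[1]) + max(0.0, p[1] + o.d / 2 - ROOM_D))


def reward(objs, layout, near_pairs, max_gap=1.0):
    """Non-positive score; 0.0 means every constraint is satisfied.
    Geometric constraints: no collisions, everything inside the room.
    Semantic constraints: each (i, j) in near_pairs within max_gap metres."""
    r = 0.0
    for i in range(len(objs)):
        r -= 10.0 * out_of_bounds(objs[i], layout[i])
        for j in range(i + 1, len(objs)):
            r -= 10.0 * overlap_area(objs[i], layout[i], objs[j], layout[j])
    for i, j in near_pairs:
        r -= max(0.0, math.dist(layout[i], layout[j]) - max_gap)
    return r


def optimize(objs, near_pairs, steps=5000, seed=0):
    """Stand-in for the RL policy: perturb one object at a time and keep
    the move whenever the reward does not decrease."""
    rng = random.Random(seed)
    layout = [(rng.uniform(0, ROOM_W), rng.uniform(0, ROOM_D)) for _ in objs]
    best = reward(objs, layout, near_pairs)
    for _ in range(steps):
        k = rng.randrange(len(objs))
        cand = list(layout)
        cand[k] = (layout[k][0] + rng.gauss(0, 0.3), layout[k][1] + rng.gauss(0, 0.3))
        score = reward(objs, cand, near_pairs)
        if score >= best:
            layout, best = cand, score
    return layout, best


if __name__ == "__main__":
    objs = [Obj("desk", 1.4, 0.7), Obj("chair", 0.5, 0.5), Obj("lamp", 0.3, 0.3)]
    near = [(0, 1), (0, 2)]  # chair near desk, lamp near desk
    layout, score = optimize(objs, near)
    for o, (x, y) in zip(objs, layout):
        print(f"{o.name:6s} at ({x:.2f}, {y:.2f})")
    print("reward:", round(score, 3))
```

The closed loop in the paper additionally feeds user interactions in VR back into generation; the abstract does not say how that feedback enters the objective, so it is not modeled in this sketch.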

Related papers