Fugu-MT Paper Translation (Abstract): LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper overview: LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

arxiv url: http://arxiv.org/abs/2509.05263v2
Date: Mon, 08 Sep 2025 17:05:47 GMT
Status: Translation complete
Last updated in system: 2025-09-09 14:07:03.416155
Title: LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
Title (reference translation): LatticeWorld: A Multimodal Large Language Model-Driven Framework for Interactive Complex World Generation
Authors: Yinglin Duan, Zhengxia Zou, Tongwei Gu, Wei Jia, Zhan Zhao, Luyi Xu, Xinzhu Liu, Yenan Lin, Hao Jiang, Kang Chen, Shuang Qiu
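Abstract summary: This paper proposes a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld generates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction. LatticeWorld achieves an increase of over 90 times in industrial production efficiency.
Reference score (independently computed attention score): 35.4193352348583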
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent research has been increasingly focusing on developing 3D world models that simulate complex real-world scenarios. World models have found broad applications across various domains, including embodied AI, autonomous driving, and entertainment. A more realistic simulation with accurate physics will effectively narrow the sim-to-real gap and allow us to gather rich information about the real world conveniently. While traditional manual modeling has enabled the creation of virtual 3D scenes, modern approaches have leveraged advanced machine learning algorithms for 3D world generation, with most recent advances focusing on generative methods that can create virtual worlds based on user instructions. This work explores such a research direction by proposing LatticeWorld, a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld leverages lightweight LLMs (LLaMA-2-7B) alongside an industry-grade rendering engine (e.g., Unreal Engine 5) to generate a dynamic environment. Our proposed framework accepts textual descriptions and visual instructions as multimodal inputs and creates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction, high-fidelity physics simulation, and real-time rendering. We conduct comprehensive experiments to evaluate LatticeWorld, showing that it achieves superior accuracy in scene layout generation and visual fidelity. Moreover, LatticeWorld achieves over a $90\times$ increase in industrial production efficiency while maintaining high creative quality compared with traditional manual production methods. Our demo video is available at https://youtu.be/8VWZXpERR18
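Abstract (reference translation): Recent research has focused on developing 3D world models that simulate complex real-world scenarios. World models have been widely applied in various fields, including embodied AI, autonomous driving, and entertainment. More realistic simulation with accurate physics effectively narrows the sim-to-real gap and makes it possible to conveniently gather rich information about the real world. While traditional manual modeling enables the creation of virtual 3D scenes, modern approaches leverage advanced machine learning algorithms for 3D world generation. This work explores such a research direction by proposing LatticeWorld, a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld leverages lightweight LLMs (LLaMA-2-7B) alongside an industry-grade rendering engine (e.g., Unreal Engine 5) to generate a dynamic environment. The proposed framework accepts textual descriptions and visual instructions as multimodal inputs and creates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction, high-fidelity physics simulation, and real-time rendering. We conduct comprehensive experiments to evaluate LatticeWorld and show that it achieves superior accuracy in scene layout generation and visual fidelity. Moreover, LatticeWorld achieves over a $90\times$ increase in industrial production efficiency while maintaining high creative quality compared with traditional manual production methods. Our demo video is available at https://youtu.be/8VWZXpERR18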

Related papers