Fugu-MT Paper Translation (Abstract): HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Paper overview: HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

arxiv url: http://arxiv.org/abs/2604.28196v1
Date: Thu, 30 Apr 2026 17:59:58 GMT
Status: Translation complete
System update date: 2026-05-01 16:31:54.254097
Title: HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
Title (reference translation): HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
Authors: Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai
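Abstract summary: HERMES++ is a unified driving world model that integrates 3D scene understanding and future geometry prediction within a single framework. HERMES++ achieves strong performance, outperforming specialist approaches in both future point cloud prediction and 3D scene understanding tasks.
Reference score (independently computed attention score): 83.31948299340782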
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. Conversely, while Large Language Models (LLMs) demonstrate impressive reasoning capabilities, they lack the capacity to predict future geometric evolution, creating a significant disparity between semantic interpretation and physical simulation. To bridge this gap, we propose HERMES++, a unified driving world model that integrates 3D scene understanding and future geometry prediction within a single framework. Our approach addresses the distinct requirements of these tasks through synergistic designs. First, a BEV representation consolidates multi-view spatial information into a structure compatible with LLMs. Second, we introduce LLM-enhanced world queries to facilitate knowledge transfer from the understanding branch. Third, a Current-to-Future Link is designed to bridge the temporal gap, conditioning geometric evolution on semantic context. Finally, to enforce structural integrity, we employ a Joint Geometric Optimization strategy that integrates explicit geometric constraints with implicit latent regularization to align internal representations with geometry-aware priors. Extensive evaluations on multiple benchmarks validate the effectiveness of our method. HERMES++ achieves strong performance, outperforming specialist approaches in both future point cloud prediction and 3D scene understanding tasks. The model and code will be publicly released at https://github.com/H-EmbodVis/HERMESV2.
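Abstract (reference translation): Driving world models serve as a key technology for autonomous driving by simulating environmental dynamics. However, existing approaches focus mainly on future scene generation and often overlook comprehensive 3D scene understanding. Conversely, Large Language Models (LLMs) show impressive reasoning capabilities but lack the ability to predict future geometric evolution, which creates a large gap between semantic interpretation and physical simulation. To bridge this gap, we propose HERMES++, a unified driving world model that integrates 3D scene understanding and future geometry prediction in a single framework. The method addresses the differing requirements of these tasks through synergistic designs. First, a BEV representation consolidates multi-view spatial information into a structure compatible with LLMs. Second, LLM-enhanced world queries are introduced to facilitate knowledge transfer from the understanding branch. Third, a Current-to-Future Link is designed to bridge the temporal gap, conditioning geometric evolution on semantic context. Finally, to enforce structural integrity, a Joint Geometric Optimization strategy integrates explicit geometric constraints with implicit latent regularization to align internal representations with geometry-aware priors. Extensive evaluations on multiple benchmarks validate the effectiveness of the method. HERMES++ achieves strong performance, outperforming specialist approaches in both future point cloud prediction and 3D scene understanding tasks. The model and code will be publicly released at https://github.com/H-EmbodVis/HERMESV2.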

Related papers