Fugu-MT Paper Translation (Abstract): O3N: Omnidirectional Open-Vocabulary Occupancy Prediction

Paper overview: O3N: Omnidirectional Open-Vocabulary Occupancy Prediction

arxiv url: http://arxiv.org/abs/2603.12144v1
Date: Thu, 12 Mar 2026 16:45:42 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.224486
Title: O3N: Omnidirectional Open-Vocabulary Occupancy Prediction
Title (reference translation): O3N: Omnidirectional Open-Vocabulary Occupancy Prediction
Authors: Mengfei Duan, Hao Shi, Fei Teng, Guoqiang Zhao, Yuheng Zhang, Zhiyong Li, Kailun Yang
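Abstract summary: This paper presents O3N, an Omnidirectional Open-vocabulary Occupancy predictioN framework. O3N embeds omnidirectional voxels in a polar-spiral topology, enabling continuous spatial representation and long-range context modeling. The proposed method achieves state-of-the-art performance on the QuadOcc and Human360Occ benchmarks.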
Reference score (independently computed attention score): 31.91030387170798
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding and reconstructing the 3D world through omnidirectional perception is an inevitable trend in the development of autonomous agents and embodied intelligence. However, existing 3D occupancy prediction methods are constrained by limited perspective inputs and predefined training distribution, making them difficult to apply to embodied agents that require comprehensive and safe perception of scenes in open world exploration. To address this, we present O3N, the first purely visual, end-to-end Omnidirectional Open-vocabulary Occupancy predictioN framework. O3N embeds omnidirectional voxels in a polar-spiral topology via the Polar-spiral Mamba (PsM) module, enabling continuous spatial representation and long-range context modeling across 360°. The Occupancy Cost Aggregation (OCA) module introduces a principled mechanism for unifying geometric and semantic supervision within the voxel space, ensuring consistency between the reconstructed geometry and the underlying semantic structure. Moreover, Natural Modality Alignment (NMA) establishes a gradient-free alignment pathway that harmonizes visual features, voxel embeddings, and text semantics, forming a consistent "pixel-voxel-text" representation triad. Extensive experiments on multiple models demonstrate that our method not only achieves state-of-the-art performance on QuadOcc and Human360Occ benchmarks but also exhibits remarkable cross-scene generalization and semantic scalability, paving the way toward universal 3D world modeling. The source code will be made publicly available at https://github.com/MengfeiD/O3N.
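Abstract (reference translation): Understanding and reconstructing the 3D world through omnidirectional perception is an inevitable trend in the development of autonomous agents and embodied intelligence. However, existing 3D occupancy prediction methods are constrained by limited-perspective inputs and a predefined training distribution, which makes them difficult to apply to embodied agents that need comprehensive and safe scene perception during open-world exploration. To address this, we present O3N, an Omnidirectional Open-vocabulary Occupancy predictioN framework. O3N embeds omnidirectional voxels in a polar-spiral topology via the Polar-spiral Mamba (PsM) module, enabling continuous spatial representation and long-range context modeling across 360°. The Occupancy Cost Aggregation (OCA) module introduces a principled mechanism for unifying geometric and semantic supervision within the voxel space, ensuring consistency between the reconstructed geometry and the underlying semantic structure. In addition, Natural Modality Alignment (NMA) establishes a gradient-free alignment pathway that harmonizes visual features, voxel embeddings, and text semantics, forming a consistent "pixel-voxel-text" representation triad. The proposed method not only achieves state-of-the-art performance on the QuadOcc and Human360Occ benchmarks but also shows cross-scene generalization and semantic scalability, paving the way toward universal 3D world modeling. The source code will be made publicly available at https://github.com/MengfeiD/O3N.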


Related papers