Fugu-MT Paper Translation (Abstract): SGR-OCC: Evolving Monocular Priors for Embodied 3D Occupancy Prediction via Soft-Gating Lifting and Semantic-Adaptive Geometric Refinement

Paper overview: SGR-OCC: Evolving Monocular Priors for Embodied 3D Occupancy Prediction via Soft-Gating Lifting and Semantic-Adaptive Geometric Refinement

arxiv url: http://arxiv.org/abs/2603.14076v1
Date: Sat, 14 Mar 2026 18:45:03 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.585217
Title: SGR-OCC: Evolving Monocular Priors for Embodied 3D Occupancy Prediction via Soft-Gating Lifting and Semantic-Adaptive Geometric Refinement
Authors: Yiran Guo, Simone Mentasti, Xiaofeng Jin, Matteo Frosi, Matteo Matteucci
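Abstract summary: 3D semantic occupancy prediction is a cornerstone of embodied AI. We propose SGR-OCC (Soft-Gating and Ray-refinement Occupancy). In local prediction tasks, SGR-OCC achieves a completion IoU of 58.55% and a semantic mIoU of 49.89%, surpassing the previous best method, EmbodiedOcc++, by 3.65% and 3.69%, respectively.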
Reference score (independently computed attention score): 9.891265334631889
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D semantic occupancy prediction is a cornerstone for embodied AI, enabling agents to perceive dense scene geometry and semantics incrementally from monocular video streams. However, current online frameworks face two critical bottlenecks: the inherent depth ambiguity of monocular estimation that causes "feature bleeding" at object boundaries, and the "cold start" instability where uninitialized temporal fusion layers distort high-quality spatial priors during early training stages. In this paper, we propose SGR-OCC (Soft-Gating and Ray-refinement Occupancy), a unified framework driven by the philosophy of "Inheritance and Evolution". To perfectly inherit monocular spatial expertise, we introduce a Soft-Gating Feature Lifter that explicitly models depth uncertainty via a Gaussian gate to probabilistically suppress background noise. Furthermore, a Dynamic Ray-Constrained Anchor Refinement module simplifies complex 3D displacement searches into efficient 1D depth corrections along camera rays, ensuring sub-voxel adherence to physical surfaces. To ensure stable evolution toward temporal consistency, we employ a Two-Phase Progressive Training Strategy equipped with identity-initialized fusion, effectively resolving the cold start problem and shielding spatial priors from noisy early gradients. Extensive experiments on the EmbodiedOcc-ScanNet and Occ-ScanNet benchmarks demonstrate that SGR-OCC achieves state-of-the-art performance. In local prediction tasks, SGR-OCC achieves a completion IoU of 58.55% and a semantic mIoU of 49.89%, surpassing the previous best method, EmbodiedOcc++, by 3.65% and 3.69%, respectively. In challenging embodied prediction tasks, our model reaches 55.72% SC-IoU and 46.22% mIoU. Qualitative results further confirm our model's superior capability in preserving structural integrity and boundary sharpness in complex indoor environments.
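Illustrative sketch (not from the paper): the abstract names two mechanisms, a Gaussian gate that models depth uncertainty when lifting features, and a refinement that reduces a 3D displacement search to a 1D depth correction along the camera ray. The Python below is a minimal sketch of those two ideas under stated assumptions. The function names, the specific Gaussian form, and all numeric values are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def gaussian_gate(sample_depths, pred_depth, sigma):
    """Soft weight in (0, 1] for each depth sample along one camera ray.

    Assumed form: an unnormalized Gaussian centered on the predicted depth,
    with sigma standing in for the depth uncertainty.
    """
    return np.exp(-0.5 * ((sample_depths - pred_depth) / sigma) ** 2)


def refine_along_ray(origin, direction, depth, delta):
    """Shift a 3D anchor by a scalar depth correction along its camera ray."""
    unit = direction / np.linalg.norm(direction)  # normalize the ray direction
    return origin + (depth + delta) * unit


# Hypothetical inputs: three depth samples on a ray whose predicted depth is 2.0.
sample_depths = np.array([1.0, 2.0, 3.0])
weights = gaussian_gate(sample_depths, pred_depth=2.0, sigma=0.5)

# Hypothetical anchor at depth 2.0 on the optical axis, corrected by +0.1.
anchor = refine_along_ray(
    origin=np.zeros(3),
    direction=np.array([0.0, 0.0, 2.0]),
    depth=2.0,
    delta=0.1,
)
```

Samples far from the predicted depth receive small weights instead of being cut off outright, which is the sense in which such a gate is "soft". The refinement has a single free parameter per anchor (`delta`) rather than a full 3D offset.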

Related papers