Fugu-MT Paper Translation (Summary): Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction

Paper summary: Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction

arxiv url: http://arxiv.org/abs/2412.08243v1
Date: Wed, 11 Dec 2024 09:53:10 GMT
Status: Translation complete
Last updated in system: 2024-12-12 23:20:26.632419
Title: Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction
Authors: Bohan Li, Xin Jin, Jiajun Deng, Yasheng Sun, Xiaofeng Wang, Wenjun Zeng
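Abstract summary: Camera-based 3D Semantic Occupancy Prediction (SOP) is essential for understanding complex 3D scenes from limited 2D image observations. Existing SOP methods typically aggregate contextual features to support occupancy representation learning. We introduce a new hierarchical context alignment paradigm for more accurate SOP (Hi-SOP).
Reference score (attention score computed by the site): 61.484280369655536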
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Camera-based 3D Semantic Occupancy Prediction (SOP) is crucial for understanding complex 3D scenes from limited 2D image observations. Existing SOP methods typically aggregate contextual features to assist occupancy representation learning, alleviating issues such as occlusion and ambiguity. However, these solutions often suffer from misalignment: during aggregation, features at the same position in different frames may carry different semantic meanings, which leads to unreliable contextual fusion results and an unstable representation learning process. To address this problem, we introduce a new Hierarchical context alignment paradigm for more accurate SOP (Hi-SOP). Hi-SOP first disentangles the geometric and temporal context for separate alignment and then composes the two branches to enhance the reliability of SOP. This parsing of the visual input into a local-global alignment hierarchy includes: (I) disentangled, separate geometric and temporal alignment, in which the two branches use depth confidence and camera pose, respectively, as priors for relevant feature matching; and (II) global alignment and composition of the transformed geometric and temporal volumes based on semantic consistency. Our method outperforms state-of-the-art methods for semantic scene completion on the SemanticKITTI and NuScenes-Occupancy datasets and for LiDAR semantic segmentation on the NuScenes dataset.
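The abstract says the temporal branch uses camera pose as a prior for feature matching and that the aligned volumes are later composed based on semantic consistency. The NumPy sketch below illustrates those two ideas in their simplest generic form; it is not the Hi-SOP implementation. It assumes a hypothetical dense voxel grid with a known 4x4 relative ego pose, resamples the previous frame's features into the current frame by nearest-neighbour lookup, and then fuses them with a per-voxel weight given by clipped cosine similarity, a stand-in for whatever consistency measure the paper actually uses. The function names `warp_prev_volume` and `fuse_by_semantic_consistency` and all parameters are illustrative assumptions.

```python
import numpy as np


def warp_prev_volume(prev_feat, T_cur_from_prev, voxel_size, origin):
    """Resample a previous-frame voxel feature volume into the current frame.

    Hypothetical sketch. prev_feat: (X, Y, Z, C) features on the previous ego grid.
    T_cur_from_prev: (4, 4) rigid transform mapping previous-frame points into the
    current frame. Both grids are assumed to share voxel_size and origin.
    """
    X, Y, Z, C = prev_feat.shape
    # Integer voxel indices of the current grid, shape (X, Y, Z, 3).
    idx = np.stack(np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z),
                               indexing="ij"), axis=-1)
    # Metric coordinates of each current voxel centre.
    pts_cur = origin + (idx + 0.5) * voxel_size
    # Pull each current voxel centre back into the previous frame (inverse pose).
    T_prev_from_cur = np.linalg.inv(T_cur_from_prev)
    homo = np.concatenate([pts_cur, np.ones((X, Y, Z, 1))], axis=-1)
    pts_prev = homo @ T_prev_from_cur.T
    # Nearest previous voxel containing that point; mark out-of-grid lookups invalid.
    src = np.floor((pts_prev[..., :3] - origin) / voxel_size).astype(int)
    valid = np.all((src >= 0) & (src < np.array([X, Y, Z])), axis=-1)
    warped = np.zeros_like(prev_feat)
    s = src[valid]
    warped[valid] = prev_feat[s[:, 0], s[:, 1], s[:, 2]]
    return warped, valid


def fuse_by_semantic_consistency(cur_feat, warped_feat, valid, eps=1e-6):
    """Blend warped history into the current volume, weighted by feature agreement.

    Cosine similarity is an assumed proxy for "semantic consistency": voxels whose
    features disagree (similarity <= 0) or fell outside the grid get zero weight.
    """
    num = np.sum(cur_feat * warped_feat, axis=-1)
    den = (np.linalg.norm(cur_feat, axis=-1)
           * np.linalg.norm(warped_feat, axis=-1) + eps)
    w = np.clip(num / den, 0.0, 1.0) * valid
    w = w[..., None]
    # Weighted average: returns cur_feat unchanged wherever w == 0.
    return (cur_feat + w * warped_feat) / (1.0 + w)
```

For example, if the ego vehicle moves one voxel forward along x, passing a transform with `T[0, 3] = -1.0` makes current voxel `i` read previous voxel `i + 1`, and the last slice along x becomes invalid. Nearest-neighbour lookup keeps the sketch short; trilinear sampling would give smoother warps at the cost of more code.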


Related papers