Fugu-MT Paper Translation (Abstract): Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

Paper overview: Semantic Causality-Aware Vision-Based 3D Occupancy Prediction

arxiv url: http://arxiv.org/abs/2509.08388v1
Date: Wed, 10 Sep 2025 08:29:22 GMT
Status: Translation complete
System update date: 2025-09-11 15:16:52.359786
Title: Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
Title (reference translation): Vision-based 3D occupancy prediction that accounts for semantic causality
Authors: Dubing Chen, Huan Zheng, Yucheng Zhou, Xianfei Li, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen
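Abstract summary: Vision-based 3D semantic occupancy prediction is a critical task in 3D vision. However, existing methods often rely on modular pipelines. This paper proposes a novel causal loss that enables holistic, end-to-end supervision of the modular 2D-to-3D transformation pipeline.
Reference score (independently computed attention score): 63.752869043357585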
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-based 3D semantic occupancy prediction is a critical task in 3D vision that integrates volumetric 3D reconstruction with semantic understanding. Existing methods, however, often rely on modular pipelines. These modules are typically optimized independently or use pre-configured inputs, leading to cascading errors. In this paper, we address this limitation by designing a novel causal loss that enables holistic, end-to-end supervision of the modular 2D-to-3D transformation pipeline. Grounded in the principle of 2D-to-3D semantic causality, this loss regulates the gradient flow from 3D voxel representations back to the 2D features. Consequently, it renders the entire pipeline differentiable, unifying the learning process and making previously non-trainable components fully learnable. Building on this principle, we propose the Semantic Causality-Aware 2D-to-3D Transformation, which comprises three components guided by our causal loss: Channel-Grouped Lifting for adaptive semantic mapping, Learnable Camera Offsets for enhanced robustness against camera perturbations, and Normalized Convolution for effective feature propagation. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the Occ3D benchmark, demonstrating significant robustness to camera perturbations and improved 2D-to-3D semantic consistency.
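Abstract (reference translation): Vision-based 3D semantic occupancy prediction is a critical task in 3D vision that integrates volumetric 3D reconstruction with semantic understanding. However, existing methods often rely on modular pipelines. These modules are typically optimized independently or use pre-configured inputs, which causes cascading errors. This paper addresses this limitation by designing a novel causal loss that enables holistic, end-to-end supervision of the modular 2D-to-3D transformation pipeline. Based on the principle of 2D-to-3D semantic causality, this loss regulates the gradient flow from 3D voxel representations back to the 2D features. As a result, it makes the entire pipeline differentiable, unifies the learning process, and makes previously non-trainable components fully learnable. Built on this principle, the Semantic Causality-Aware 2D-to-3D Transformation consists of three components: Channel-Grouped Lifting for adaptive semantic mapping, Learnable Camera Offsets for improved robustness against camera perturbations, and Normalized Convolution for effective feature propagation. On the Occ3D benchmark, the method is robust to camera perturbations and improves 2D-to-3D semantic consistency.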
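The abstract names Normalized Convolution as the component used for feature propagation but gives no implementation details. As a rough illustration only, the sketch below shows the classic normalized-convolution idea in 1D with NumPy: convolve the confidence-weighted signal, convolve the confidence map, and divide, so that empty cells are filled from nearby observed cells without being diluted by zeros. The function name `normalized_convolution_1d`, the box kernel, and the toy data are assumptions made for this example; this is not the paper's code and may differ from the authors' formulation.

```python
import numpy as np


def normalized_convolution_1d(signal, confidence, kernel, eps=1e-8):
    """Classic normalized convolution (hypothetical helper, not from the paper).

    signal:     values on a grid (entries with zero confidence are ignored)
    confidence: per-cell weights, e.g. 1.0 where a value was observed, else 0.0
    kernel:     non-negative smoothing kernel
    Returns the propagated signal and the accumulated confidence per cell.
    """
    signal = np.asarray(signal, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    kernel = np.asarray(kernel, dtype=float)

    # Convolve the confidence-weighted signal and the confidence map separately.
    numerator = np.convolve(signal * confidence, kernel, mode="same")
    denominator = np.convolve(confidence, kernel, mode="same")

    # Divide only where some observed cell contributed; leave other cells at 0.
    out = np.zeros_like(numerator)
    valid = denominator > eps
    out[valid] = numerator[valid] / denominator[valid]
    return out, denominator


# Toy example: only cells 0 and 3 hold observed values.
values = [2.0, 0.0, 0.0, 4.0, 0.0]
observed = [1.0, 0.0, 0.0, 1.0, 0.0]
filled, weight = normalized_convolution_1d(values, observed, [1.0, 1.0, 1.0])
print(filled)  # → [2. 2. 4. 4. 4.]
```

A plain convolution of the same sparse input would average the observed values with the surrounding zeros; dividing by the convolved confidence removes that bias, which is the property that makes the technique attractive for sparse grids such as lifted voxel features.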

Related papers