Fugu-MT Paper Translation (Summary): Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion

Paper summary: Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion

arxiv url: http://arxiv.org/abs/2602.03371v1
Date: Tue, 03 Feb 2026 10:46:51 GMT
Status: Translation complete
System update date: 2026-02-04 18:37:15.397027
Title: Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion
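Authors: Zhiwen Yang, Yuxin Peng
Abstract summary: Camera-based 3D semantic scene completion (SSC) offers a cost-effective solution for assessing the geometric occupancy and semantic label of each voxel in the surrounding 3D scene from image inputs. Existing methods face the challenge of voxel sparsity, because a large portion of voxels in autonomous driving scenarios are empty. We propose a Multi-Resolution Alignment (MRA) approach to mitigate voxel sparsity in camera-based 3D semantic scene completion.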
Reference score (independently computed attention score): 52.959716866316604
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Camera-based 3D semantic scene completion (SSC) offers a cost-effective solution for assessing the geometric occupancy and semantic labels of each voxel in the surrounding 3D scene from image inputs, providing a voxel-level scene perception foundation for perception-prediction-planning autonomous driving systems. Although existing methods have made significant progress, their optimization relies solely on supervision from voxel labels and faces the challenge of voxel sparsity: a large portion of voxels in autonomous driving scenarios are empty, which limits both optimization efficiency and model performance. To address this issue, we propose a Multi-Resolution Alignment (MRA) approach to mitigate voxel sparsity in camera-based 3D semantic scene completion, which exploits scene-level and instance-level alignment across multi-resolution 3D features as auxiliary supervision. Specifically, we first propose the Multi-resolution View Transformer module, which projects 2D image features into multi-resolution 3D features and aligns them at the scene level by fusing discriminative seed features. Furthermore, we design the Cubic Semantic Anisotropy module to identify the instance-level semantic significance of each voxel, accounting for the semantic differences between a specific voxel and its neighboring voxels within a cubic area. Finally, we devise a Critical Distribution Alignment module, which selects critical voxels as instance-level anchors under the guidance of cubic semantic anisotropy, and applies a circulated loss for auxiliary supervision on the consistency of the critical feature distribution across different resolutions. The code is available at https://github.com/PKU-ICST-MIPL/MRA_TIP.
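Illustrative sketch (not from the paper): the abstract describes cubic semantic anisotropy only as the semantic difference between a voxel and its neighbors within a cubic area, and does not give a formula. The toy Python below assumes one simple reading of that idea: score each voxel by the fraction of in-bounds neighbors in a surrounding cube that carry a different label, then pick the top-scoring voxels as anchors. The function names `cubic_anisotropy` and `select_critical`, the `radius` and `k` parameters, and the scoring rule itself are hypothetical and may differ from the released MRA code.

```python
from itertools import product


def cubic_anisotropy(labels, radius=1):
    """Score each voxel by the fraction of in-bounds neighbours, inside a
    (2*radius+1)^3 cube, whose label differs from the centre voxel.
    Assumed scoring rule for illustration only, not the paper's formula."""
    X, Y, Z = len(labels), len(labels[0]), len(labels[0][0])
    offsets = [o for o in product(range(-radius, radius + 1), repeat=3)
               if o != (0, 0, 0)]
    scores = [[[0.0] * Z for _ in range(Y)] for _ in range(X)]
    for x, y, z in product(range(X), range(Y), range(Z)):
        differing = total = 0
        for dx, dy, dz in offsets:
            nx, ny, nz = x + dx, y + dy, z + dz
            if 0 <= nx < X and 0 <= ny < Y and 0 <= nz < Z:
                total += 1
                differing += labels[nx][ny][nz] != labels[x][y][z]
        scores[x][y][z] = differing / total if total else 0.0
    return scores


def select_critical(scores, k):
    """Return coordinates of the k highest-scoring voxels
    (hypothetical stand-in for anchor selection)."""
    flat = [(s, (x, y, z))
            for x, plane in enumerate(scores)
            for y, row in enumerate(plane)
            for z, s in enumerate(row)]
    # Highest score first; ties broken by coordinate for determinism.
    flat.sort(key=lambda t: (-t[0], t[1]))
    return [coord for _, coord in flat[:k]]


# Toy 3x3x3 grid: label 0 (empty) everywhere except one occupied
# voxel (label 1) at the centre, mimicking a very sparse scene.
grid = [[[0] * 3 for _ in range(3)] for _ in range(3)]
grid[1][1][1] = 1
scores = cubic_anisotropy(grid)
anchors = select_critical(scores, k=1)
```

In this toy grid the lone occupied voxel differs from all of its neighbors and so receives the highest score, which is the intuition behind using such voxels as anchors when most of the grid is empty.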
