Fugu-MT Paper Translation (Abstract): UniSem: Generalizable Semantic 3D Reconstruction from Sparse Unposed Images

Paper abstract: UniSem: Generalizable Semantic 3D Reconstruction from Sparse Unposed Images

arxiv url: http://arxiv.org/abs/2603.17519v1
Date: Wed, 18 Mar 2026 09:26:25 GMT
Status: Translation complete
System update date: 2026-03-19 18:32:57.599207
Title: UniSem: Generalizable Semantic 3D Reconstruction from Sparse Unposed Images
Authors: Guibiao Liao, Qian Ren, Kaimin Liao, Hua Wang, Zhi Chen, Luchao Wang, Yaohua Tang
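Abstract summary: We propose UniSem, a unified framework that improves depth accuracy and semantic generalization via two key components. First, Error-aware Gaussian Dropout (EGD) performs error-guided capacity control by suppressing redundancy-prone Gaussians. Second, a Mix-training Curriculum (MTC) blends 2D segmenter-lifted semantics with the model's own emergent 3D semantics.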
Reference score (independently computed attention score): 10.080087958100552
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic-aware 3D reconstruction from sparse, unposed images remains challenging for feed-forward 3D Gaussian Splatting (3DGS). Existing methods often predict an over-complete set of Gaussian primitives under sparse-view supervision, leading to unstable geometry and inferior depth quality. Meanwhile, they rely solely on 2D segmenter features for semantic lifting, which provides weak 3D-level and limited generalizable supervision, resulting in incomplete 3D semantics in novel scenes. To address these issues, we propose UniSem, a unified framework that jointly improves depth accuracy and semantic generalization via two key components. First, Error-aware Gaussian Dropout (EGD) performs error-guided capacity control by suppressing redundancy-prone Gaussians using rendering error cues, producing meaningful, geometrically stable Gaussian representations for improved depth estimation. Second, we introduce a Mix-training Curriculum (MTC) that progressively blends 2D segmenter-lifted semantics with the model's own emergent 3D semantic priors, implemented with object-level prototype alignment to enhance semantic coherence and completeness. Extensive experiments on ScanNet and Replica show that UniSem achieves superior performance in depth prediction and open-vocabulary 3D segmentation across varying numbers of input views. Notably, with 16-view inputs, UniSem reduces depth Rel by 15.2% and improves open-vocabulary segmentation mAcc by 3.7% over strong baselines.
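The two headline metrics in the abstract can be illustrated with a minimal sketch. It assumes that "Rel" denotes the standard absolute relative depth error and that "mAcc" is per-class accuracy averaged over classes; the abstract does not spell out either definition, so the paper's exact evaluation protocol may differ. The function names and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth.

    Assumption: this is the usual "Abs Rel" depth metric; the paper's "Rel"
    is presumed, not confirmed, to match it.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def mean_class_accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Per-class accuracy averaged over the classes present in the ground truth."""
    total: dict[int, int] = {}
    correct: dict[int, int] = {}
    for p, g in zip(pred, gt):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (1 if p == g else 0)
    if not total:
        raise ValueError("no labels given")
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional reduction of an error metric relative to a baseline value."""
    return (baseline - ours) / baseline


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    print(abs_rel([1.1, 2.0, 2.7], [1.0, 2.0, 3.0]))
    print(mean_class_accuracy([0, 0, 1, 1, 1, 2], [0, 0, 0, 1, 1, 2]))
    print(relative_reduction(0.10, 0.08))
```

Whether the reported 15.2% and 3.7% are relative changes or absolute differences is not stated in the abstract; `relative_reduction` only shows how a relative figure would be computed if that reading applies.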

Related papers