Fugu-MT Paper Translation (Abstract): Spectral Probing of Feature Upsamplers in 2D-to-3D Scene Reconstruction

Paper overview: Spectral Probing of Feature Upsamplers in 2D-to-3D Scene Reconstruction

arxiv url: http://arxiv.org/abs/2603.05787v1
Date: Fri, 06 Mar 2026 00:35:32 GMT
Status: Translation complete
Last updated in system: 2026-03-09 13:17:44.793473
Title: Spectral Probing of Feature Upsamplers in 2D-to-3D Scene Reconstruction
Title (reference translation): Spectral probing of feature upsamplers in 2D-to-3D scene reconstruction
Authors: Ling Xiao, Yuliang Xiu, Yue Chen, Guoming Wang, Toshihiko Yamasaki
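Abstract summary: Recent learnable upsampling methods aim to enhance spatial detail. Across classical and learnable upsampling methods on CLIP and DINO backbones, three key findings are observed. The results suggest that reconstruction quality is more closely related to preserving spectral structure than to enhancing spatial detail.
Reference score (independently computed attention score): 41.21245865872482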
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A typical 2D-to-3D pipeline takes multi-view images as input, where a Vision Foundation Model (VFM) extracts features that are spatially upsampled to dense representations for 3D reconstruction. If dense features across views preserve geometric consistency, differentiable rendering can recover an accurate 3D representation, making the feature upsampler a critical component. Recent learnable upsampling methods mainly aim to enhance spatial details, such as sharper geometry or richer textures, yet their impact on 3D awareness remains underexplored. To address this gap, we introduce a spectral diagnostic framework with six complementary metrics that characterize amplitude redistribution, structural spectral alignment, and directional stability. Across classical interpolation and learnable upsampling methods on CLIP and DINO backbones, we observe three key findings. First, structural spectral consistency (SSC/CSC) is the strongest predictor of NVS quality, whereas High-Frequency Spectral Slope Drift (HFSS) often correlates negatively with reconstruction performance, indicating that emphasizing high-frequency details alone does not necessarily improve 3D reconstruction. Second, geometry and texture respond to different spectral properties: Angular Energy Consistency (ADC) correlates more strongly with geometry-related metrics, while SSC/CSC influence texture fidelity slightly more than geometric accuracy. Third, although learnable upsamplers often produce sharper spatial features, they rarely outperform classical interpolation in reconstruction quality, and their effectiveness depends on the reconstruction model. Overall, our results indicate that reconstruction quality is more closely related to preserving spectral structure than to enhancing spatial detail, highlighting spectral consistency as an important principle for designing upsampling strategies in 2D-to-3D pipelines.
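Abstract (reference translation): A typical 2D-to-3D pipeline takes multi-view images as input, and a Vision Foundation Model (VFM) extracts features that are spatially upsampled into dense representations for 3D reconstruction. If the dense features preserve geometric consistency across views, differentiable rendering can recover an accurate 3D representation, which makes the feature upsampler a critical component. Recent learnable upsampling methods mainly aim to enhance spatial detail, such as sharper geometry and richer textures, but their impact on 3D awareness remains underexplored. To address this gap, we introduce a spectral diagnostic framework with six complementary metrics that characterize amplitude redistribution, structural spectral alignment, and directional stability. Across classical interpolation and learnable upsampling methods on CLIP and DINO backbones, three key findings are observed. First, structural spectral consistency (SSC/CSC) is the strongest predictor of NVS quality, whereas High-Frequency Spectral Slope Drift (HFSS) often correlates negatively with reconstruction performance, indicating that emphasizing high-frequency detail alone does not necessarily improve 3D reconstruction. Second, geometry and texture respond to different spectral properties: Angular Energy Consistency (ADC) correlates more strongly with geometry-related metrics, while SSC/CSC affect texture fidelity slightly more than geometric accuracy. Third, although learnable upsamplers often produce sharper spatial features, they rarely outperform classical interpolation in reconstruction quality, and their effectiveness depends on the reconstruction model. Overall, the results indicate that reconstruction quality is more closely related to preserving spectral structure than to enhancing spatial detail, highlighting spectral consistency as an important principle for designing upsampling strategies in 2D-to-3D pipelines.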
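
Illustrative sketch (not from the paper): the abstract describes metrics that track how upsampling redistributes spectral amplitude, but it does not give their definitions. The Python snippet below is only a rough illustration of that general idea, assuming a single-channel feature map and two classical upsamplers written in plain NumPy. The function names (upsample_nearest, upsample_bilinear, radial_amplitude_profile, high_frequency_fraction), the 0.25 cycles/pixel cutoff, and the random stand-in feature map are all hypothetical choices made for this example; none of them is the paper's SSC, CSC, HFSS, or ADC.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling: repeat each value along both axes."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def upsample_bilinear(x, factor):
    """Separable linear interpolation (corner-aligned sample grid)."""
    h, w = x.shape
    rows = np.linspace(0, h - 1, h * factor)
    cols = np.linspace(0, w - 1, w * factor)
    # Interpolate along the width first, then along the height.
    tmp = np.stack([np.interp(cols, np.arange(w), row) for row in x])
    return np.stack([np.interp(rows, np.arange(h), col) for col in tmp.T], axis=1)

def radial_amplitude_profile(feature_map, n_bins=16):
    """Mean FFT amplitude per radial frequency bin in [0, 0.5) cycles/pixel."""
    h, w = feature_map.shape
    amp = np.abs(np.fft.fft2(feature_map))
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2).ravel()
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    idx = np.digitize(radius, edges) - 1
    valid = (idx >= 0) & (idx < n_bins)  # drop corner frequencies at or beyond 0.5
    sums = np.bincount(idx[valid], weights=amp.ravel()[valid], minlength=n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)

def high_frequency_fraction(feature_map, cutoff=0.25, n_bins=16):
    """Share of the radial amplitude profile at or above the cutoff (hypothetical metric)."""
    centers, profile = radial_amplitude_profile(feature_map, n_bins)
    return profile[centers >= cutoff].sum() / profile.sum()

# Random stand-in for one channel of a low-resolution VFM feature map (assumption).
rng = np.random.default_rng(0)
low_res = rng.standard_normal((16, 16))
for name, fn in [("nearest", upsample_nearest), ("bilinear", upsample_bilinear)]:
    up = fn(low_res, 4)
    print(name, up.shape, round(high_frequency_fraction(up), 4))
```

In this toy setting, nearest-neighbor upsampling keeps blocky edges and therefore retains a larger share of high-frequency amplitude than linear interpolation. A comparison of this kind only shows how an upsampler can reshape the spectrum; it says nothing by itself about reconstruction quality.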


Related papers