Fugu-MT Paper Translation (Summary): CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation

Paper summary: CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation

arxiv url: http://arxiv.org/abs/2604.11097v1
Date: Mon, 13 Apr 2026 07:12:49 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.389623
Title: CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation
Title (reference translation): CDPR: Cross-modal diffusion using polarization for reliable monocular depth estimation
Authors: Rongjia Yu, Tong Jia, Hao Wang, Xiaofang Li, Xiao Yang, Zinuo Zhang, Cuiwei Liu
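Abstract summary: CDPR is a novel diffusion-based framework that integrates physically grounded polarization priors to enhance estimation robustness. CDPR significantly outperforms RGB-only baselines in challenging regions while maintaining competitive performance in standard scenes.
Reference score (independently computed attention level): 12.658602122161989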
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monocular depth estimation is a fundamental yet challenging task in computer vision, especially under complex conditions such as textureless surfaces, transparency, and specular reflections. Recent diffusion-based approaches have significantly advanced performance by reformulating depth prediction as a denoising process in the latent space. However, existing methods rely solely on RGB inputs, which often lack sufficient cues in challenging regions. In this work, we present CDPR - Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation - a novel diffusion-based framework that integrates physically grounded polarization priors to enhance estimation robustness. Specifically, we encode both RGB and polarization (AoLP/DoLP) images into a shared latent space via a pre-trained Variational Autoencoder (VAE), and dynamically fuse multi-modal information through a learnable confidence-aware gating mechanism. This fusion module adaptively suppresses noisy signals in polarization inputs while preserving informative cues, particularly around reflective or transparent surfaces, and provides the integrated latent representation for subsequent monocular depth estimation. Beyond depth estimation, we further verify that our framework can be easily generalized to surface normal prediction with minimal modification, showcasing its scalability to general polarization-guided dense prediction tasks. Experiments on both synthetic and real-world datasets validate that CDPR significantly outperforms RGB-only baselines in challenging regions while maintaining competitive performance in standard scenes.
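Abstract (reference translation): Monocular depth estimation is a fundamental task in computer vision, and it is especially difficult under complex conditions such as textureless surfaces, transparency, and specular reflections. Recent diffusion-based approaches have achieved notable performance gains by reformulating depth prediction as a denoising process in latent space. However, existing methods rely solely on RGB inputs, which often lack sufficient cues in challenging regions. This paper presents CDPR (Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation). Specifically, both RGB and polarization (AoLP/DoLP) images are encoded into a shared latent space via a pre-trained Variational Autoencoder (VAE), and multi-modal information is fused dynamically by a learnable confidence-aware gating mechanism. This fusion module adaptively suppresses noisy signals in the polarization inputs while preserving information, particularly around reflective or transparent surfaces, and provides the integrated latent representation for subsequent monocular depth estimation. Beyond depth estimation, the authors verify that the framework generalizes easily to surface normal prediction with minimal modification, showing its scalability to general polarization-guided dense prediction tasks. Experiments on both synthetic and real-world datasets confirm that CDPR significantly outperforms RGB-only baselines in challenging regions while maintaining competitive performance in standard scenes.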
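
The abstract describes a learnable confidence-aware gate that fuses RGB and polarization latents but does not give its form. The following is a minimal sketch of one plausible element-wise gated fusion, not the authors' implementation. The function name gated_fusion, the per-element weights w_rgb, w_pol and bias, and the residual form z = z_rgb + g * z_pol are all assumptions made for illustration; in the paper's setting the latents would come from the VAE encoder and the gate would be a trained network.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(z_rgb, z_pol, w_rgb, w_pol, bias):
    """Fuse two latent vectors element-wise with a confidence gate.

    Hypothetical form (an assumption, not taken from the paper):
        g = sigmoid(w_rgb * z_rgb + w_pol * z_pol + bias)
        z = z_rgb + g * z_pol
    A gate near 0 suppresses the polarization latent (RGB only);
    a gate near 1 passes the polarization cue through in full.
    """
    fused, gates = [], []
    for r, p, wr, wp, b in zip(z_rgb, z_pol, w_rgb, w_pol, bias):
        g = sigmoid(wr * r + wp * p + b)  # per-element confidence in (0, 1)
        gates.append(g)
        fused.append(r + g * p)           # residual injection of polarization
    return fused, gates


if __name__ == "__main__":
    # Toy latents; real latents would be VAE feature maps, not 3-vectors.
    z_rgb = [0.5, -1.0, 2.0]
    z_pol = [0.2, 0.4, -0.3]
    zeros = [0.0, 0.0, 0.0]
    # Biases chosen to show a closed, half-open, and open gate.
    fused, gates = gated_fusion(z_rgb, z_pol, zeros, zeros, [-20.0, 0.0, 20.0])
    print("gates:", gates)
    print("fused:", fused)
```

With the residual form, a fully closed gate reduces the fused latent to the RGB latent, which matches the abstract's claim that noisy polarization signals can be suppressed without degrading standard scenes. A convex combination (1 - g) * z_rgb + g * z_pol would be an equally reasonable assumption.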
