Revisiting Shape from Polarization in the Era of Vision Foundation Models
- URL: http://arxiv.org/abs/2603.04817v1
- Date: Thu, 05 Mar 2026 05:07:03 GMT
- Title: Revisiting Shape from Polarization in the Era of Vision Foundation Models
- Authors: Chenhao Li, Taishi Ono, Takeshi Uemori, Yusuke Moriuchi,
- Abstract summary: We show that a lightweight model trained on a small dataset can outperform RGB-only vision foundation models (VFMs) in single-shot object-level surface normal estimation. With only 40K training scenes, our method significantly outperforms both state-of-the-art SfP approaches and RGB-only VFMs.
- Score: 11.779432473091754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that, with polarization cues, a lightweight model trained on a small dataset can outperform RGB-only vision foundation models (VFMs) in single-shot object-level surface normal estimation. Shape from polarization (SfP) has long been studied due to the strong physical relationship between polarization and surface geometry. Meanwhile, driven by scaling laws, RGB-only VFMs trained on large datasets have recently achieved impressive performance and surpassed existing SfP methods. This situation raises questions about the necessity of polarization cues, which require specialized hardware and have limited training data. We argue that the weaker performance of prior SfP methods does not come from the polarization modality itself, but from domain gaps. These domain gaps mainly arise from two sources. First, existing synthetic datasets use limited and unrealistic 3D objects, with simple geometry and random texture maps that do not match the underlying shapes. Second, real-world polarization signals are often affected by sensor noise, which is not well modeled during training. To address the first issue, we render a high-quality polarization dataset using 1,954 3D-scanned real-world objects. We further incorporate pretrained DINOv3 priors to improve generalization to unseen objects. To address the second issue, we introduce polarization sensor-aware data augmentation that better reflects real-world conditions. With only 40K training scenes, our method significantly outperforms both state-of-the-art SfP approaches and RGB-only VFMs. Extensive experiments show that polarization cues enable a 33x reduction in training data or an 8x reduction in model parameters, while still achieving better performance than RGB-only counterparts.
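The polarization cues the paper builds on are typically captured by a division-of-focal-plane sensor that records intensity behind four polarizer angles (0°, 45°, 90°, 135°). As a minimal sketch of how such cues are conventionally derived (the function name is illustrative, not from the paper), the linear Stokes components and the degree/angle of linear polarization follow directly from those four measurements:

```python
import numpy as np

def polarization_cues(i0, i45, i90, i135):
    """Recover the linear Stokes components and the degree (DoLP) and
    angle (AoLP) of linear polarization from four polarizer-angle images.
    Uses I(theta) = 0.5 * (S0 + S1*cos(2*theta) + S2*sin(2*theta))."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)          # total intensity
    s1 = i0 - i90                                # 0 deg vs 90 deg component
    s2 = i45 - i135                              # 45 deg vs 135 deg component
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-8)
    aolp = 0.5 * np.arctan2(s2, s1)              # in (-pi/2, pi/2]
    return s0, dolp, aolp
```

The AoLP relates to the projected surface normal direction and the DoLP to the zenith angle (up to the well-known ambiguities), which is why these quantities carry geometric signal that RGB images lack.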
Related papers
- Shape from Polarization of Thermal Emission and Reflection [2.7317088388886384]
We leverage the Shape from Polarization (SfP) technique in the Long-Wave Infrared (LWIR) spectrum, where most materials are opaque and emissive. We formulated a polarization model that explicitly accounts for the combined effects of emission and reflection. We implemented a prototype system and created ThermoPol, the first real-world benchmark dataset for LWIR SfP.
arXiv Detail & Related papers (2025-06-23T00:33:17Z)
- GratNet: A Photorealistic Neural Shader for Diffractive Surfaces [0.0]
We present a multi-layer perceptron (MLP) based method for data-driven rendering of diffractive surfaces with high accuracy and efficiency. We demonstrate high-quality reconstruction of the ground truth using Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index Measure (SSIM), and a flipping difference evaluator (FLIP) as evaluation metrics.
arXiv Detail & Related papers (2025-06-18T18:58:00Z)
- Polar Coordinate-Based 2D Pose Prior with Neural Distance Field [0.34952465649465553]
We propose a 2D pose prior-guided refinement approach based on Neural Distance Fields (NDF). We introduce a polar coordinate-based representation that explicitly incorporates joint connection lengths, enabling a more accurate correction of erroneous pose estimations. Our method is evaluated on a long jump dataset, demonstrating its ability to improve 2D pose estimation across multiple pose representations.
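The polar representation mentioned above, where a joint connection is encoded as a length plus an angle relative to its parent rather than as a Cartesian offset, can be sketched in a few lines (function names are illustrative, not from the paper):

```python
import math

def to_polar(parent, child):
    """Encode a 2D joint connection as (length, angle) relative to its
    parent joint, making bone length an explicit, constrainable quantity."""
    dx, dy = child[0] - parent[0], child[1] - parent[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def from_polar(parent, length, angle):
    """Invert the encoding back to a Cartesian joint position."""
    return (parent[0] + length * math.cos(angle),
            parent[1] + length * math.sin(angle))
```

Making bone length an explicit coordinate is what lets a learned prior penalize anatomically implausible lengths directly, rather than inferring them from raw joint positions.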
arXiv Detail & Related papers (2025-05-06T11:31:14Z)
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [65.42565481489132]
Humans can easily deduce the relative pose of a previously unseen object, without labeling or training, given only a single query-reference image pair. We propose a novel 3D generalizable relative pose estimation method by elaborating 3D/2.5D shape perception with a 2.5D shape from an RGB-D reference. Our differentiable renderer takes the 2.5D rotatable mesh textured by the RGB and the semantic maps (obtained by DINOv2 from the RGB input), then renders new RGB and semantic maps under a novel rotated view.
arXiv Detail & Related papers (2024-06-26T16:01:10Z)
- Robust Depth Enhancement via Polarization Prompt Fusion Tuning [112.88371907047396]
We present a framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors.
Our method first adopts a learning-based strategy where a neural network is trained to estimate a dense and complete depth map from polarization data and a sensor depth map from different sensors.
To further improve the performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy to effectively utilize RGB-based models pre-trained on large-scale datasets.
arXiv Detail & Related papers (2024-04-05T17:55:33Z)
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [59.13757801286343]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. We introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) for feature space misalignment and the Spatial Noise Compensator (SNC) for significant noise.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- Ternary-Type Opacity and Hybrid Odometry for RGB NeRF-SLAM [58.736472371951955]
We introduce a ternary-type opacity (TT) model, which categorizes points on a ray intersecting a surface into three regions: before, on, and behind the surface.
This enables a more accurate rendering of depth, subsequently improving the performance of image warping techniques.
Our integrated approach of TT and HO achieves state-of-the-art performance on synthetic and real-world datasets.
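The three-region split described above can be sketched as a simple classifier over sample depths along a ray (the threshold and names are illustrative, not from the paper):

```python
def classify_sample(t, t_surface, eps=1e-2):
    """Assign a ray sample at depth t to one of three regions relative to
    the surface intersection depth t_surface: before, on, or behind it."""
    if t < t_surface - eps:
        return "before"   # free space in front of the surface
    if t <= t_surface + eps:
        return "on"       # the thin band treated as the opaque surface
    return "behind"       # occluded space past the surface
```

Treating the three regions differently, rather than regressing a single continuous opacity everywhere, is what sharpens the rendered depth and in turn improves image warping.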
arXiv Detail & Related papers (2023-12-20T18:03:17Z)
- Learning a 3D Morphable Face Reflectance Model from Low-cost Data [21.37535100469443]
Existing works build parametric models for diffuse and specular albedo using Light Stage data.
This paper proposes the first 3D morphable face reflectance model with spatially varying BRDF using only low-cost publicly-available data.
arXiv Detail & Related papers (2023-03-21T09:08:30Z)
- φ-SfT: Shape-from-Template with a Physics-Based Deformation Model [69.27632025495512]
Shape-from-Template (SfT) methods estimate 3D surface deformations from a single monocular RGB camera.
This paper proposes a new SfT approach explaining 2D observations through physical simulations.
arXiv Detail & Related papers (2022-03-22T17:59:57Z)
- Shape from Polarization for Complex Scenes in the Wild [93.65746187211958]
We present a new data-driven approach with physics-based priors to scene-level normal estimation from a single polarization image.
We contribute the first real-world scene-level SfP dataset with paired input polarization images and ground-truth normal maps.
Our trained model can be generalized to far-field outdoor scenes as the relationship between polarized light and surface normals is not affected by distance.
arXiv Detail & Related papers (2021-12-21T17:30:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.