Related papers: MSF-Net: Multi-Stage Feature Extraction and Fusion for Robust Photometric Stereo


URL: http://arxiv.org/abs/2510.25221v1
Date: Wed, 29 Oct 2025 06:56:30 GMT
Title: MSF-Net: Multi-Stage Feature Extraction and Fusion for Robust Photometric Stereo
Authors: Shiyu Qin, Zhihao Cai, Kaixuan Wang, Lin Qi, Junyu Dong
Abstract summary: Photometric stereo is a technique aimed at determining surface normals through the utilization of shading cues derived from images taken under different lighting conditions. Existing learning-based approaches often fail to accurately capture features at multiple stages and do not adequately promote interaction between these features. We propose MSF-Net, a novel framework for extracting information at multiple stages, paired with a selective update strategy.
Score: 38.34096529700518
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Photometric stereo is a technique aimed at determining surface normals through the utilization of shading cues derived from images taken under different lighting conditions. However, existing learning-based approaches often fail to accurately capture features at multiple stages and do not adequately promote interaction between these features. Consequently, these models tend to extract redundant features, especially in areas with intricate details such as wrinkles and edges. To tackle these issues, we propose MSF-Net, a novel framework for extracting information at multiple stages, paired with a selective update strategy, aiming to extract high-quality feature information, which is critical for accurate normal construction. Additionally, we have developed a feature fusion module to improve the interplay among different features. Experimental results on the DiLiGenT benchmark show that our proposed MSF-Net significantly surpasses previous state-of-the-art methods in the accuracy of surface normal estimation.
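
For readers unfamiliar with the underlying problem, the following is a minimal sketch of classical least-squares photometric stereo for a single pixel, assuming a Lambertian surface, known distant light directions, and no shadows. It is not MSF-Net and is not taken from the paper; it only illustrates how shading under different lights constrains a surface normal. The function names (`solve_3x3`, `estimate_normal`, `angular_error_deg`) and the sample light directions are hypothetical. The angular error in degrees is a common way to compare an estimated normal with a reference normal; whether the paper computes it exactly this way is an assumption.

```python
import math


def solve_3x3(A, b):
    """Solve A x = b for a 3x3 system using Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(A)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate (e.g. coplanar)")
    x = []
    for col in range(3):
        # Replace one column of A with b, as Cramer's rule requires.
        m = [[b[r] if c == col else A[r][c] for c in range(3)]
             for r in range(3)]
        x.append(det(m) / d)
    return x


def estimate_normal(lights, intensities):
    """Estimate (unit normal, albedo) for one pixel.

    Assumes the Lambertian model i_k = albedo * dot(l_k, n) and solves the
    normal equations (L^T L) g = L^T i for g = albedo * n.
    """
    LtL = [[sum(l[r] * l[c] for l in lights) for c in range(3)]
           for r in range(3)]
    Lti = [sum(l[r] * i for l, i in zip(lights, intensities))
           for r in range(3)]
    g = solve_3x3(LtL, Lti)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        raise ValueError("pixel is completely dark; normal is undefined")
    return [v / albedo for v in g], albedo


def angular_error_deg(n_est, n_true):
    """Angle in degrees between two unit normals."""
    dot = sum(a * b for a, b in zip(n_est, n_true))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


if __name__ == "__main__":
    # Hypothetical unit light directions and synthetic intensities generated
    # from a surface facing the camera (normal [0, 0, 1]) with albedo 0.8.
    lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8],
              [0.0, 0.6, 0.8], [-0.6, 0.0, 0.8]]
    intensities = [0.8, 0.64, 0.64, 0.64]
    normal, albedo = estimate_normal(lights, intensities)
    print(normal, albedo, angular_error_deg(normal, [0.0, 0.0, 1.0]))
```

Real surfaces violate these assumptions through shadows, specular highlights, and interreflections, which is one reason learning-based methods such as the one described in the abstract are studied.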

Related papers

Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion [69.13852939945433]
Multi-Modal Image Fusion (MMIF) aims to combine images from different modalities to produce fused images. We propose a novel Interactive Spatial-Frequency Fusion Mamba framework for MMIF. Our ISFM can achieve better performance than other state-of-the-art methods.
arXiv Detail & Related papers (2026-02-04T10:35:55Z)
MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity [65.85858856481131]
The unstructured and irregular nature of point clouds poses a significant challenge for objective quality assessment (PCQA). We propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM).
arXiv Detail & Related papers (2026-01-03T14:58:52Z)
PIF-Net: Ill-Posed Prior Guided Multispectral and Hyperspectral Image Fusion via Invertible Mamba and Fusion-Aware LoRA [0.16385815610837165]
The goal of multispectral and hyperspectral image fusion (MHIF) is to generate high-quality images that simultaneously possess rich spectral information and fine spatial details. Previous studies have not effectively addressed the ill-posed nature caused by data misalignment. We propose a fusion framework named PIF-Net, which explicitly incorporates ill-posed priors to effectively fuse multispectral images and hyperspectral images.
arXiv Detail & Related papers (2025-08-01T09:17:17Z)
Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches. In this paper, we propose a model-based deep unfolded method for satellite image fusion. Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation [2.491548070992611]
A novel multi-modal fusion approach called CSK-Net is proposed. It uses a contrastive learning-based spectral knowledge distillation technique. Experiments show that CSK-Net surpasses state-of-the-art models in multi-modal tasks and for missing modalities.
arXiv Detail & Related papers (2023-12-04T10:27:09Z)
Multi-View Photometric Stereo Revisited [100.97116470055273]
Multi-view photometric stereo (MVPS) is a preferred method for detailed and precise 3D acquisition of an object from images. We present a simple, practical approach to MVPS, which works well for isotropic as well as other object material types such as anisotropic and glossy. The proposed approach shows state-of-the-art results when tested extensively on several benchmark datasets.
arXiv Detail & Related papers (2022-10-14T09:46:15Z)
Interactive Multi-scale Fusion of 2D and 3D Features for Multi-object Tracking [23.130490413184596]
We introduce PointNet++ to obtain multi-scale deep representations of the point cloud and make it adaptive to our proposed Interactive Feature Fusion. Our method achieves good performance on the KITTI benchmark and outperforms other approaches that do not use multi-scale feature fusion.
arXiv Detail & Related papers (2022-03-30T13:00:27Z)
DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras. Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
Efficient and Accurate Multi-scale Topological Network for Single Image Dehazing [31.543771270803056]
In this paper, we pay attention to the feature extraction and utilization of the input image itself. We propose a Multi-scale Topological Network (MSTN) to fully explore the features at different scales. Meanwhile, we design a Multi-scale Feature Fusion Module (MFFM) and an Adaptive Feature Selection Module (AFSM) to achieve the selection and fusion of features at different scales.
arXiv Detail & Related papers (2021-02-24T08:53:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.