RGB-Phase Speckle: Cross-Scene Stereo 3D Reconstruction via Wrapped Pre-Normalization
- URL: http://arxiv.org/abs/2503.06125v2
- Date: Thu, 17 Apr 2025 12:57:37 GMT
- Title: RGB-Phase Speckle: Cross-Scene Stereo 3D Reconstruction via Wrapped Pre-Normalization
- Authors: Kai Yang, Zijian Bai, Yang Xiao, Xinyu Li, Xiaohan Shi,
- Abstract summary: This study introduces RGB-Speckle, a cross-scene 3D reconstruction framework based on an active stereo camera system. We propose a novel phase pre-normalization encoding-decoding method, which mitigates external interference. Experimental results demonstrate that the proposed RGB-Speckle model offers significant advantages in cross-domain and cross-scene 3D reconstruction tasks.
- Score: 9.20903035677888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D reconstruction garners increasing attention alongside the advancement of high-level image applications, where dense stereo matching (DSM) serves as a pivotal technique. Previous studies often rely on publicly available datasets for training, focusing on modifying network architectures or incorporating specialized modules to extract domain-invariant features and thus improve model robustness. In contrast, inspired by single-frame structured-light phase-shifting encoding, this study introduces RGB-Speckle, a cross-scene 3D reconstruction framework based on an active stereo camera system, designed to enhance robustness. Specifically, we propose a novel phase pre-normalization encoding-decoding method: first, we randomly perturb phase-shift maps and embed them into the three RGB channels to generate color speckle patterns; subsequently, the camera captures phase-encoded images modulated by objects as input to a stereo matching network. This technique effectively mitigates external interference and ensures consistent input data for RGB-Speckle, thereby bolstering cross-domain 3D reconstruction stability. To validate the proposed method, we conduct complex experiments: (1) construct a color speckle dataset for complex scenarios based on the proposed encoding scheme; (2) evaluate the impact of the phase pre-normalization encoding-decoding technique on 3D reconstruction accuracy; and (3) further investigate its robustness across diverse conditions. Experimental results demonstrate that the proposed RGB-Speckle model offers significant advantages in cross-domain and cross-scene 3D reconstruction tasks, enhancing model generalization and reinforcing robustness in challenging environments, thus providing a novel solution for robust 3D reconstruction research.
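The encoding-decoding scheme described in the abstract builds on classical three-step phase shifting. The sketch below is a rough illustration rather than the authors' implementation: it embeds a randomly perturbed phase map into the three RGB channels as phase-shifted cosine patterns and recovers the wrapped phase with the standard three-step arctangent formula. The perturbation range, the [0, 1] intensity scaling, and all function names are assumptions.

```python
import numpy as np

def encode_rgb_speckle(phase, rng):
    """Embed a randomly perturbed phase map into the R, G, B channels
    as three phase-shifted cosine patterns (classical 3-step shifting).
    The perturbation magnitude is an assumption for illustration."""
    perturb = rng.uniform(-np.pi / 8, np.pi / 8, size=phase.shape)
    phi = phase + perturb
    shifts = (-2 * np.pi / 3, 0.0, 2 * np.pi / 3)  # one shift per channel
    # Per-channel intensity scaled to [0, 1]: I_k = 0.5 + 0.5*cos(phi + delta_k)
    return np.stack([0.5 + 0.5 * np.cos(phi + s) for s in shifts], axis=-1)

def decode_wrapped_phase(img):
    """Recover the wrapped phase in (-pi, pi] from the three channels
    via the standard 3-step arctangent formula."""
    i1, i2, i3 = img[..., 0], img[..., 1], img[..., 2]
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# Round trip on a synthetic horizontal phase ramp.
rng = np.random.default_rng(0)
phase = np.tile(np.linspace(0.0, 4 * np.pi, 512), (256, 1))
pattern = encode_rgb_speckle(phase, rng)
wrapped = decode_wrapped_phase(pattern)  # (phase + perturbation), wrapped
```

In the paper's pipeline the decoding is learned by the stereo matching network from camera captures of such patterns; the closed-form decode here only illustrates why three color channels suffice to recover a wrapped phase per pixel.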
Related papers
- Semantic Scene Completion with Multi-Feature Data Balancing Network [5.3431413737671525]
We propose a dual-head model for RGB and depth (F-TSDF) inputs. Our hybrid encoder-decoder architecture, with identity transformation in a pre-activation residual module, effectively manages the diverse signals within F-TSDF. We evaluate RGB feature fusion strategies and use a combined loss function: cross-entropy for 2D RGB features and weighted cross-entropy for 3D SSC predictions.
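A minimal sketch of such a combined loss, assuming PyTorch-style dense logits; the head shapes, class weights, and `alpha` balancing factor are illustrative, not the paper's exact configuration:

```python
import torch.nn.functional as F

def combined_loss(rgb_logits, rgb_labels, ssc_logits, ssc_labels,
                  class_weights, alpha=1.0):
    """Cross-entropy on the 2D RGB head plus class-weighted cross-entropy
    on the 3D SSC head; `alpha` is an assumed balancing factor."""
    # rgb_logits: (B, C, H, W), rgb_labels: (B, H, W) integer class ids
    loss_2d = F.cross_entropy(rgb_logits, rgb_labels)
    # ssc_logits: (B, C, D, H, W), ssc_labels: (B, D, H, W) voxel class ids
    loss_3d = F.cross_entropy(ssc_logits, ssc_labels, weight=class_weights)
    return loss_2d + alpha * loss_3d
```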
arXiv Detail & Related papers (2024-12-02T12:12:21Z)
- T-3DGS: Removing Transient Objects for 3D Scene Reconstruction [83.05271859398779]
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions.
We propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting.
arXiv Detail & Related papers (2024-11-29T07:45:24Z)
- PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence [3.61512056914095]
We present PreF3R, Pose-Free Feed-forward 3D Reconstruction from an image sequence of variable length.
PreF3R removes the need for camera calibration and reconstructs the 3D Gaussian field within a canonical coordinate frame directly from a sequence of unposed images.
arXiv Detail & Related papers (2024-11-25T19:16:29Z)
- Anyview: Generalizable Indoor 3D Object Detection with Variable Frames [63.51422844333147]
We present a novel 3D detection framework named AnyView for our practical applications.
Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
- Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain. Our network adopts a two-stage model comprising a frequency-guided coarse localization stage and a detail-preserving fine localization stage. Compared with existing models, our proposed method achieves competitive performance on three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
- $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [97.06927852165464]
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
We propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process.
arXiv Detail & Related papers (2023-02-21T13:37:07Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- TransformerFusion: Monocular RGB Scene Reconstruction using Transformers [26.87200488085741]
TransformerFusion is a transformer-based 3D scene reconstruction approach.
The network learns to attend to the most relevant image frames for each 3D location in the scene.
Features are fused in a coarse-to-fine fashion, storing fine-level features only where needed.
arXiv Detail & Related papers (2021-07-05T18:00:11Z)
- RGB-D Salient Object Detection via 3D Convolutional Neural Networks [19.20231385522917]
We make the first attempt to address RGB-D SOD with 3D convolutional neural networks.
The proposed model, named RD3D, aims at pre-fusion in the encoder stage and in-depth fusion in the decoder stage.
We show that RD3D performs favorably against 14 state-of-the-art RGB-D SOD approaches in terms of four key evaluation metrics.
arXiv Detail & Related papers (2021-01-25T17:03:02Z)
- SPSG: Self-Supervised Photometric Scene Generation from RGB-D Scans [34.397726189729994]
SPSG is a novel approach to generate high-quality, colored 3D models of scenes from RGB-D scan observations.
Our self-supervised approach learns to jointly inpaint geometry and color by correlating an incomplete RGB-D scan with a more complete version of that scan.
arXiv Detail & Related papers (2020-06-25T18:58:23Z)