Related papers: ViGG: Robust RGB-D Point Cloud Registration using Visual-Geometric Mutual Guidance

URL: http://arxiv.org/abs/2511.22908v1
Date: Fri, 28 Nov 2025 06:27:37 GMT
Title: ViGG: Robust RGB-D Point Cloud Registration using Visual-Geometric Mutual Guidance
Authors: Congjia Chen, Shen Yan, Yufu Qu
Abstract summary: ViGG is a robust RGB-D registration method using mutual guidance. Experiments on 3DMatch, ScanNet and KITTI datasets show that our method outperforms recent state-of-the-art methods in both learning-free and learning-based settings.
Score: 18.052751061895215
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Point cloud registration is a fundamental task in 3D vision. Most existing methods only use geometric information for registration. Recently proposed RGB-D registration methods primarily focus on feature fusion or improving feature learning, which limits their ability to exploit image information and hinders their practical applicability. In this paper, we propose ViGG, a robust RGB-D registration method using mutual guidance. First, we solve clique alignment in a visual-geometric combination form, employing a geometric guidance design to suppress ambiguous cliques. Second, to mitigate accuracy degradation caused by noise in visual matches, we propose a visual-guided geometric matching method that utilizes visual priors to determine the search space, enabling the extraction of high-quality, noise-insensitive correspondences. This mutual guidance strategy brings our method superior robustness, making it applicable for various RGB-D registration tasks. The experiments on 3DMatch, ScanNet and KITTI datasets show that our method outperforms recent state-of-the-art methods in both learning-free and learning-based settings. Code is available at https://github.com/ccjccjccj/ViGG.
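The abstract mentions clique alignment without giving details. As a rough illustration only, and not the ViGG implementation, the sketch below shows the generic distance-consistency test that clique-based registration methods commonly build on: a rigid motion preserves pairwise distances, so two correct correspondences must agree on the distance between their endpoints. The function names, the threshold `tau`, and the toy data are all assumptions made for this example.

```python
import math
from itertools import combinations


def compatible_pairs(src, dst, tau=0.05):
    """Return index pairs (i, j) whose correspondences preserve distance.

    src[i] <-> dst[i] is a putative 3D correspondence. Two correct matches
    under a rigid motion satisfy
    | ||src_i - src_j|| - ||dst_i - dst_j|| | <= tau  (tau is an assumed tolerance).
    """
    edges = set()
    for i, j in combinations(range(len(src)), 2):
        d_src = math.dist(src[i], src[j])
        d_dst = math.dist(dst[i], dst[j])
        if abs(d_src - d_dst) <= tau:
            edges.add((i, j))
    return edges


def consistency_degree(n, edges):
    """Count, for each correspondence, how many others it is compatible with."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


# Toy data (hypothetical): the first three matches follow a translation
# by (1, 0, 0); the fourth is an outlier.
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
dst = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0), (5.0, 5.0, 5.0)]

edges = compatible_pairs(src, dst)
degrees = consistency_degree(len(src), edges)
print(sorted(edges))  # mutually consistent pairs form a clique candidate
print(degrees)        # the outlier has no compatible partner
```

In this toy case the three consistent matches form a single clique and the outlier is isolated. How ViGG combines visual and geometric cues when scoring or suppressing cliques is described in the paper and its code repository, not here.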

Related papers

DINOReg: Strong Point Cloud Registration with Vision Foundation Model [0.0]
Point cloud registration is a fundamental task in 3D computer vision. Recent studies have incorporated color information from RGB-D data into feature extraction. We propose DINOReg, a registration network that sufficiently utilizes both visual and geometric information.
arXiv Detail & Related papers (2025-09-29T07:15:47Z)
Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images [56.134885746889026]
Semantic scene graph estimation methods utilize ground truth 3D annotations to accurately predict target objects, predicates, and relationships. We overcome the noisy reconstructed pseudo point-based geometry from predicted depth maps and reduce the amount of background noise present in multi-view image features. Our method outperforms current methods purely using multi-view images as the initial input.
arXiv Detail & Related papers (2025-08-05T21:25:50Z)
LiOn-XA: Unsupervised Domain Adaptation via LiDAR-Only Cross-Modal Adversarial Training [61.26381389532653]
LiOn-XA is an unsupervised domain adaptation (UDA) approach that combines LiDAR-Only Cross-Modal (X) learning with Adversarial training for 3D LiDAR point cloud semantic segmentation. Our experiments on 3 real-to-real adaptation scenarios demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-21T09:50:17Z)
RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration [0.0]
We propose a new feature combination framework, which applies a looser but more effective combination. An explicit filter based on transformation consistency is designed for the combination framework, which can overcome each feature's weakness. Experiments on ScanNet and 3DMatch show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-05-13T09:56:28Z)
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images [57.71600854525037]
We propose a Fuse-Describe-Match strategy for 6D pose estimation from RGB-D images. MatchU is a generic approach that fuses 2D texture and 3D geometric cues for 6D pose prediction of unseen objects.
arXiv Detail & Related papers (2024-03-03T14:01:03Z)
PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration [6.030097207369754]
We propose a network implementing multi-scale bidirectional fusion between RGB images and point clouds generated from depth images. Our method achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-08-09T08:13:46Z)
V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework. To address the limitation, we introduce a novel 3D Relative Position (3DV-RPE) method. We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z)
Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain. GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors. We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry [28.653015760036602]
We introduce a novel 3D point cloud registration module explicitly embedding the color signals into the geometry representation. Our key contribution is a 2D-3D cross-modality learning algorithm that embeds the deep features learned from color signals to the geometry representation. Our study reveals the significant advantages of correlating explicit deep color features to the point cloud in the registration task.
arXiv Detail & Related papers (2023-02-28T08:50:17Z)
Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation [38.64501645574878]
Point cloud registration aims at estimating the geometric transformation between two point cloud scans. Recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence. We propose a new Geometry-Aware Visual Feature Extractor (GAVE) that employs multi-scale local linear transformation.
arXiv Detail & Related papers (2022-08-31T14:36:09Z)
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images has been an emerging field. We present a novel task of 3D visual grounding in single-view RGBD images where the referred objects are often only partially scanned due to occlusion. Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image. Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.