Related papers: C-DOG: Multi-View Multi-instance Feature Association Using Connected δ-Overlap Graphs

C-DOG: Multi-View Multi-instance Feature Association Using Connected δ-Overlap Graphs

URL: http://arxiv.org/abs/2507.14095v2
Date: Fri, 01 Aug 2025 02:03:20 GMT
Title: C-DOG: Multi-View Multi-instance Feature Association Using Connected δ-Overlap Graphs
Authors: Yung-Hong Sun, Ting-Hung Lin, Jiangang Chen, Hongrui Jiang, Yu Hen Hu,
Abstract summary: We introduce C-DOG (Connected delta-Overlap Graph), an algorithm for robust geometrical feature association.<n>In a C-DOG graph, two nodes representing 2D feature points from distinct views are connected by an edge if they correspond to the same 3D point.<n>Experiments on synthetic benchmarks demonstrate that C-DOG not only outperforms geometry-based baseline algorithms but also remains remarkably robust under demanding conditions.
Score: 4.576442835703357
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-view multi-instance feature association constitutes a crucial step in 3D reconstruction, facilitating the consistent grouping of object instances across various camera perspectives. The presence of multiple identical objects within a scene often leads to ambiguities for appearance-based feature matching algorithms. Our work circumvents this challenge by exclusively employing geometrical constraints, specifically epipolar geometry, for feature association. We introduce C-DOG (Connected delta-Overlap Graph), an algorithm designed for robust geometrical feature association, even in the presence of noisy feature detections. In a C-DOG graph, two nodes representing 2D feature points from distinct views are connected by an edge if they correspond to the same 3D point. Each edge is weighted by its epipolar distance. Ideally, true associations yield a zero distance; however, noisy feature detections can result in non-zero values. To robustly retain edges where the epipolar distance is less than a threshold delta, we employ a Szymkiewicz--Simpson coefficient. This process leads to a delta-neighbor-overlap clustering of 2D nodes. Furthermore, unreliable nodes are pruned from these clusters using an Inter-quartile Range (IQR)-based criterion. Our extensive experiments on synthetic benchmarks demonstrate that C-DOG not only outperforms geometry-based baseline algorithms but also remains remarkably robust under demanding conditions. This includes scenes with high object density, no visual features, and restricted camera overlap, positioning C-DOG as an excellent solution for scalable 3D reconstruction in practical applications.

Related papers

Dens3R: A Foundation Model for 3D Geometry Prediction [44.13431776180547]
Dens3R is a 3D foundation model designed for joint geometric dense prediction.<n>By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities.
arXiv Detail & Related papers (2025-07-22T07:22:30Z)
Occupancy-Based Dual Contouring [12.944046673902415]
We introduce a dual contouring method that provides state-of-the-art performance for occupancy functions. Our method is learning-free and carefully designed to maximize the use of GPU parallelization.
arXiv Detail & Related papers (2024-09-20T11:32:21Z)
Object Gaussian for Monocular 6D Pose Estimation from Sparse Views [4.290993205307184]
We introduce SGPose, a novel framework for sparse view object pose estimation using Gaussian-based methods. Given as few as ten views, SGPose generates a geometric-aware representation by starting with a random cuboid. Experiments on typical benchmarks, especially on the Occlusion LM-O dataset, demonstrate that SGPose outperforms existing methods even under sparse view constraints.
arXiv Detail & Related papers (2024-09-04T10:03:11Z)
View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene. Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output. We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency.
arXiv Detail & Related papers (2024-05-30T04:14:58Z)
COMO: Compact Mapping and Odometry [17.71754144808295]
We present COMO, a real-time monocular mapping and odometry system that encodes dense geometry via a compact set of 3D anchor points. The representation enables joint optimization of camera poses and dense geometry, intrinsic 3D consistency, and efficient second-order inference.
arXiv Detail & Related papers (2024-04-04T15:35:43Z)
Occ$^2$Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions [14.217367037250296]
Occ$2$Net is an image matching method that models occlusion relations using 3D occupancy and infers matching points in occluded regions. We evaluate our method on both real-world and simulated datasets and demonstrate its superior performance over state-of-the-art methods on several metrics.
arXiv Detail & Related papers (2023-08-14T13:09:41Z)
FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction. Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching [39.461400537109895]
Matching 2D keypoints in an image to a sparse 3D point cloud of the scene without requiring visual descriptors has garnered increased interest. We introduce DGC-GNN, a novel algorithm that exploits geometric and color cues to represent keypoints, thereby improving matching accuracy. We evaluate DGC-GNN on both indoor and outdoor datasets, demonstrating that it not only doubles the accuracy of the state-of-the-art visual descriptor-free algorithm but also substantially narrows the performance gap between descriptor-based and descriptor-free methods.
arXiv Detail & Related papers (2023-06-21T20:21:15Z)
CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task. Recent studies have shown the great potential of dense correspondence-based solutions. We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection [35.85544715234846]
We propose a dynamic sparse graph pipeline named Explicit3D based on object geometry and semantics features. Our experimental results on the SUN RGB-D dataset demonstrate that our Explicit3D achieves better performance balance than the-state-of-the-art.
arXiv Detail & Related papers (2023-02-13T16:19:54Z)
Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology. Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network. It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes. It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z)
PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student. By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modification.
arXiv Detail & Related papers (2022-07-07T07:23:20Z)
Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels. We show that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z)
Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud [50.56461318879761]
We propose Geometry-Disentangled Attention Network (GDANet) for 3D image processing. GDANet disentangles point clouds into contour and flat part of 3D objects, respectively denoted by sharp and gentle variation components. Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves the state-of-the-arts with fewer parameters.
arXiv Detail & Related papers (2020-12-20T13:35:00Z)
Dense Non-Rigid Structure from Motion: A Manifold Viewpoint [162.88686222340962]
Non-Rigid Structure-from-Motion (NRSfM) problem aims to recover 3D geometry of a deforming object from its 2D feature correspondences across multiple frames. We show that our approach significantly improves accuracy, scalability, and robustness against noise.
arXiv Detail & Related papers (2020-06-15T09:15:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.