TC-SfM: Robust Track-Community-Based Structure-from-Motion
- URL: http://arxiv.org/abs/2206.05866v1
- Date: Mon, 13 Jun 2022 01:09:12 GMT
- Title: TC-SfM: Robust Track-Community-Based Structure-from-Motion
- Authors: Lei Wang, Linlin Ge, Shan Luo, Zihan Yan, Zhaopeng Cui and Jieqing Feng
- Abstract summary: We propose to exploit high-level information in the scene, i.e., the spatial contextual information of local regions, to guide the reconstruction.
A novel structure is proposed, namely, track-community, in which each community consists of a group of tracks and represents a local segment in the scene.
- Score: 24.956499348500763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structure-from-Motion (SfM) aims to recover 3D scene structures and camera
poses based on the correspondences between input images, and thus the ambiguity
caused by duplicate structures (i.e., different structures with strong visual
resemblance) always results in incorrect camera poses and 3D structures. To
deal with the ambiguity, most existing studies resort to additional constraint
information or implicit inference by analyzing two-view geometries or feature
points. In this paper, we propose to exploit high-level information in the
scene, i.e., the spatial contextual information of local regions, to guide the
reconstruction. Specifically, a novel structure is proposed, namely,
track-community, in which each community consists of a group of
tracks and represents a local segment in the scene. A community detection
algorithm is used to partition the scene into several segments. Then, the
potential ambiguous segments are detected by analyzing the neighborhood of
tracks and corrected by checking the pose consistency. Finally, we perform
partial reconstruction on each segment and align them with a novel
bidirectional consistency cost function which considers both 3D-3D
correspondences and pairwise relative camera poses. Experimental results
demonstrate that our approach can robustly alleviate reconstruction failure
resulting from visually indistinguishable structures and accurately merge the
partial reconstructions.
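The pipeline above first partitions feature tracks into communities based on their covisibility. As a minimal illustrative sketch (not the paper's actual community detection algorithm), the following groups tracks by connected components of a covisibility graph; the `tracks` data layout and the `min_shared` threshold are assumptions made for illustration:

```python
from collections import defaultdict

def track_communities(tracks, min_shared=2):
    """Partition feature tracks into local scene segments.

    tracks: dict mapping track id -> set of image ids observing it.
    Two tracks are linked when they are co-observed in at least
    `min_shared` images; connected components of this covisibility
    graph serve as a crude stand-in for proper community detection.
    """
    ids = list(tracks)
    # Build covisibility adjacency between tracks.
    adj = defaultdict(set)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if len(tracks[a] & tracks[b]) >= min_shared:
                adj[a].add(b)
                adj[b].add(a)
    # Collect connected components via iterative DFS.
    seen, communities = set(), []
    for t in ids:
        if t in seen:
            continue
        stack, comp = [t], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        communities.append(comp)
    return communities
```

In the paper's setting, each resulting community would then be reconstructed independently and the partial reconstructions merged with the bidirectional consistency cost.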
Related papers
- Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data [0.0]
This paper presents a real-time pipeline for localizing building components, including wall and ground surfaces, by integrating geometric calculations for pure 3D plane detection.
It uses a parallel multi-threaded architecture to precisely estimate the poses and equations of all planes detected in the environment, filters those forming the map structure using panoptic-segmentation validation, and keeps only the validated building components.
It can also ensure (re-)association of these detected components into a unified 3D scene graph, bridging the gap between geometric accuracy and semantic understanding.
arXiv Detail & Related papers (2024-09-10T16:28:09Z) - Visual Geometry Grounded Deep Structure From Motion [20.203320509695306]
We propose a new deep pipeline VGGSfM, where each component is fully differentiable and can be trained in an end-to-end manner.
First, we build on recent advances in deep 2D point tracking to extract reliable pixel-accurate tracks, which eliminates the need for chaining pairwise matches.
We attain state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism, and ETH3D.
arXiv Detail & Related papers (2023-12-07T18:59:52Z) - Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z) - Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction to study model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z) - Reconstructing Small 3D Objects in front of a Textured Background [0.0]
We present a technique for a complete 3D reconstruction of small objects moving in front of a textured background.
It is a variation of multibody structure from motion that specializes to exactly two objects.
In experiments with real artifacts, we show that our approach has practical advantages when reconstructing 3D objects from all sides.
arXiv Detail & Related papers (2021-05-24T15:36:33Z) - Deep Permutation Equivariant Structure from Motion [38.68492294795315]
Existing deep methods produce highly accurate 3D reconstructions in stereo and multiview stereo settings.
We propose a neural network architecture that recovers both the camera parameters and a scene structure by minimizing an unsupervised reprojection loss.
Our experiments, conducted on a variety of datasets in both internally calibrated and uncalibrated settings, indicate that our method accurately recovers pose and structure, on par with classical state of the art methods.
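The unsupervised objective described above minimizes the discrepancy between projected 3D points and observed keypoints. As a hedged sketch of such a reprojection loss under a standard pinhole model (the function name and argument layout are assumptions, not the paper's implementation):

```python
import numpy as np

def reprojection_loss(points3d, R, t, K, observed2d):
    """Mean squared reprojection error for a pinhole camera.

    points3d: (N, 3) world points; R (3, 3) rotation and t (3,)
    translation of the camera; K (3, 3) intrinsics;
    observed2d: (N, 2) measured keypoint locations in pixels.
    """
    cam = points3d @ R.T + t          # world -> camera coordinates
    proj = cam @ K.T                  # apply intrinsics
    uv = proj[:, :2] / proj[:, 2:3]   # perspective divide
    return float(np.mean(np.sum((uv - observed2d) ** 2, axis=1)))
```

Minimizing this quantity jointly over poses and structure is what lets such methods train without ground-truth 3D supervision.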
arXiv Detail & Related papers (2021-04-14T08:50:06Z) - Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video [90.93141123721713]
Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world.
It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion.
We propose the first approach that simultaneously estimates camera motion and reconstructs the geometry of complex 3D thin structures in high quality from a color video captured by a handheld camera.
arXiv Detail & Related papers (2020-05-07T10:39:20Z) - Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that jointly recovers the geometry of a 3D object as a set of primitives.
Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives.
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z) - Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z) - Unsupervised Learning of Intrinsic Structural Representation Points [50.92621061405056]
Learning structures of 3D shapes is a fundamental problem in the field of computer graphics and geometry processing.
We present a simple yet interpretable unsupervised method for learning a new structural representation in the form of 3D structure points.
arXiv Detail & Related papers (2020-03-03T17:40:00Z) - Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image [24.99186733297264]
We propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image.
Our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components.
Experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods.
arXiv Detail & Related papers (2020-02-27T16:00:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.