Doppelgangers: Learning to Disambiguate Images of Similar Structures
- URL: http://arxiv.org/abs/2309.02420v1
- Date: Tue, 5 Sep 2023 17:50:36 GMT
- Title: Doppelgangers: Learning to Disambiguate Images of Similar Structures
- Authors: Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath
Hariharan, Noah Snavely
- Abstract summary: Illusory image matches can be challenging for humans to differentiate, and can lead 3D reconstruction algorithms to produce erroneous results.
We propose a learning-based approach to visual disambiguation, formulating it as a binary classification task on image pairs.
Our evaluation shows that our method can distinguish illusory matches in difficult cases, and can be integrated into SfM pipelines to produce correct, disambiguated 3D reconstructions.
- Score: 76.61267007774089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the visual disambiguation task of determining whether a pair of
visually similar images depict the same or distinct 3D surfaces (e.g., the same
or opposite sides of a symmetric building). Illusory image matches, where two
images observe distinct but visually similar 3D surfaces, can be challenging
for humans to differentiate, and can also lead 3D reconstruction algorithms to
produce erroneous results. We propose a learning-based approach to visual
disambiguation, formulating it as a binary classification task on image pairs.
To that end, we introduce a new dataset for this problem, Doppelgangers, which
includes image pairs of similar structures with ground truth labels. We also
design a network architecture that takes the spatial distribution of local
keypoints and matches as input, allowing for better reasoning about both local
and global cues. Our evaluation shows that our method can distinguish illusory
matches in difficult cases, and can be integrated into SfM pipelines to produce
correct, disambiguated 3D reconstructions. See our project page for our code,
datasets, and more results: http://doppelgangers-3d.github.io/.
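As a rough sketch of the formulation described above (a binary classifier over an image pair whose input encodes the spatial distribution of local keypoints and matches), the following PyTorch snippet illustrates one way such a model could look. It is not the authors' released architecture: the helper rasterize_points, the four-channel input layout, and the network size are all illustrative assumptions; the real code is linked on the project page.

```python
# Illustrative sketch only: binary classification on an image pair, with
# rasterized keypoint/match layouts as input. NOT the authors' released
# model; channel layout, helper names, and network size are assumptions.
import torch
import torch.nn as nn

def rasterize_points(points, size=(256, 256)):
    """Render an (N, 2) tensor of pixel coordinates (already scaled to
    `size`) into a binary occupancy mask. Hypothetical helper."""
    mask = torch.zeros(size)
    if len(points) > 0:
        xs = points[:, 0].long().clamp(0, size[1] - 1)
        ys = points[:, 1].long().clamp(0, size[0] - 1)
        mask[ys, xs] = 1.0
    return mask

class PairClassifier(nn.Module):
    """Small CNN over stacked masks for both images in the pair:
    channels = [keypoints_A, keypoints_B, matches_A, matches_B]."""
    def __init__(self, in_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 1),  # logit: same 3D surface (1) vs. distinct (0)
        )

    def forward(self, x):
        return self.net(x)

# Toy usage with random "keypoints"; in practice these would come from a
# feature matcher (e.g., SIFT keypoints plus ratio-test matches).
kpts_a, kpts_b = torch.rand(500, 2) * 256, torch.rand(480, 2) * 256
matched_a, matched_b = kpts_a[:120], kpts_b[:120]  # hypothetical matched subsets
x = torch.stack([rasterize_points(p) for p in
                 (kpts_a, kpts_b, matched_a, matched_b)]).unsqueeze(0)
prob_same = torch.sigmoid(PairClassifier()(x)).item()
```

In an SfM pipeline, a score like prob_same could be thresholded to discard suspect image pairs before reconstruction, which is the kind of integration the abstract describes.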
Related papers
- 3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets [34.610546020800236]
3DMiner is a pipeline for mining 3D shapes from challenging datasets.
Our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques.
We show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset.
arXiv Detail & Related papers (2023-10-29T23:08:19Z)
- Occ$^2$Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions [14.217367037250296]
Occ$^2$Net is an image matching method that models occlusion relations using 3D occupancy and infers matching points in occluded regions.
We evaluate our method on both real-world and simulated datasets and demonstrate its superior performance over state-of-the-art methods on several metrics.
arXiv Detail & Related papers (2023-08-14T13:09:41Z)
- Self-Supervised Image Representation Learning with Geometric Set Consistency [50.12720780102395]
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce feature consistency within image views.
arXiv Detail & Related papers (2022-03-29T08:57:33Z)
- Zero in on Shape: A Generic 2D-3D Instance Similarity Metric learned from Synthetic Data [3.71630298053787]
We present a network architecture which compares RGB images and untextured 3D models by the similarity of the represented shape.
Our system is optimised for zero-shot retrieval, meaning it can recognise shapes never shown in training.
arXiv Detail & Related papers (2021-08-09T14:44:08Z)
- Joint Deep Multi-Graph Matching and 3D Geometry Learning from Inhomogeneous 2D Image Collections [57.60094385551773]
We propose a trainable framework for learning a deformable 3D geometry model from inhomogeneous image collections.
In addition, we obtain the underlying 3D geometry of the objects depicted in the 2D images.
arXiv Detail & Related papers (2021-03-31T17:25:36Z)
- Bidirectional Projection Network for Cross Dimension Scene Understanding [69.29443390126805]
We present a bidirectional projection network (BPNet) for joint 2D and 3D reasoning in an end-to-end manner.
Via the bidirectional projection module (BPM), complementary 2D and 3D information can interact with each other at multiple architectural levels.
Our BPNet achieves top performance on the ScanNetV2 benchmark for both 2D and 3D semantic segmentation.
arXiv Detail & Related papers (2021-03-26T08:31:39Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)
- Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings [29.412748394892105]
We propose an interpretable image embedding that cuts the search in scale space to essentially a lookup.
We show how this embedding yields competitive image-matching results, while being simpler, faster, and also interpretable by humans.
arXiv Detail & Related papers (2020-08-13T10:01:07Z)
- Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations [92.89846887298852]
We present a framework to translate between 2D image views and 3D object shapes.
We propose SIST, a Self-supervised Image to Shape Translation framework.
arXiv Detail & Related papers (2020-03-22T22:44:02Z)