Predicting Visual Overlap of Images Through Interpretable Non-Metric Box
Embeddings
- URL: http://arxiv.org/abs/2008.05785v1
- Date: Thu, 13 Aug 2020 10:01:07 GMT
- Title: Predicting Visual Overlap of Images Through Interpretable Non-Metric Box
Embeddings
- Authors: Anita Rau, Guillermo Garcia-Hernando, Danail Stoyanov, Gabriel J.
Brostow, Daniyar Turmukhambetov
- Abstract summary: We propose an interpretable image-embedding that cuts the search in scale space to essentially a lookup.
We show how this embedding yields competitive image-matching results, while being simpler, faster, and also interpretable by humans.
- Score: 29.412748394892105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To what extent are two images picturing the same 3D surfaces? Even when this
is a known scene, the answer typically requires an expensive search across
scale space, with matching and geometric verification of large sets of local
features. This expense is further multiplied when a query image is evaluated
against a gallery, e.g. in visual relocalization. While we don't obviate the
need for geometric verification, we propose an interpretable image-embedding
that cuts the search in scale space to essentially a lookup.
Our approach measures the asymmetric relation between two images. The model
then learns a scene-specific measure of similarity, from training examples with
known 3D visible-surface overlaps. The result is that we can quickly identify,
for example, which test image is a close-up version of another, and by what
scale factor. Subsequently, local features need only be detected at that scale.
We validate our scene-specific model by showing how this embedding yields
competitive image-matching results, while being simpler, faster, and also
interpretable by humans.
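The asymmetric overlap idea can be sketched with axis-aligned boxes: if each image is embedded as a box, the fraction of one box's volume covered by another is a non-symmetric score, and comparing the two directions hints at the relative scale (a close-up is fully contained in a wide view, but not vice versa). The function and toy boxes below are illustrative assumptions, not the paper's actual embedding network.

```python
import numpy as np

def surface_overlap(box_a, box_b):
    """Fraction of box_a's volume covered by box_b (asymmetric).

    Each box is a (lower_corner, upper_corner) pair of arrays.
    Illustrative sketch only; the paper learns such boxes from data.
    """
    lo = np.maximum(box_a[0], box_b[0])            # intersection lower corner
    hi = np.minimum(box_a[1], box_b[1])            # intersection upper corner
    inter = np.prod(np.clip(hi - lo, 0.0, None))   # zero if boxes are disjoint
    vol_a = np.prod(box_a[1] - box_a[0])
    return inter / vol_a

# Toy example: a close-up (small box) nested inside a wide view (large box).
wide = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))
closeup = (np.array([1.0, 1.0]), np.array([2.0, 2.0]))

print(surface_overlap(closeup, wide))  # 1.0    (close-up fully covered by wide view)
print(surface_overlap(wide, closeup))  # 0.0625 (close-up covers 1/16 of wide view)
```

The asymmetry of the two scores is what makes the measure non-metric, and their ratio (here 16, i.e. a factor of 4 per axis in 2D) is the kind of scale cue the abstract says can restrict where local features need to be detected.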
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences [21.057940424318314]
Given two images, we can estimate the relative camera pose between them by establishing image-to-image correspondences.
We present MicKey, a keypoint matching pipeline that is able to predict metric correspondences in 3D camera space.
arXiv Detail & Related papers (2024-04-09T14:22:50Z)
- Doppelgangers: Learning to Disambiguate Images of Similar Structures [76.61267007774089]
Illusory image matches can be challenging for humans to differentiate, and can lead 3D reconstruction algorithms to produce erroneous results.
We propose a learning-based approach to visual disambiguation, formulating it as a binary classification task on image pairs.
Our evaluation shows that our method can distinguish illusory matches in difficult cases, and can be integrated into SfM pipelines to produce correct, disambiguated 3D reconstructions.
arXiv Detail & Related papers (2023-09-05T17:50:36Z)
- Occ$^2$Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions [14.217367037250296]
Occ$^2$Net is an image matching method that models occlusion relations using 3D occupancy and infers matching points in occluded regions.
We evaluate our method on both real-world and simulated datasets and demonstrate its superior performance over state-of-the-art methods on several metrics.
arXiv Detail & Related papers (2023-08-14T13:09:41Z)
- Explicit Correspondence Matching for Generalizable Neural Radiance Fields [49.49773108695526]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views.
The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views.
Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
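The cosine-similarity check described above can be sketched as follows; the feature vectors are made up for illustration, standing in for image features sampled at the 2D projections of one 3D point in two views.

```python
import numpy as np

def cosine_similarity(f1, f2):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

# Hypothetical features at the projections of the same 3D point in two views;
# parallel vectors give similarity 1, signalling a consistent correspondence.
feat_view1 = np.array([1.0, 2.0, 2.0])
feat_view2 = np.array([2.0, 4.0, 4.0])
print(cosine_similarity(feat_view1, feat_view2))  # 1.0
```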
arXiv Detail & Related papers (2023-04-24T17:46:01Z)
- MeshLoc: Mesh-Based Visual Localization [54.731309449883284]
We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
arXiv Detail & Related papers (2022-07-21T21:21:10Z)
- 3D Object Detection and Pose Estimation of Unseen Objects in Color Images with Local Surface Embeddings [35.769234123059086]
We present an approach for detecting and estimating the 3D poses of objects in images that requires only an untextured CAD model.
Our approach combines Deep Learning and 3D geometry: It relies on an embedding of local 3D geometry to match the CAD models to the input images.
We show that we can use Mask-RCNN in a class-agnostic way to detect the new objects without retraining and thus drastically limit the number of possible correspondences.
arXiv Detail & Related papers (2020-10-08T15:57:06Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.