PixSelect: Less but Reliable Pixels for Accurate and Efficient
Localization
- URL: http://arxiv.org/abs/2206.03775v1
- Date: Wed, 8 Jun 2022 09:46:03 GMT
- Title: PixSelect: Less but Reliable Pixels for Accurate and Efficient
Localization
- Authors: Mohammad Altillawi
- Abstract summary: We address the problem of estimating the global 6 DoF camera pose from a single RGB image in a given environment.
Our work exceeds state-of-the-art methods on the outdoor Cambridge Landmarks dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate camera pose estimation is a fundamental requirement for numerous
applications, such as autonomous driving, mobile robotics, and augmented
reality. In this work, we address the problem of estimating the global 6 DoF
camera pose from a single RGB image in a given environment. Previous works
consider every part of the image valuable for localization. However, many image
regions such as the sky, occlusions, and repetitive non-distinguishable
patterns cannot be utilized for localization. Besides adding unnecessary
computation, extracting and matching features from such regions produces
many wrong matches, which in turn degrade localization accuracy and
efficiency. Our work addresses this issue and shows that, by exploiting
the concept of sparse 3D models, we can select discriminative parts of the
environment and avoid uninformative image regions for single-image
localization. By avoiding keypoints in unreliable image regions such as
trees, bushes, cars, pedestrians, and occlusions, our method naturally
acts as an outlier filter. This makes our system highly efficient, since
only a minimal set of correspondences is needed, and highly accurate,
since the number of outliers is low. Our work exceeds state-of-the-art
methods on the outdoor Cambridge Landmarks dataset. Relying on only a
single image at inference, it surpasses, in terms of accuracy, methods
that exploit pose priors and/or reference 3D models, while being much
faster. By choosing as few as 100 correspondences, it outperforms similar
methods that localize from thousands of correspondences, while being more
efficient. In particular, it improves localization by 33% on the
OldHospital scene compared to these methods. Furthermore, it outperforms
direct pose regressors, even those that learn from sequences of images.
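The minimal-correspondence idea can be illustrated with a short sketch. This is not the paper's implementation; it is a plain numpy Direct Linear Transform (DLT) that recovers a 3x4 camera projection matrix from a handful of noise-free 2D-3D correspondences, showing that a small, reliable set of matches fully determines the pose.

```python
import numpy as np

def dlt_projection(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P (up to scale) from >= 6
    2D-3D correspondences via the Direct Linear Transform: each
    correspondence yields two linear constraints on P, and the
    least-squares solution is the right singular vector with the
    smallest singular value."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = [X, Y, Z, 1.0]
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

def reproject(P, points_3d):
    """Project 3D points with P and dehomogenize to pixel coordinates."""
    Xh = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]
```

In practice such a solver is wrapped in a robust estimator like RANSAC, but the fewer outliers the selected pixels produce, the fewer iterations are needed.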
Related papers
- FaVoR: Features via Voxel Rendering for Camera Relocalization [23.7893950095252]
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image.
We propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking.
arXiv Detail & Related papers (2024-09-11T18:58:16Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- Fast and Lightweight Scene Regressor for Camera Relocalization [1.6708069984516967]
Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
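A toy sketch of this descriptor-to-scene-coordinate regression idea: a two-layer perceptron mapping sparse descriptors to 3D scene coordinates, trained with hand-written gradients on synthetic data. All dimensions, data, and hyperparameters below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 128-D sparse descriptors -> 3-D scene coordinates.
D_IN, D_HID, D_OUT = 128, 64, 3

W1 = rng.normal(0.0, 0.05, (D_IN, D_HID))
b1 = np.zeros(D_HID)
W2 = rng.normal(0.0, 0.05, (D_HID, D_OUT))
b2 = np.zeros(D_OUT)

def forward(X):
    """ReLU hidden layer followed by a linear output layer."""
    H = np.maximum(X @ W1 + b1, 0.0)
    return H, H @ W2 + b2

def train_step(X, Y, lr=0.05):
    """One gradient-descent step on a mean-squared-error loss,
    with manually derived backpropagation."""
    global W1, b1, W2, b2
    H, pred = forward(X)
    err = pred - Y
    n = len(X)
    gW2 = H.T @ err / n
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (H > 0)      # gradient through the ReLU
    gW1 = X.T @ dH / n
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
    return 0.5 * np.mean(err ** 2)
```

At inference, the regressed coordinates would feed a standard PnP/RANSAC pose solver.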
arXiv Detail & Related papers (2022-12-04T14:41:20Z)
- CFL-Net: Image Forgery Localization Using Contrastive Learning [16.668334854459143]
We use contrastive loss to learn mapping into a feature space where the features between untampered and manipulated regions are well-separated for each image.
Our method has the advantage of localizing manipulated region without requiring any prior knowledge or assumption about the forgery type.
arXiv Detail & Related papers (2022-10-04T15:31:30Z)
- Visual Localization via Few-Shot Scene Region Classification [84.34083435501094]
Visual (re)localization addresses the problem of estimating the 6-DoF camera pose of a query image captured in a known scene.
Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates.
We propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images.
arXiv Detail & Related papers (2022-08-14T22:39:02Z)
- Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- Learning Condition Invariant Features for Retrieval-Based Localization from 1M Images [85.81073893916414]
We develop a novel method for learning more accurate and better generalizing localization features.
On the challenging Oxford RobotCar night condition, our method outperforms the well-known triplet loss by 24.4% in localization accuracy within 5m.
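For reference, the triplet-loss baseline this method is compared against can be sketched in a few lines; the margin value of 0.2 is an illustrative choice, not one taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss on descriptor embeddings: pull the
    positive (same place) closer to the anchor than the negative
    (different place) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

A satisfied triplet (negative already far enough away) incurs zero loss; a violating one is penalized linearly.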
arXiv Detail & Related papers (2020-08-27T14:46:22Z)
- Multi-View Optimization of Local Feature Geometry [70.18863787469805]
We address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry.
Our proposed method naturally complements the traditional feature extraction and matching paradigm.
We show that our method consistently improves the triangulation and camera localization performance for both hand-crafted and learned local features.
arXiv Detail & Related papers (2020-03-18T17:22:11Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.