On Localizing a Camera from a Single Image
- URL: http://arxiv.org/abs/2003.10664v1
- Date: Tue, 24 Mar 2020 05:09:01 GMT
- Title: On Localizing a Camera from a Single Image
- Authors: Pradipta Ghosh, Xiaochen Liu, Hang Qiu, Marcos A. M. Vieira, Gaurav S.
Sukhatme, and Ramesh Govindan
- Abstract summary: We show that it is possible to estimate the location of a camera from a single image taken by the camera.
We show that, using a judicious combination of projective geometry, neural networks, and crowd-sourced annotations from human workers, it is possible to position 95% of the images in our test data set to within 12 m.
- Score: 9.049593493956008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Public cameras often have limited metadata describing their attributes. A key
missing attribute is the precise location of the camera, using which it is
possible to precisely pinpoint the location of events seen in the camera. In
this paper, we explore the following question: under what conditions is it
possible to estimate the location of a camera from a single image taken by the
camera? We show that, using a judicious combination of projective geometry,
neural networks, and crowd-sourced annotations from human workers, it is
possible to position 95% of the images in our test data set to within 12 m.
This performance is two orders of magnitude better than PoseNet, a
state-of-the-art neural network that, when trained on a large corpus of images
in an area, can estimate the pose of a single image. Finally, we show that the
camera's inferred position and intrinsic parameters can help design a number of
virtual sensors, all of which are reasonably accurate.
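The approach described in the abstract rests on a classical projective-geometry step: once a few landmarks in the image are annotated with known map coordinates, the camera pose can be recovered by solving a Perspective-n-Point (PnP) problem. Below is a minimal sketch of that step, not the authors' code: the landmark coordinates, the intrinsic matrix K, and the pixel_to_ground helper are all hypothetical placeholders.

```python
import numpy as np
import cv2

# Hypothetical annotated landmarks: pixel coordinates in the image and the
# corresponding positions in a local metric frame (metres east, north, up),
# e.g. as a human worker might mark them against a map. All values are
# illustrative placeholders, not data from the paper.
image_points = np.array([[512., 380.], [900., 410.], [215., 640.], [1300., 700.]])
world_points = np.array([[10., 40., 0.], [35., 42., 0.], [-5., 15., 0.], [55., 12., 0.]])

# Assumed camera intrinsics: focal length in pixels and principal point.
K = np.array([[1400.,    0.,  960.],
              [   0., 1400.,  540.],
              [   0.,    0.,    1.]])

# Recover the camera pose by solving the Perspective-n-Point problem.
ok, rvec, tvec = cv2.solvePnP(world_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)

# The camera centre in world coordinates is C = -R^T t.
camera_position = (-R.T @ tvec).ravel()
print("Estimated camera position (m, local frame):", camera_position)

# A simple "virtual sensor" in the spirit of the abstract: intersect the
# viewing ray through a pixel with the ground plane z = 0 to geolocate an
# event seen at that pixel.
def pixel_to_ground(u, v):
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_world = R.T @ ray_cam                           # rotated into world frame
    scale = -camera_position[2] / ray_world[2]          # reach the z = 0 plane
    return camera_position + scale * ray_world

print("Pixel (700, 620) projects to ground point:", pixel_to_ground(700, 620))
```

Note that cv2.solvePnP needs at least four correspondences, and accuracy hinges on how precisely the crowd-sourced annotations pin down each landmark; the paper's 95%-within-12 m figure reflects the full pipeline rather than this bare geometric core.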
Related papers
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z)
- MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering [91.76893697171117]
We propose a method for efficient and high-quality geometry recovery and novel view synthesis given very sparse views, or even a single view, of the human.
Our key idea is to meta-learn the radiance field weights solely from potentially sparse multi-view videos.
We collect a new dataset, WildDynaCap, which contains subjects captured both in a dense camera dome and with in-the-wild sparse camera rigs.
arXiv Detail & Related papers (2024-03-27T17:59:54Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Camera Pose Auto-Encoders for Improving Pose Regression [6.700873164609009]
We introduce Camera Pose Auto-Encoders (PAEs) to encode camera poses using APRs as their teachers.
We show that the resulting latent pose representations can closely reproduce APR performance and demonstrate their effectiveness for related tasks.
We also show that training images can be reconstructed from the learned pose encoding, paving the way for integrating visual information from the training set at a low memory cost.
arXiv Detail & Related papers (2022-07-12T13:47:36Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- Can poachers find animals from public camera trap images? [14.61316451496861]
We investigate the robustness of geo-obfuscation for maintaining camera trap location privacy.
Simple intuitions and publicly available satellite imagery can be used to reduce the area likely to contain the camera by 87%.
arXiv Detail & Related papers (2021-06-21T16:31:47Z)
- Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors, and can also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- Multi-camera Torso Pose Estimation using Graph Neural Networks [3.7431113857875746]
Estimating the location and orientation of humans is an essential skill for service and assistive robots.
The proposal presented in this paper makes use of graph neural networks to merge the information acquired from multiple camera sources.
The experiments, conducted in an apartment with three cameras, benchmarked two different graph neural network implementations and a third architecture.
arXiv Detail & Related papers (2020-07-28T11:14:02Z)
- Neural Geometric Parser for Single Image Camera Calibration [17.393543270903653]
We propose a neural geometric approach to single-image camera calibration for man-made scenes.
Our approach considers both semantic and geometric cues, resulting in significant accuracy improvement.
The experimental results reveal that the performance of our neural approach is significantly higher than that of existing state-of-the-art camera calibration techniques.
arXiv Detail & Related papers (2020-07-23T08:29:00Z)
- Shape and Viewpoint without Keypoints [63.26977130704171]
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image.
It is trained on an image collection without any ground-truth 3D shape, multi-view, camera-viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
arXiv Detail & Related papers (2020-07-21T17:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.