Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
- URL: http://arxiv.org/abs/2303.17166v2
- Date: Sat, 1 Jun 2024 22:06:36 GMT
- Title: Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
- Authors: Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii, Takayoshi Yamashita
- Abstract summary: A Manhattan world lying along cuboid buildings is useful for camera angle estimation.
We propose a learning-based calibration method that uses heatmap regression to detect the directions of labeled image coordinates.
Our method outperforms conventional methods on large-scale datasets and with off-the-shelf cameras.
- Score: 9.018416031676136
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A Manhattan world lying along cuboid buildings is useful for camera angle estimation. However, accurate and robust angle estimation from fisheye images in the Manhattan world has remained an open challenge because general scene images tend to lack constraints such as lines, arcs, and vanishing points. To achieve higher accuracy and robustness, we propose a learning-based calibration method that uses heatmap regression, which is similar to pose estimation using keypoints, to detect the directions of labeled image coordinates. Simultaneously, our two estimators recover the rotation and remove fisheye distortion by remapping from a general scene image. Without considering vanishing-point constraints, we find that additional points for learning-based methods can be defined. To compensate for the lack of vanishing points in images, we introduce auxiliary diagonal points that have the optimal 3D arrangement of spatial uniformity. Extensive experiments demonstrated that our method outperforms conventional methods on large-scale datasets and with off-the-shelf cameras.
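As a rough illustration of the heatmap-regression idea mentioned in the abstract, the sketch below reads a labeled point's image coordinates from a heatmap by taking its peak, as is common in keypoint-based pose estimation. It is not the authors' implementation: the function names `gaussian_heatmap` and `heatmap_peak`, the 64x64 resolution, and the synthetic Gaussian target are all assumptions made for this example.

```python
import numpy as np


def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Build a synthetic heatmap with a Gaussian blob centered at (cx, cy).

    Hypothetical helper: stands in for a network's predicted heatmap.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def heatmap_peak(heatmap):
    """Return the (x, y) pixel coordinates of the heatmap maximum."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)


# One heatmap per labeled point; here a single synthetic example.
heatmap = gaussian_heatmap(64, 64, cx=40, cy=12)
print(heatmap_peak(heatmap))  # (40, 12)
```

Taking the arg-max gives integer pixel coordinates only; sub-pixel refinements exist but are outside what the abstract describes.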
Related papers
- FaVoR: Features via Voxel Rendering for Camera Relocalization [23.7893950095252]
Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image.
We propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features.
By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking.
arXiv Detail & Related papers (2024-09-11T18:58:16Z)
- RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation [88.54817424560056]
We propose a distortion vector map (DVM) that measures the degree and direction of local distortion.
By learning the DVM, the model can independently identify local distortions at each pixel without relying on global distortion patterns.
In the pre-training stage, it predicts the distortion vector map and perceives the local distortion features of each pixel.
In the fine-tuning stage, it predicts a pixel-wise flow map for deviated fisheye image rectification.
arXiv Detail & Related papers (2024-06-27T06:38:56Z)
- Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images [47.14713579719103]
We introduce a dense depth map as a geometry guide to mitigate overfitting.
The adjusted depth aids in the color-based optimization of 3D Gaussian splatting.
We verify the proposed method on the NeRF-LLFF dataset with varying small numbers of input images.
arXiv Detail & Related papers (2023-11-22T13:53:04Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Rethinking Generic Camera Models for Deep Single Image Camera Calibration to Recover Rotation and Fisheye Distortion [8.877834897951578]
We propose a generic camera model that has the potential to address various types of distortion.
Our proposed method outperformed conventional methods on two large-scale datasets and images captured by off-the-shelf fisheye cameras.
arXiv Detail & Related papers (2021-11-25T05:58:23Z)
- PICCOLO: Point Cloud-Centric Omnidirectional Localization [20.567452635590943]
We present PICCOLO, a simple and efficient algorithm for omnidirectional localization.
Our pipeline works in an off-the-shelf manner with a single image given as a query.
PICCOLO outperforms existing omnidirectional localization algorithms in both accuracy and stability when evaluated in various environments.
arXiv Detail & Related papers (2021-08-14T14:19:37Z)
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.