DIME-Net: Neural Network-Based Dynamic Intrinsic Parameter Rectification
for Cameras with Optical Image Stabilization System
- URL: http://arxiv.org/abs/2303.11307v1
- Date: Mon, 20 Mar 2023 17:45:12 GMT
- Title: DIME-Net: Neural Network-Based Dynamic Intrinsic Parameter Rectification
for Cameras with Optical Image Stabilization System
- Authors: Shu-Hao Yeh, Shuangyu Xie, Di Wang, Wei Yan, and Dezhen Song
- Abstract summary: We propose a novel neural network-based approach that estimates the intrinsic $\mathrm{K}$ matrix in real time so that pose estimation or 3D reconstruction remains accurate.
We name the proposed Dynamic Intrinsic Manifold Estimation network DIME-Net and have it implemented and tested on three different mobile devices.
In all cases, DIME-Net can reduce reprojection error by at least $64\%$, indicating that our design is successful.
- Score: 16.390775530663618
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Optical Image Stabilization (OIS) systems in mobile devices reduce image
blurring by steering the lens to compensate for hand jitter. However, OIS changes
the intrinsic camera parameters (i.e., the $\mathrm{K}$ matrix) dynamically, which
hinders accurate camera pose estimation or 3D reconstruction. Here we propose a
novel neural network-based approach that estimates $\mathrm{K}$ matrix in
real-time so that pose estimation or scene reconstruction can be run at camera
native resolution for the highest accuracy on mobile devices. Our network
design takes the rectified projection model discrepancy feature and 3D point
positions as inputs and employs a Multi-Layer Perceptron (MLP) to approximate
$f_{\mathrm{K}}$ manifold. We also design a unique training scheme for this
network by introducing a back-propagated PnP (BPnP) layer so that the reprojection
error can be adopted as the loss function. The training process utilizes
precise calibration patterns for capturing accurate $f_{\mathrm{K}}$ manifold
but the trained network can be used anywhere. We name the proposed Dynamic
Intrinsic Manifold Estimation network as DIME-Net and have it implemented and
tested on three different mobile devices. In all cases, DIME-Net can reduce
reprojection error by at least $64\%$, indicating that our design is successful.
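The training scheme above minimizes reprojection error through a differentiable PnP layer. The core quantity is easy to state on its own: project known 3D points with a candidate $\mathrm{K}$ and measure the pixel distance to their observed locations. Below is a minimal numpy sketch of that loss; the pinhole projection is standard, but the board geometry, intrinsics, and the 4-pixel principal-point drift are illustrative numbers, not values from the paper.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3D points X (N,3) with intrinsics K and pose (R, t)."""
    Xc = X @ R.T + t          # camera-frame coordinates
    x = Xc @ K.T              # homogeneous pixel coordinates
    return x[:, :2] / x[:, 2:3]

def reprojection_error(K, R, t, X, uv):
    """Mean L2 reprojection error in pixels -- the loss a BPnP-style layer
    lets the network back-propagate through to its K estimate."""
    return float(np.mean(np.linalg.norm(project(K, R, t, X) - uv, axis=1)))

# Hypothetical setup: a 3x3 planar calibration grid 5 units in front of the camera.
K_true = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
R, t = np.eye(3), np.array([0., 0., 5.])
X = np.array([[x, y, 0.] for x in (-1., 0., 1.) for y in (-1., 0., 1.)])
uv = project(K_true, R, t, X)  # observed pixels under the true intrinsics

# An OIS-perturbed K (principal point drifted by 4 px) yields a nonzero loss;
# training drives the network's K correction toward zero loss.
K_shifted = K_true.copy()
K_shifted[0, 2] += 4.0
print(reprojection_error(K_true, R, t, X, uv))     # 0.0
print(reprojection_error(K_shifted, R, t, X, uv))  # 4.0 (the 4-px cx shift moves every projection by 4 px)
```

A pure principal-point shift translates all projections uniformly, so the mean error equals the shift exactly; more general intrinsic perturbations produce point-dependent residuals, which is what the MLP must learn to undo.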
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- Camera Calibration through Geometric Constraints from Rotation and Projection Matrices [4.100632594106989]
We propose a novel constraints-based loss for measuring the intrinsic and extrinsic parameters of a camera.
Our methodology is a hybrid approach that employs the learning power of a neural network to estimate the desired parameters.
Our proposed approach demonstrates improvements across all parameters when compared to the state-of-the-art (SOTA) benchmarks.
arXiv Detail & Related papers (2024-02-13T13:07:34Z)
- DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using Single-View Depth and Gradient Predictions [22.243043857097582]
DeepFusion is capable of producing real-time dense reconstructions on a GPU.
It fuses the output of a semi-dense multi-view stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion.
Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion is capable of performing at least as well as other comparable systems.
arXiv Detail & Related papers (2022-07-25T14:55:26Z)
- Camera Calibration through Camera Projection Loss [4.36572039512405]
We propose a novel method to predict intrinsic (focal length and principal point offset) parameters using an image pair.
Unlike existing methods, we propose a new representation that incorporates camera model equations as a neural network in a multi-task learning framework.
Our proposed approach achieves better performance with respect to both deep learning-based and traditional methods on 7 out of 10 parameters evaluated.
arXiv Detail & Related papers (2021-10-07T14:03:10Z)
- Unsupervised Depth Completion with Calibrated Backprojection Layers [79.35651668390496]
We propose a deep neural network architecture to infer dense depth from an image and a sparse point cloud.
It is trained using a video stream and corresponding synchronized sparse point cloud, as obtained from a LIDAR or other range sensor, along with the intrinsic calibration parameters of the camera.
At inference time, the calibration of the camera, which can be different from the one used for training, is fed as an input to the network along with the sparse point cloud and a single image.
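Feeding the calibration to the network, as described above, rests on standard pinhole backprojection: pixels are lifted into metric camera rays via $\mathrm{K}^{-1}$, so a different K at inference simply changes the ray geometry. A minimal numpy illustration (the intrinsics and pixel values are hypothetical, not from the paper):

```python
import numpy as np

def backproject(K, uv, depth):
    """Lift pixels uv (N,2) with per-pixel depth (N,) into 3D camera coordinates."""
    uv1 = np.hstack([uv, np.ones((uv.shape[0], 1))])   # homogeneous pixel coordinates
    rays = uv1 @ np.linalg.inv(K).T                    # normalized camera rays (z = 1)
    return rays * depth[:, None]                       # scale each ray by its depth

K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
uv = np.array([[320., 240.], [400., 240.]])  # principal point, and 80 px to its right
depth = np.array([2.0, 2.0])
print(backproject(K, uv, depth))
# principal-point pixel lifts to [0, 0, 2]; the 80-px offset at f=800 lifts to [0.2, 0, 2]
```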
arXiv Detail & Related papers (2021-08-24T05:41:59Z)
- Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate, and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers [111.55817466296402]
We introduce Perspective Crop Layers (PCLs) - a form of perspective crop of the region of interest based on the camera geometry.
PCLs deterministically remove the location-dependent perspective effects while leaving end-to-end training and the number of parameters of the underlying neural network unaffected.
PCL offers an easy way to improve the accuracy of existing 3D reconstruction networks by making them geometry aware.
arXiv Detail & Related papers (2020-11-27T08:48:43Z)
- Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation [51.17232267143098]
We propose a novel system named Disp R-CNN for 3D object detection from stereo images.
We use a statistical shape model to generate dense disparity pseudo-ground-truth without the need of LiDAR point clouds.
Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
arXiv Detail & Related papers (2020-04-07T17:48:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.