Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
- URL: http://arxiv.org/abs/2302.01571v1
- Date: Fri, 3 Feb 2023 06:49:27 GMT
- Title: Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
- Authors: Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J. Kim, Jin-Hwa Kim
- Abstract summary: Multi-resolution hash encoding has been proposed to reduce the computational cost of neural rendering.
We propose a joint optimization algorithm to calibrate the camera pose and learn a geometric representation using efficient multi-resolution hash encoding.
- Score: 19.655282240361835
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-resolution hash encoding has recently been proposed to reduce the
computational cost of neural rendering methods such as NeRF, which require
accurate camera poses for the given scenes. However, unlike previous methods
that jointly optimize camera poses and 3D scenes, naive gradient-based camera
pose refinement with multi-resolution hash encoding severely degrades
performance. We propose a joint optimization algorithm that calibrates camera
poses and learns a geometric representation using efficient multi-resolution
hash encoding. We show that the oscillating gradient flows of hash encoding
interfere with the registration of camera poses, and our method addresses this
issue with smooth interpolation weighting that stabilizes the gradients of ray
samples across hash-grid boundaries. Moreover, a curriculum training procedure
helps to learn the level-wise hash encoding, further improving pose refinement.
Experiments on novel-view synthesis datasets validate that our framework
achieves state-of-the-art performance and rapid convergence of neural rendering,
even when initial camera poses are unknown.
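To make the two key ideas above concrete, here is a minimal NumPy sketch of (a) smooth interpolation weighting at a single grid level, which keeps the interpolation weights, and hence the pose gradients flowing through them, continuous as a ray sample crosses a cell boundary, and (b) a simplified coarse-to-fine schedule over encoding levels in the spirit of the curriculum training. It is an illustrative sketch under stated assumptions (1D positions, dense toy feature tables in place of hash tables, a linear per-level ramp), not the authors' implementation.

```python
import numpy as np

def smoothstep(t):
    # C1-continuous easing 3t^2 - 2t^3: its derivative vanishes at t = 0 and
    # t = 1, so interpolation weights (and the gradients routed through them
    # to the camera pose) do not jump when a sample crosses a cell boundary.
    return t * t * (3.0 - 2.0 * t)

def grid_encode(x, table, resolution, smooth=True):
    """Encode 1D positions x in [0, 1) at one grid level.

    table:      (resolution + 1, dim) learnable features; a dense toy table
                standing in for a hashed feature table
    resolution: number of grid cells at this level
    """
    g = x * resolution                      # continuous grid coordinate
    i0 = np.floor(g).astype(int)            # left corner index
    t = g - i0                              # fractional offset inside the cell
    w = smoothstep(t) if smooth else t      # smooth vs. plain linear weighting
    return (1.0 - w)[:, None] * table[i0] + w[:, None] * table[i0 + 1]

def level_weights(num_levels, progress):
    # Simplified coarse-to-fine curriculum: finer levels fade in linearly as
    # training progresses (progress in [0, 1]).
    alpha = progress * num_levels
    return np.clip(alpha - np.arange(num_levels), 0.0, 1.0)

# Toy usage: three levels of increasing resolution, curriculum halfway through.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=8)                        # ray-sample positions
levels = [16, 64, 256]
tables = [rng.normal(0.0, 1e-2, size=(r + 1, 2)) for r in levels]
w = level_weights(len(levels), progress=0.5)
enc = np.concatenate(
    [wl * grid_encode(x, tb, r) for wl, tb, r in zip(w, tables, levels)],
    axis=1)                                              # (8, 6) feature matrix
```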
Related papers
- A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose [44.13819148680788]
We develop a novel construct-and-optimize method for sparse view synthesis without camera poses.
Specifically, we construct a solution by using monocular depth and projecting pixels back into the 3D world (a minimal back-projection sketch appears after this list).
We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views.
arXiv Detail & Related papers (2024-05-06T17:36:44Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider a bipartite graph of cameras, dynamically evolving object poses, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams [44.02794438687478]
Spike cameras have proven effective in capturing motion features and beneficial for solving the ill-posed deblurring problem.
Existing methods fall into the supervised learning paradigm, which suffers from notable performance degradation when applied to real-world scenarios.
We propose the first self-supervised framework for the task of spike-guided motion deblurring.
arXiv Detail & Related papers (2024-03-14T15:29:09Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach can synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction [54.00007868515432]
Existing methods face challenges in estimating the accurate correction field due to the uniform velocity assumption.
We propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels.
Our method surpasses the state of the art by +4.98, +0.77, and +4.33 PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.
arXiv Detail & Related papers (2023-03-31T15:09:18Z)
- Self-Supervised Camera Self-Calibration from Video [34.35533943247917]
We propose a learning algorithm to regress per-sequence calibration parameters using an efficient family of general camera models.
Our procedure achieves self-calibration results with sub-pixel reprojection error, outperforming other learning-based methods.
arXiv Detail & Related papers (2021-12-06T19:42:05Z)
- Learning a Model-Driven Variational Network for Deformable Image Registration [89.9830129923847]
VR-Net is a novel cascaded variational network for unsupervised deformable image registration.
It outperforms state-of-the-art deep learning methods in registration accuracy.
It maintains the fast inference speed of deep learning and the data efficiency of variational models.
arXiv Detail & Related papers (2021-05-25T21:37:37Z)
- Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising [104.59305271099967]
We present a pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising.
We develop a pixel aggregation network for video denoising to sample pixels across the spatial-temporal space.
Our method is able to solve the misalignment issues caused by large motion in dynamic scenes.
arXiv Detail & Related papers (2021-01-26T13:00:46Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- IMU-Assisted Learning of Single-View Rolling Shutter Correction [16.242924916178282]
Rolling shutter distortion is highly undesirable for photography and computer vision algorithms.
We propose a deep neural network to predict depth and row-wise pose from a single image for rolling shutter correction.
arXiv Detail & Related papers (2020-11-05T21:33:25Z)
- u-net CNN based fourier ptychography [5.46367622374939]
We propose a new retrieval algorithm that is based on convolutional neural networks.
Experiments demonstrate that our model achieves better reconstruction results and is more robust under system aberrations.
arXiv Detail & Related papers (2020-03-16T22:48:44Z)
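The Construct-Optimize entry above builds its initial solution by back-projecting pixels into 3D using monocular depth. As a reference for that step only, here is a minimal pinhole back-projection sketch; the function name, arguments, and the camera convention (z pointing forward, depth measured along the camera z-axis) are assumptions for illustration, not details taken from that paper.

```python
import numpy as np

def backproject(depth, K, c2w):
    """Lift a depth map to world-space 3D points with a pinhole camera model.

    depth: (H, W) per-pixel depth along the camera z-axis
    K:     (3, 3) camera intrinsics
    c2w:   (4, 4) camera-to-world pose
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))                   # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # (H*W, 3)
    rays_cam = pix @ np.linalg.inv(K).T            # normalized camera-frame rays
    pts_cam = rays_cam * depth.reshape(-1, 1)      # scale each ray by its depth
    pts_world = pts_cam @ c2w[:3, :3].T + c2w[:3, 3]  # rigid transform to world
    return pts_world.reshape(H, W, 3)

# e.g. backproject(depth, K, np.eye(4)) returns points in the camera frame.
```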