3D Multi-frame Fusion for Video Stabilization
- URL: http://arxiv.org/abs/2404.12887v1
- Date: Fri, 19 Apr 2024 13:43:14 GMT
- Title: 3D Multi-frame Fusion for Video Stabilization
- Authors: Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao,
- Abstract summary: We present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering.
The core of our approach lies in Stabilized Rendering (SR), a volume rendering module, fusing multi-frame information in 3D space.
SR involves warping features and colors from multiple frames by projection, fusing them into descriptors to render the stabilized image.
In response, we introduce the Adaptive Ray Range (ARR) module to integrate depth priors, adaptively defining the sampling range for the projection process.
- Score: 32.42910053491574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our approach lies in Stabilized Rendering (SR), a volume rendering module, which extends beyond the image fusion by incorporating feature fusion. The core of our RStab framework lies in Stabilized Rendering (SR), a volume rendering module, fusing multi-frame information in 3D space. Specifically, SR involves warping features and colors from multiple frames by projection, fusing them into descriptors to render the stabilized image. However, the precision of warped information depends on the projection accuracy, a factor significantly influenced by dynamic regions. In response, we introduce the Adaptive Ray Range (ARR) module to integrate depth priors, adaptively defining the sampling range for the projection process. Additionally, we propose Color Correction (CC) assisting geometric constraints with optical flow for accurate color aggregation. Thanks to the three modules, our RStab demonstrates superior performance compared with previous stabilizers in the field of view (FOV), image quality, and video stability across various datasets.
Related papers
- Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration [34.18403601269181]
DM-Calib is a diffusion-based approach for estimating pinhole camera intrinsic parameters from a single input image.
We introduce a new image-based representation, termed Camera Image, which losslessly encodes the numerical camera intrinsics.
By fine-tuning a stable diffusion model to generate a Camera Image from a single RGB input, we can extract camera intrinsics via a RANSAC operation.
arXiv Detail & Related papers (2024-11-26T09:04:37Z) - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z) - WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections [8.261637198675151]
Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics.
We propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections.
Our approach outperforms existing approaches on the rendering quality of novel view and appearance synthesis with high converge and rendering speed.
arXiv Detail & Related papers (2024-06-04T15:17:37Z) - FusionFormer: A Multi-sensory Fusion in Bird's-Eye-View and Temporal
Consistent Transformer for 3D Object Detection [14.457844173630667]
We propose a novel end-to-end multi-modal fusion transformer-based framework, dubbed FusionFormer.
By developing a uniform sampling strategy, our method can easily sample from 2D image and 3D voxel features spontaneously.
Our method achieves state-of-the-art single model performance of 72.6% mAP and 75.1% NDS in the 3D object detection task without test time augmentation.
arXiv Detail & Related papers (2023-09-11T06:27:25Z) - Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter
Correction [54.00007868515432]
Existing methods face challenges in estimating the accurate correction field due to the uniform velocity assumption.
We propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels.
Our method surpasses the state-of-the-art by +4.98, +0.77, and +4.33 of PSNR on Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.
arXiv Detail & Related papers (2023-03-31T15:09:18Z) - Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization [82.56853587380168]
Warping-based video stabilizers smooth camera trajectory by constraining each pixel's displacement and warp frames from unstable ones.
OVS can be integrated into existing warping-based stabilizers as a plug-and-play module to significantly improve the cropping ratio of the stabilized results.
arXiv Detail & Related papers (2021-08-20T08:07:47Z) - ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z) - Neural Re-rendering for Full-frame Video Stabilization [144.9918806873405]
We present an algorithm for full-frame video stabilization by first estimating dense warp fields.
Full-frame stabilized frames can then be synthesized by fusing warped contents from neighboring frames.
arXiv Detail & Related papers (2021-02-11T18:59:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.