Dual-Camera Smooth Zoom on Mobile Phones
- URL: http://arxiv.org/abs/2404.04908v2
- Date: Thu, 15 Aug 2024 12:32:04 GMT
- Title: Dual-Camera Smooth Zoom on Mobile Phones
- Authors: Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo
- Abstract summary: We introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview.
The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection.
We suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene.
- Score: 55.4114152554769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When zooming between dual cameras on a mobile phone, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address the issue, we suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ, and we utilize it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on the DCSZ task. The datasets, codes, and pre-trained models are available at https://github.com/ZcsrenlongZ/ZoomGS.
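The data-factory idea, i.e. assembling continuous virtual cameras between the two physical cameras, can be sketched by interpolating camera parameters. This is a minimal illustration only: `virtual_cameras` is a hypothetical helper, and the paper's actual pipeline (ZoomGS) learns a camera-specific encoding to transition a 3D Gaussian model between the cameras rather than interpolating raw parameters.

```python
import numpy as np

def virtual_cameras(K_a, t_a, K_b, t_b, n=8):
    """Sketch: linearly interpolate the intrinsic matrix K and camera
    position t between two phone cameras to define a sequence of
    continuous virtual cameras for rendering zoom-transition frames.
    Assumes the two cameras share orientation (a reasonable first
    approximation for a rigid dual-camera module); if orientations
    differ, the rotations would need slerp on SO(3)."""
    alphas = np.linspace(0.0, 1.0, n)
    return [((1.0 - a) * K_a + a * K_b,   # blended focal length / principal point
             (1.0 - a) * t_a + a * t_b)   # blended camera position
            for a in alphas]
```

Each (K, t) pair would then drive a render of the reconstructed 3D scene, yielding intermediate frames with ground truth for fine-tuning FI models.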
Related papers
- Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration [34.18403601269181]
DM-Calib is a diffusion-based approach for estimating pinhole camera intrinsic parameters from a single input image.
We introduce a new image-based representation, termed Camera Image, which losslessly encodes the numerical camera intrinsics.
By fine-tuning a stable diffusion model to generate a Camera Image from a single RGB input, we can extract camera intrinsics via a RANSAC operation.
arXiv Detail & Related papers (2024-11-26T09:04:37Z)
- GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring [50.72230109855628]
We propose GS-Blur, a dataset of synthesized realistic blurry images created using a novel approach.
We first reconstruct 3D scenes from multi-view images using 3D Gaussian Splatting (3DGS), then render blurry images by moving the camera view along the randomly generated motion trajectories.
By adopting various camera trajectories in reconstructing our GS-Blur, our dataset contains realistic and diverse types of blur, offering a large-scale dataset that generalizes well to real-world blur.
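The trajectory-based blur synthesis described above can be sketched as averaging sharp renders along a camera path. This is a minimal sketch under stated assumptions: `render_fn` is a hypothetical stand-in for the 3DGS renderer, which is not reproduced here.

```python
import numpy as np

def synthesize_blur(render_fn, trajectory):
    """Sketch of the GS-Blur idea: render sharp frames from a
    reconstructed 3DGS scene at successive camera poses along a motion
    trajectory, then average them to approximate the motion blur
    accumulated over an exposure. `render_fn` maps a camera pose to an
    HxWx3 float image."""
    frames = [render_fn(pose) for pose in trajectory]
    return np.mean(frames, axis=0)
```

Varying the trajectory shape (linear, curved, shaky) is what gives the dataset its diversity of blur types.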
arXiv Detail & Related papers (2024-10-31T06:17:16Z)
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits, as it enables us to: (1) learn token locations for transformer models; (2) directly regress 3D camera poses of 2D images with respect to NeRF models.
This in turn leads to improved performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
- Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models [3.9373541926236766]
We present a latent diffusion model over 3D scenes, that can be trained using only 2D image data.
We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, or from sparse input views.
arXiv Detail & Related papers (2024-06-18T23:14:29Z)
- Learning to Zoom and Unzoom [49.587516562644836]
We "learn to zoom" in on the input image, compute spatial features, and then "unzoom" to revert any deformations.
We demonstrate this versatility by evaluating on a variety of tasks and datasets.
arXiv Detail & Related papers (2023-03-27T17:03:30Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.