Beyond Wide-Angle Images: Unsupervised Video Portrait Correction via Spatiotemporal Diffusion Adaptation
- URL: http://arxiv.org/abs/2504.00401v1
- Date: Tue, 01 Apr 2025 03:49:59 GMT
- Title: Beyond Wide-Angle Images: Unsupervised Video Portrait Correction via Spatiotemporal Diffusion Adaptation
- Authors: Wenbo Nie, Lang Nie, Chunyu Lin, Jingwen Chen, Ke Xing, Jiyuan Wang, Yao Zhao,
- Abstract summary: We propose an image portrait correction framework using diffusion models, named ImagePD. It integrates the long-range awareness of transformers and the multi-step denoising of diffusion models into a unified framework. Experiments demonstrate that the proposed methods outperform existing solutions quantitatively and qualitatively.
- Score: 46.16087086554505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wide-angle cameras, despite their popularity for content creation, suffer from distortion-induced facial stretching, especially at the edge of the lens, which degrades visual appeal. To address this issue, we propose an image portrait correction framework using diffusion models, named ImagePD. It integrates the long-range awareness of transformers and the multi-step denoising of diffusion models into a unified framework, achieving global structural robustness and local detail refinement. Besides, considering the high cost of obtaining video labels, we repurpose ImagePD for unlabeled wide-angle videos (termed VideoPD) via spatiotemporal diffusion adaptation with spatial consistency and temporal smoothness constraints. For the former, we encourage the denoised image to approximate pseudo labels that follow the wide-angle distortion distribution pattern; for the latter, we derive rectification trajectories with backward optical flows and smooth them. Compared with ImagePD, VideoPD maintains high-quality facial corrections in space and mitigates potential temporal shake. Finally, to establish an evaluation benchmark and train the framework, we build a video portrait dataset with large diversity in the number of people, lighting conditions, and backgrounds. Experiments demonstrate that the proposed methods outperform existing solutions quantitatively and qualitatively, contributing to high-fidelity wide-angle videos with stable and natural portraits. The code and dataset will be made available.
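The temporal-smoothness constraint described in the abstract (deriving rectification trajectories with backward optical flows and smoothing them) can be sketched roughly as follows. The paper's code is not yet released, so the function names, the nearest-neighbor warping, and the exponential moving average below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def warp_field(field, flow):
    """Pull a per-pixel displacement field backward along an optical flow.

    field: (H, W, 2) rectification displacements for the previous frame.
    flow:  (H, W, 2) backward optical flow (current -> previous), in pixels.
    Nearest-neighbor sampling keeps the sketch dependency-free.
    """
    h, w = field.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return field[src_y, src_x]

def smooth_trajectories(per_frame_fields, backward_flows, alpha=0.8):
    """Temporally smooth per-frame rectification fields.

    Each frame's field is blended with the previous frame's smoothed
    field, warped into the current frame via the backward flow: an
    exponential moving average along each pixel's motion trajectory,
    which damps frame-to-frame shake in the correction.
    """
    smoothed = [per_frame_fields[0]]
    for field, flow in zip(per_frame_fields[1:], backward_flows):
        prev_aligned = warp_field(smoothed[-1], flow)
        smoothed.append(alpha * prev_aligned + (1 - alpha) * field)
    return smoothed
```

With zero flow and alpha near 1, the output fields change slowly over time, which is the stabilizing behavior the abstract attributes to VideoPD relative to per-frame ImagePD.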
Related papers
- VipDiff: Towards Coherent and Diverse Video Inpainting via Training-free Denoising Diffusion Models [21.584843961386888]
VipDiff is a framework for conditioning a diffusion model on the reverse diffusion process to produce temporally coherent inpainting results. It can largely outperform state-of-the-art video inpainting methods in terms of both spatial-temporal coherence and fidelity.
arXiv Detail & Related papers (2025-01-21T16:39:09Z)
- DiffuEraser: A Diffusion Model for Video Inpainting [13.292164408616257]
We introduce DiffuEraser, a video inpainting model based on stable diffusion, to fill masked regions with greater detail and more coherent structures. We also expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by leveraging the temporal smoothing property of Video Diffusion Models.
arXiv Detail & Related papers (2025-01-17T08:03:02Z)
- VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models [58.464465016269614]
We propose a novel framework for solving high-definition video inverse problems using latent image diffusion models. Our approach delivers HD-resolution reconstructions in under 6 seconds per frame on a single NVIDIA 4090 GPU.
arXiv Detail & Related papers (2024-11-29T08:10:49Z)
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moiré patterns, appearing as color distortions, severely degrade image and video quality when filming a screen with a digital camera.
We study how to remove such undesirable moiré patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
- Correcting Face Distortion in Wide-Angle Videos [85.88898349347149]
We present a video warping algorithm to correct these distortions.
Our key idea is to apply stereographic projection locally on the facial regions.
For performance evaluation, we develop a wide-angle video dataset with a wide range of focal lengths.
arXiv Detail & Related papers (2021-11-18T21:28:17Z)
- Practical Wide-Angle Portraits Correction with Deep Structured Models [17.62752136436382]
This paper introduces the first deep-learning-based approach to removing perspective distortions from photos.
Given a wide-angle portrait as input, we build a cascaded network consisting of a LineNet, a ShapeNet, and a transition module.
For the quantitative evaluation, we introduce two novel metrics, line consistency and face congruence.
arXiv Detail & Related papers (2021-04-26T10:47:35Z)