ConVRT: Consistent Video Restoration Through Turbulence with Test-time
Optimization of Neural Video Representations
- URL: http://arxiv.org/abs/2312.04679v1
- Date: Thu, 7 Dec 2023 20:19:48 GMT
- Title: ConVRT: Consistent Video Restoration Through Turbulence with Test-time
Optimization of Neural Video Representations
- Authors: Haoming Cai, Jingxi Chen, Brandon Y. Feng, Weiyun Jiang, Mingyang Xie,
Kevin Zhang, Ashok Veeraraghavan, Christopher Metzler
- Abstract summary: We introduce a self-supervised method, Consistent Video Restoration through Turbulence (ConVRT).
ConVRT is a test-time optimization method featuring a neural video representation designed to enhance temporal consistency in restoration.
A key innovation of ConVRT is the integration of a pretrained vision-language model (CLIP) for semantic-oriented supervision.
- Score: 13.38405890753946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Atmospheric turbulence presents a significant challenge in long-range imaging.
Current restoration algorithms often struggle with temporal inconsistency, as
well as limited generalization ability across varying turbulence levels and
scene content different from the training data. To tackle these issues, we
introduce a self-supervised method, Consistent Video Restoration through
Turbulence (ConVRT), a test-time optimization method featuring a neural video
representation designed to enhance temporal consistency in restoration. A key
innovation of ConVRT is the integration of a pretrained vision-language model
(CLIP) for semantic-oriented supervision, which steers the restoration towards
sharp, photorealistic images in the CLIP latent space. We further develop a
principled selection strategy of text prompts, based on their statistical
correlation with a perceptual metric. ConVRT's test-time optimization allows it
to adapt to a wide range of real-world turbulence conditions, effectively
leveraging the insights gained from pre-trained models on simulated data.
ConVRT offers a comprehensive and effective solution for mitigating real-world
turbulence in dynamic videos.
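As a concrete illustration of the recipe the abstract describes, the sketch below combines a test-time-optimized neural video representation with a frozen CLIP loss and the correlation-based prompt selection. It is a minimal sketch under stated assumptions, not the authors' implementation: the coordinate MLP stands in for the paper's neural video representation, the fidelity targets are assumed to be per-frame outputs of any pretrained restoration network, and all helper names are hypothetical.

```python
# Hedged sketch of ConVRT-style test-time optimization -- not the authors' code.
# Illustrative assumptions: a coordinate MLP stands in for the paper's neural
# video representation; per-frame outputs of any pretrained restoration network
# serve as the fidelity target; all helper names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()            # avoid fp16/fp32 mixing on GPU
for p in clip_model.parameters():
    p.requires_grad_(False)                # CLIP stays frozen

CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device)


class VideoMLP(nn.Module):
    """Toy neural video representation: (x, y, t) in [-1, 1]^3 -> RGB."""

    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, coords):             # coords: (N, 3)
        return self.net(coords)            # (N, 3) RGB in [0, 1]


def clip_image_features(frames):
    """Encode frames given as (B, 3, H, W) tensors in [0, 1]."""
    x = F.interpolate(frames, size=224, mode="bilinear", align_corners=False)
    x = (x - CLIP_MEAN[None, :, None, None]) / CLIP_STD[None, :, None, None]
    return F.normalize(clip_model.encode_image(x), dim=-1)


@torch.no_grad()
def select_prompt(candidates, frames, perceptual_scores):
    """Pick the prompt whose per-frame CLIP similarity has the strongest
    Pearson correlation with an externally computed perceptual metric."""
    img = clip_image_features(frames.to(device))
    q = torch.as_tensor(perceptual_scores, dtype=torch.float32, device=device)
    best, best_r = None, -float("inf")
    for prompt in candidates:
        txt = F.normalize(
            clip_model.encode_text(clip.tokenize([prompt]).to(device)), dim=-1)
        sims = (img @ txt.T).squeeze(1)    # (B,) image-text similarities
        r = torch.corrcoef(torch.stack([sims, q]))[0, 1]
        if r > best_r:
            best, best_r = prompt, r.item()
    return best


def test_time_optimize(targets, text_feat, steps=500, lam=0.1, lr=1e-3):
    """Fit the shared video representation to per-frame restorations
    (fidelity) while pulling rendered frames toward the chosen prompt in
    CLIP space (semantics). targets: (T, 3, H, W) in [0, 1]."""
    T, _, H, W = targets.shape
    model = VideoMLP().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    for _ in range(steps):
        t = torch.randint(T, ()).item()    # one random frame per step
        ts = torch.full_like(xs, 2 * t / max(T - 1, 1) - 1)
        coords = torch.stack([xs, ys, ts], -1).reshape(-1, 3).to(device)
        frame = model(coords).reshape(H, W, 3).permute(2, 0, 1)[None]
        loss = F.mse_loss(frame, targets[t:t + 1].to(device))
        loss = loss + lam * (1 - clip_image_features(frame) @ text_feat.T).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```

A caller would rank candidate prompts with select_prompt against per-frame scores from a perceptual metric such as NIQE, encode the winning prompt with clip_model.encode_text (detached), and hand both the per-frame restorations and the text feature to test_time_optimize; because every frame is rendered from one shared representation, temporal consistency is encouraged by construction.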
Related papers
- Learning Truncated Causal History Model for Video Restoration [14.381907888022615]
TURTLE learns the truncated causal history model for efficient and high-performing video restoration.
We report new state-of-the-art results on a multitude of video restoration benchmark tasks.
arXiv Detail & Related papers (2024-10-04T21:31:02Z)
- RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter [77.0205013713008]
Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries.
To date, most state-of-the-art TVR methods adopt image-to-video transfer learning based on large-scale pre-trained vision models.
We propose a sparse-and-correlated AdaPter (RAP) to fine-tune the pre-trained model with a few parameterized layers (a generic bottleneck-adapter sketch follows this entry).
arXiv Detail & Related papers (2024-05-29T19:23:53Z)
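The RAP entry above rests on the standard parameter-efficient fine-tuning idea: freeze the large pre-trained model and train only a few inserted layers. A minimal bottleneck adapter of that generic kind, purely illustrative and not RAP's sparse-and-correlated design, might look like:

```python
# Hedged sketch of a generic bottleneck adapter for parameter-efficient
# fine-tuning -- illustrative only, not RAP's sparse-and-correlated design.
import torch.nn as nn


class Adapter(nn.Module):
    """Small residual bottleneck inserted after a frozen sublayer; only
    these parameters are trained."""

    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)     # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):                  # x: (..., dim)
        return x + self.up(self.act(self.down(x)))
```

Freezing the backbone and training only such inserted layers is what "fine-tune the pre-trained model with a few parameterized layers" amounts to in practice.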
- Low-Light Video Enhancement via Spatial-Temporal Consistent Illumination and Reflection Decomposition [68.6707284662443]
Low-Light Video Enhancement (LLVE) seeks to restore dynamic and static scenes plagued by severe invisibility and noise.
One critical aspect is formulating a consistency constraint specifically for the temporal-spatial illumination and appearance of enhanced versions.
We present an innovative video Retinex-based decomposition strategy that operates without the need for explicit supervision.
arXiv Detail & Related papers (2024-05-24T15:56:40Z)
- Spatio-Temporal Turbulence Mitigation: A Translational Perspective [13.978156774471744]
We present the Deep Atmospheric TUrbulence Mitigation network (DATUM).
DATUM aims to overcome major challenges when transitioning from classical to deep learning approaches.
A large-scale training dataset, ATSyn, is presented as a co-invention to enable generalization in real turbulence.
arXiv Detail & Related papers (2024-01-08T21:35:05Z)
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into the U-Net and VAE-Decoder, maintaining consistency within short sequences; globally, a flow-guided recurrent latent propagation module enhances overall video stability across long sequences (a generic sketch of such a temporal layer follows this entry).
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
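The "temporal layers" mentioned in the Upscale-A-Video entry above follow a common video-diffusion pattern: fold space into the batch dimension and let frames attend to each other at every spatial location. A generic sketch of such a block, with illustrative names and shapes rather than the paper's actual architecture:

```python
# Hedged sketch of a generic temporal mixing layer of the kind video diffusion
# models insert into U-Net/VAE blocks -- names and shapes are illustrative,
# not Upscale-A-Video's actual architecture.
import torch.nn as nn


class TemporalLayer(nn.Module):
    """Lets each spatial location attend across frames."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x, num_frames):
        # x: (B*T, C, H, W) -- frames are flattened into the batch dimension.
        bt, c, h, w = x.shape
        b = bt // num_frames
        # Fold space into the batch and expose time as the sequence axis:
        seq = x.view(b, num_frames, c, h * w).permute(0, 3, 1, 2)
        seq = seq.reshape(b * h * w, num_frames, c)
        q = self.norm(seq)
        y, _ = self.attn(q, q, q)          # attend across frames
        seq = seq + y                      # residual keeps the image prior
        return (seq.reshape(b, h * w, num_frames, c)
                   .permute(0, 2, 3, 1).reshape(bt, c, h, w))
```

Because the block is residual, it perturbs rather than replaces the pretrained image features, which is why such layers can be added to an image model and then trained on video.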
- Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution [15.197746480157651]
We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss (a generic sketch of such a warping loss follows this entry).
The proposed motion-guided latent diffusion based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets.
arXiv Detail & Related papers (2023-12-01T14:40:07Z)
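The motion-guided loss in the entry above can be pictured as a warping consistency term on latents: estimate optical flow between frames, warp the previous latent to the current frame, and penalize the mismatch. A hedged sketch, assuming precomputed (detached) flows and illustrative function names:

```python
# Hedged sketch of a motion-guided temporal loss on diffusion latents: warp
# the previous frame's latent with optical flow and penalize the mismatch.
# Flow estimation is assumed done elsewhere; names are illustrative.
import torch
import torch.nn.functional as F


def warp(latent, flow):
    """Backward-warp latent (B, C, H, W) with flow (B, 2, H, W) in pixels;
    flow is assumed precomputed and detached."""
    b, _, h, w = latent.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], 0)[None].to(latent) + flow
    grid[:, 0] = 2 * grid[:, 0] / (w - 1) - 1   # normalize x to [-1, 1]
    grid[:, 1] = 2 * grid[:, 1] / (h - 1) - 1   # normalize y to [-1, 1]
    return F.grid_sample(latent, grid.permute(0, 2, 3, 1), align_corners=True)


def motion_guided_loss(latents, flows):
    """latents: list of (B, C, H, W) per-frame latents (len >= 2);
    flows[t] maps pixels of frame t to frame t-1 (flows[0] is unused)."""
    loss = 0.0
    for t in range(1, len(latents)):
        loss = loss + F.l1_loss(latents[t], warp(latents[t - 1], flows[t]))
    return loss / (len(latents) - 1)
```

In a diffusion sampler, a term like this would be evaluated on intermediate latents at selected steps and its gradient used to nudge the sampling path toward temporally consistent trajectories.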
- Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video Restoration [78.14941737723501]
We propose a Cross-consistent Deep Unfolding Network (CDUN) for All-In-One VR.
By orchestrating two cascading procedures, CDUN achieves adaptive processing for diverse degradations.
In addition, we introduce a window-based inter-frame fusion strategy to utilize information from more adjacent frames.
arXiv Detail & Related papers (2023-09-04T14:18:00Z)
- Physics-Driven Turbulence Image Restoration with Stochastic Refinement [80.79900297089176]
Image distortion by atmospheric turbulence is a critical problem in long-range optical imaging systems.
Fast and physics-grounded simulation tools have been introduced to help the deep-learning models adapt to real-world turbulence conditions.
This paper proposes the Physics-integrated Restoration Network (PiRN) to help the network disentangle the stochasticity from the degradation and the underlying image.
arXiv Detail & Related papers (2023-07-20T05:49:21Z)
- Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regulation to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.