SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
- URL: http://arxiv.org/abs/2501.01320v3
- Date: Tue, 04 Feb 2025 18:29:36 GMT
- Title: SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
- Authors: Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Fei Xiao, Chen Change Loy, Lu Jiang,
- Abstract summary: SeedVR is a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. It achieves highly competitive performance on both synthetic and real-world benchmarks, as well as AI-generated videos.
- Score: 73.70209718408641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild. Despite recent advances in diffusion-based restoration, these methods often face limitations in generation capability and sampling efficiency. In this work, we present SeedVR, a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. The core design of SeedVR lies in the shifted window attention that facilitates effective restoration on long video sequences. SeedVR further supports variable-sized windows near the boundary of both spatial and temporal dimensions, overcoming the resolution constraints of traditional window attention. Equipped with contemporary practices, including causal video autoencoder, mixed image and video training, and progressive training, SeedVR achieves highly-competitive performance on both synthetic and real-world benchmarks, as well as AI-generated videos. Extensive experiments demonstrate SeedVR's superiority over existing methods for generic video restoration.
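The variable-sized window idea from the abstract can be illustrated with a minimal sketch. This is NOT the actual SeedVR implementation; it is a NumPy toy with hypothetical helper names, showing only the partitioning behavior: instead of padding the video to a multiple of the window size, windows touching the temporal or spatial boundary simply shrink.

```python
import numpy as np

def window_partition(frames, win=(4, 8, 8), shift=(0, 0, 0)):
    """Split a (T, H, W, C) video array into attention windows.

    Windows near the temporal/spatial boundaries are allowed to be
    smaller than `win`, so no padding to a multiple of the window size
    is needed (the variable-sized-window idea described in the abstract).
    Returns a list of (window, (t0, h0, w0)) pairs.
    """
    T, H, W, _ = frames.shape
    st, sh, sw = shift
    # Cyclic shift, as used by shifted-window attention schemes.
    shifted = np.roll(frames, shift=(-st, -sh, -sw), axis=(0, 1, 2))
    windows = []
    for t0 in range(0, T, win[0]):
        for h0 in range(0, H, win[1]):
            for w0 in range(0, W, win[2]):
                # Slicing past the boundary yields a smaller window.
                w_ = shifted[t0:t0 + win[0], h0:h0 + win[1], w0:w0 + win[2]]
                windows.append((w_, (t0, h0, w0)))
    return windows

# Example: a 5-frame 20x20 clip with (4, 8, 8) windows produces
# full-size interior windows and smaller ones at the boundaries.
frames = np.zeros((5, 20, 20, 3))
windows = window_partition(frames, win=(4, 8, 8))
```

Attention would then run within each window independently; the shift offsets alternate between layers so information propagates across window borders.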
Related papers
- DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer [56.98400572837792]
DiVE produces high-fidelity, temporally coherent, and cross-view consistent multi-view videos.
These innovations collectively achieve a 2.62x speedup with minimal quality degradation.
arXiv Detail & Related papers (2025-04-28T09:20:50Z)
- Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones.
Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency.
We present a novel maximum a posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z)
- LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models [17.29580459404157]
We propose LeanVAE, a novel and ultra-efficient Video VAE framework.
Our model offers up to 50x fewer FLOPs and 44x faster inference speed.
Our experiments validate LeanVAE's superiority in video reconstruction and generation.
arXiv Detail & Related papers (2025-03-18T14:58:59Z)
- Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution [25.615935776826596]
Video super-resolution (SR) is proposed to enhance resolution, but the spatial projection distortions and temporal flickering of practical omnidirectional videos (ODVs) are not well addressed by directly applying existing methods.
We propose a Spatio-Temporal Distortion Aware Network (STDAN) oriented to ODV characteristics to achieve better ODV-SR reconstruction.
arXiv Detail & Related papers (2024-10-15T11:17:19Z)
- DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models [9.145545884814327]
This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models.
We show that our method achieves top performance in zero-shot video restoration.
Our technique works with any 2D restoration diffusion model, offering a versatile and powerful tool for video enhancement tasks without extensive retraining.
arXiv Detail & Related papers (2024-07-01T17:59:12Z)
- ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations [13.38405890753946]
We introduce a self-supervised method, Consistent Video Restoration through Turbulence (ConVRT).
ConVRT is a test-time optimization method featuring a neural video representation designed to enhance temporal consistency in restoration.
A key innovation of ConVRT is the integration of a pretrained vision-language model (CLIP) for semantic-oriented supervision.
arXiv Detail & Related papers (2023-12-07T20:19:48Z)
- Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video Restoration [78.14941737723501]
We propose a Cross-consistent Deep Unfolding Network (CDUN) for All-In-One VR.
By orchestrating two cascading procedures, CDUN achieves adaptive processing for diverse degradations.
In addition, we introduce a window-based inter-frame fusion strategy to utilize information from more adjacent frames.
arXiv Detail & Related papers (2023-09-04T14:18:00Z)
- VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution [75.79379734567604]
We show that Video Implicit Neural Representation (VideoINR) can be decoded to videos of arbitrary spatial resolution and frame rate.
We show that VideoINR achieves competitive performances with state-of-the-art STVSR methods on common up-sampling scales.
arXiv Detail & Related papers (2022-06-09T17:45:49Z)
- VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z)
- Evaluating Foveated Video Quality Using Entropic Differencing [1.5877673959068452]
We propose a full reference (FR) foveated image quality assessment algorithm, which employs the natural scene statistics of bandpass responses.
We evaluate the proposed algorithm by measuring the correlations of the predictions that FED makes against human judgements.
The proposed algorithm achieves state-of-the-art performance compared with other existing full reference algorithms.
arXiv Detail & Related papers (2021-06-12T16:29:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.