CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos
- URL: http://arxiv.org/abs/2512.12060v1
- Date: Fri, 12 Dec 2025 22:03:14 GMT
- Title: CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos
- Authors: Tejas Panambur, Ishan Rajendrakumar Dave, Chongjian Ge, Ersin Yumer, Xue Bai,
- Abstract summary: CreativeVR is a diffusion-prior-guided video restoration framework for AI-generated (AIGC) and real videos with severe structural and temporal artifacts. Our deep-adapter-based method exposes a single precision knob that controls how strongly the model follows the input. CreativeVR achieves state-of-the-art results on videos with severe artifacts and performs competitively on standard video restoration benchmarks.
- Score: 17.81372151946937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern text-to-video (T2V) diffusion models can synthesize visually compelling clips, yet they remain brittle at fine-scale structure: even state-of-the-art generators often produce distorted faces and hands, warped backgrounds, and temporally inconsistent motion. Such severe structural artifacts also appear in very low-quality real-world videos. Classical video restoration and super-resolution (VR/VSR) methods, in contrast, are tuned for synthetic degradations such as blur and downsampling and tend to stabilize these artifacts rather than repair them, while diffusion-prior restorers are usually trained on photometric noise and offer little control over the trade-off between perceptual quality and fidelity. We introduce CreativeVR, a diffusion-prior-guided video restoration framework for AI-generated (AIGC) and real videos with severe structural and temporal artifacts. Our deep-adapter-based method exposes a single precision knob that controls how strongly the model follows the input, smoothly trading off between precise restoration on standard degradations and stronger structure- and motion-corrective behavior on challenging content. Our key novelty is a temporally coherent degradation module used during training, which applies carefully designed transformations that produce realistic structural failures. To evaluate AIGC-artifact restoration, we propose the AIGC54 benchmark with FIQA, semantic and perceptual metrics, and multi-aspect scoring. CreativeVR achieves state-of-the-art results on videos with severe artifacts and performs competitively on standard video restoration benchmarks, while running at practical throughput (about 13 FPS at 720p on a single 80-GB A100). Project page: https://daveishan.github.io/creativevr-webpage/.
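The abstract does not say how the precision knob is implemented. As a purely illustrative sketch, one common way to expose such a control in adapter-based models is to scale the adapter's input-conditioned features before adding them to the backbone's features. The function name `adapter_blend`, the `[0, 1]` range, and the additive form below are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def adapter_blend(backbone_feat, adapter_feat, precision):
    """Hypothetical knob: add adapter features scaled by `precision`.

    precision = 0.0 ignores the degraded input (prior only);
    precision = 1.0 applies the full adapter signal (follow the input closely).
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    return backbone_feat + precision * adapter_feat


rng = np.random.default_rng(0)
# Toy (frames, channels) features; both tensors are stand-ins, not real model outputs.
backbone_feat = rng.standard_normal((4, 8))  # features from the diffusion prior
adapter_feat = rng.standard_normal((4, 8))   # features derived from the degraded input

free = adapter_blend(backbone_feat, adapter_feat, 0.0)    # strongest corrective behavior
half = adapter_blend(backbone_feat, adapter_feat, 0.5)    # intermediate setting
strict = adapter_blend(backbone_feat, adapter_feat, 1.0)  # closest to the input
```

Under this assumption, lowering `precision` lets the prior dominate, which would correspond to the stronger structure- and motion-corrective behavior the abstract describes, while raising it keeps the output closer to the input.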
Related papers
- All-in-One Video Restoration under Smoothly Evolving Unknown Weather Degradations [102.94052335735326]
All-in-one image restoration aims to recover clean images from diverse unknown degradations using a single model. Existing approaches primarily focus on frame-wise degradation variation, overlooking the temporal continuity that naturally exists in real-world degradation processes. We introduce the Smoothly Evolving Unknown Degradations (SEUD) scenario, where both the active degradation set and degradation intensity change continuously over time.
arXiv Detail & Related papers (2026-01-02T02:20:57Z)
- STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution [60.06664986365803]
We present STCDiT, a video super-resolution framework built upon a pre-trained video diffusion model. It aims to restore structurally faithful and temporally stable videos from degraded inputs, even under complex camera motions.
arXiv Detail & Related papers (2025-11-24T05:37:23Z)
- MoA-VR: A Mixture-of-Agents System Towards All-in-One Video Restoration [62.929029990341796]
Real-world videos often suffer from complex degradations, such as noise, compression artifacts, and low-light distortions. We propose MoA-VR, which mimics the reasoning and processing procedures of human professionals through three coordinated agents. Specifically, we construct a large-scale and high-resolution video degradation recognition benchmark and build a vision-language model (VLM) driven degradation identifier.
arXiv Detail & Related papers (2025-10-09T17:42:51Z)
- LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration [3.2944592608677614]
We propose LVTINO, the first zero-shot or plug-and-play inverse solver for high definition video restoration with priors encoded by VCMs. Our conditioning mechanism bypasses the need for automatic differentiation and achieves state-of-the-art video reconstruction quality with only a few neural function evaluations.
arXiv Detail & Related papers (2025-10-01T18:10:08Z)
- BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos [63.03271511550633]
BrokenVideos is a benchmark dataset of 3,254 AI-generated videos with meticulously annotated, pixel-level masks highlighting regions of visual corruption. Our experiments show that training state-of-the-art artifact detection models and multimodal large language models (MLLMs) on BrokenVideos significantly improves their ability to localize corrupted regions.
arXiv Detail & Related papers (2025-06-25T03:30:04Z)
- Implicit Neural Representation for Video Restoration [4.960738913876514]
We introduce VR-INR, a novel video restoration approach based on Implicit Neural Representations (INRs). VR-INR generalizes effectively to arbitrary, unseen super-resolution scales at test time. It consistently maintains high-quality reconstructions at scales and noise levels unseen during training.
arXiv Detail & Related papers (2025-06-05T18:09:59Z)
- Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones. Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency. We present a novel maximum a posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z)
- SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration [73.70209718408641]
SeedVR is a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. It achieves highly competitive performance on both synthetic and real-world benchmarks, as well as AI-generated videos.
arXiv Detail & Related papers (2025-01-02T16:19:48Z)
- FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration [14.17192434286707]
We present a new conditional diffusion framework called FLAIR for face video restoration.
FLAIR ensures temporal consistency across frames in a computationally efficient fashion.
Our experiments show the superiority of FLAIR over the current state of the art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.
arXiv Detail & Related papers (2023-11-26T22:09:18Z)