Related papers: Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements

Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements

URL: http://arxiv.org/abs/2312.07835v1
Date: Wed, 13 Dec 2023 01:57:11 GMT
Title: Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements
Authors: Gaurav Shrivastava, Ser-Nam Lim, Abhinav Shrivastava
Abstract summary: We present a framework for low-level vision tasks that does not require any external training data corpus. Our approach learns neural modules by optimizing over a corrupted sequence, leveraging the weights of the coherence-temporal test and statistics internal statistics.
Score: 83.5820690348833
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In this paper, we present a novel robust framework for low-level vision tasks, including denoising, object removal, frame interpolation, and super-resolution, that does not require any external training data corpus. Our proposed approach directly learns the weights of neural modules by optimizing over the corrupted test sequence, leveraging the spatio-temporal coherence and internal statistics of videos. Furthermore, we introduce a novel spatial pyramid loss that leverages the property of spatio-temporal patch recurrence in a video across the different scales of the video. This loss enhances robustness to unstructured noise in both the spatial and temporal domains. This further results in our framework being highly robust to degradation in input frames and yields state-of-the-art results on downstream tasks such as denoising, object removal, and frame interpolation. To validate the effectiveness of our approach, we conduct qualitative and quantitative evaluations on standard video datasets such as DAVIS, UCF-101, and VIMEO90K-T.

Related papers

Motion-Aware Concept Alignment for Consistent Video Editing [57.08108545219043]
We introduce MoCA-Video (Motion-Aware Concept Alignment in Video), a training-free framework bridging the gap between image-domain semantic mixing and video.<n>Given a generated video and a user-provided reference image, MoCA-Video injects the semantic features of the reference image into a specific object within the video.<n>We evaluate MoCA's performance using the standard SSIM, image-level LPIPS, temporal LPIPS, and introduce a novel metric CASS (Conceptual Alignment Shift Score) to evaluate the consistency and effectiveness of the visual shifts between the source prompt and the modified video frames
arXiv Detail & Related papers (2025-06-01T13:28:04Z)
Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations. To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video. In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection. A co-attention formulation is utilized to combine the low-level and high-level features. We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
Deep Video Matting via Spatio-Temporal Alignment and Aggregation [63.6870051909004]
We propose a deep learning-based video matting framework which employs a novel aggregation feature module (STFAM) To eliminate frame-by-frame trimap annotations, a lightweight interactive trimap propagation network is also introduced. Our framework significantly outperforms conventional video matting and deep image matting methods.
arXiv Detail & Related papers (2021-04-22T17:42:08Z)
Frame-rate Up-conversion Detection Based on Convolutional Neural Network for Learning Spatiotemporal Features [7.895528973776606]
This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic features caused by FRUC in an end-to-end fashion. FCDNet uses a stack of consecutive frames as the input and effectively learns artifacts using network blocks to learn features.
arXiv Detail & Related papers (2021-03-25T08:47:46Z)
Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner. To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z)
Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design. Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference [31.97227651679233]
Multi-object scene decomposition is a fast-emerging problem in learning. We show that our model has a high accuracy even without color information. We demonstrate the decomposition, segmentation prediction capabilities of our model and show that it outperforms the state-of-the-art on several benchmark datasets.
arXiv Detail & Related papers (2020-06-25T22:57:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.