An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time
Video Enhancement
- URL: http://arxiv.org/abs/2012.13033v1
- Date: Thu, 24 Dec 2020 00:03:29 GMT
- Title: An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time
Video Enhancement
- Authors: Dario Fuoli, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool, Radu
Timofte
- Abstract summary: We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples.
In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information.
The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high complexity networks.
- Score: 132.60976158877608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video enhancement is a challenging problem, more than that of stills, mainly
due to high computational cost, larger data volumes and the difficulty of
achieving consistency in the spatio-temporal domain. In practice, these
challenges are often coupled with the lack of example pairs, which inhibits the
application of supervised learning strategies. To address these challenges, we
propose an efficient adversarial video enhancement framework that learns
directly from unpaired video examples. In particular, our framework introduces
new recurrent cells that consist of interleaved local and global modules for
implicit integration of spatial and temporal information. The proposed design
allows our recurrent cells to efficiently propagate spatio-temporal information
across frames and reduces the need for high complexity networks. Our setting
enables learning from unpaired videos in a cyclic adversarial manner, where the
proposed recurrent units are employed in all architectures. Efficient training
is accomplished by introducing one single discriminator that learns the joint
distribution of source and target domain simultaneously. The enhancement
results demonstrate clear superiority of the proposed video enhancer over the
state-of-the-art methods in terms of visual quality, quantitative metrics,
and inference speed. Notably, our video enhancer is capable of enhancing over
35 frames per second of FullHD video (1080x1920).
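The abstract describes the core building block, recurrent cells that interleave local and global modules and carry a hidden state across frames, only at a high level. Below is a minimal PyTorch sketch of one plausible reading. The concrete module choices (a residual convolution block standing in for the local module, a squeeze-and-excitation-style gate standing in for the global module), the channel widths, and the interleaving depth are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a recurrent cell with interleaved local and global
# modules. All module designs and hyperparameters below are assumptions
# for illustration; they are not taken from the paper's code.
import torch
import torch.nn as nn


class LocalModule(nn.Module):
    """Local spatial processing: a small residual conv block (assumed)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class GlobalModule(nn.Module):
    """Global context: channel re-weighting from spatially pooled
    features (a squeeze-and-excitation-style stand-in, assumed)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)


class RecurrentCell(nn.Module):
    """Interleaves local and global modules and carries a hidden state
    across frames so spatio-temporal information propagates cheaply."""
    def __init__(self, in_channels: int = 3, channels: int = 32, blocks: int = 3):
        super().__init__()
        self.channels = channels
        # Fuse the current frame with the previous hidden state.
        self.fuse = nn.Conv2d(in_channels + channels, channels, 3, padding=1)
        stages = []
        for _ in range(blocks):
            stages += [LocalModule(channels), GlobalModule(channels)]
        self.stages = nn.ModuleList(stages)
        self.to_rgb = nn.Conv2d(channels, in_channels, 3, padding=1)

    def forward(self, frame, hidden=None):
        b, _, h, w = frame.shape
        if hidden is None:
            hidden = frame.new_zeros(b, self.channels, h, w)
        x = self.fuse(torch.cat([frame, hidden], dim=1))
        for stage in self.stages:
            x = stage(x)
        # Enhanced frame (residual prediction) and next hidden state.
        return frame + self.to_rgb(x), x


# Usage: unroll the cell over a (batch, time, channels, height, width) clip.
cell = RecurrentCell()
video = torch.rand(1, 8, 3, 64, 64)
hidden, outputs = None, []
for t in range(video.shape[1]):
    out, hidden = cell(video[:, t], hidden)
    outputs.append(out)
```

Under this reading, per-frame cost stays low because temporal context arrives through the recurrent hidden state rather than through multi-frame input windows, consistent with the abstract's claim that the design reduces the need for high complexity networks. The cyclic adversarial training and the single discriminator over the joint source-target distribution would sit on top of generators built from such cells; their exact formulation is not specified here.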
Related papers
- SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis [52.050036778325094]
We introduce SALOVA: Segment-Augmented Long Video Assistant, a novel video-LLM framework designed to enhance the comprehension of lengthy video content.
We present a high-quality collection of 87.8K long videos, each densely captioned at the segment level to enable models to capture scene continuity and maintain rich context.
Our framework mitigates the limitations of current video-LMMs by allowing for precise identification and retrieval of relevant video segments in response to queries.
arXiv Detail & Related papers (2024-11-25T08:04:47Z) - Bridging the Gap: A Unified Video Comprehension Framework for Moment
Retrieval and Highlight Detection [45.82453232979516]
Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis.
Recent approaches treat MR and HD as similar video grounding problems and address them together with transformer-based architecture.
We propose a Unified Video COMprehension framework (UVCOM) to bridge the gap and jointly solve MR and HD effectively.
arXiv Detail & Related papers (2023-11-28T03:55:23Z) - CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved
Self-Supervised Video Hashing [45.216750448864275]
Learning accurate hash codes for video retrieval can be challenging due to high local redundancy and complex global dependencies among video frames.
Our proposed contrastive hashing method, CHAIN, outperforms state-of-the-art self-supervised video hashing methods on four video benchmark datasets.
arXiv Detail & Related papers (2023-10-29T07:36:11Z) - Differentiable Resolution Compression and Alignment for Efficient Video
Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
arXiv Detail & Related papers (2023-09-15T05:31:53Z) - Self-Supervised Video Representation Learning via Latent Time Navigation [12.721647696921865]
Self-supervised video representation learning aims at maximizing similarity between different temporal segments of one video.
We propose Latent Time Navigation (LTN) to capture fine-grained motions.
Our experimental analysis suggests that learning video representations by LTN consistently improves the performance of action classification.
arXiv Detail & Related papers (2023-05-10T20:06:17Z) - Deeply-Coupled Convolution-Transformer with Spatial-temporal
Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation and performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Enhanced Spatio-Temporal Interaction Learning for Video Deraining: A
Faster and Better Framework [93.37833982180538]
Video deraining is an important task in computer vision as the unwanted rain hampers the visibility of videos and deteriorates the robustness of most outdoor vision systems.
We present a new end-to-end deraining framework, named Enhanced Spatio-Temporal Interaction Network (ESTINet)
ESTINet considerably boosts current state-of-the-art video deraining quality and speed.
arXiv Detail & Related papers (2021-03-23T05:19:35Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video
Super-Resolution [95.26202278535543]
A simple solution is to split the task into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)