ReBotNet: Fast Real-time Video Enhancement
- URL: http://arxiv.org/abs/2303.13504v1
- Date: Thu, 23 Mar 2023 17:58:05 GMT
- Title: ReBotNet: Fast Real-time Video Enhancement
- Authors: Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M. Patel, Anne Menini
- Abstract summary: Most video restoration networks are slow, have a high computational load, and cannot be used for real-time video enhancement.
In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams.
To evaluate our method, we curate two new datasets that emulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computations, reduced memory requirements, and faster inference time.
- Score: 59.08038313427057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most video restoration networks are slow, have high computational load, and
can't be used for real-time video enhancement. In this work, we design an
efficient and fast framework to perform real-time video enhancement for
practical use-cases like live video calls and video streams. Our proposed
method, called Recurrent Bottleneck Mixer Network (ReBotNet), employs a
dual-branch framework. The first branch learns spatio-temporal features by
tokenizing the input frames along the spatial and temporal dimensions using a
ConvNext-based encoder and processing these abstract tokens using a bottleneck
mixer. To further improve temporal consistency, the second branch employs a
mixer directly on tokens extracted from individual frames. A common decoder
then merges the features from the two branches to predict the enhanced frame.
In addition, we propose a recurrent training approach where the last frame's
prediction is leveraged to efficiently enhance the current frame while
improving temporal consistency. To evaluate our method, we curate two new
datasets that emulate real-world video call and streaming scenarios, and show
extensive results on multiple datasets where ReBotNet outperforms existing
approaches with lower computations, reduced memory requirements, and faster
inference time.
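To make the dual-branch and recurrent ideas above concrete, here is a minimal PyTorch sketch. Every name, layer size, and simplification in it (the `MixerBlock`, the plain strided convolutions standing in for the ConvNext-based encoder, the `DualBranchEnhancer` wrapper) is an illustrative assumption, not the authors' released implementation.

```python
# Hedged sketch of a dual-branch, recurrently-fed enhancer in the spirit of
# the abstract. Shapes and modules are simplified assumptions.
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    """Simplified MLP-mixer block: mixes across tokens, then across channels."""

    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        self.token_norm = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(nn.Linear(num_tokens, num_tokens), nn.GELU(),
                                       nn.Linear(num_tokens, num_tokens))
        self.channel_norm = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                         nn.Linear(dim, dim))

    def forward(self, x):                        # x: (B, tokens, dim)
        y = self.token_norm(x).transpose(1, 2)   # mix along the token axis
        x = x + self.token_mlp(y).transpose(1, 2)
        x = x + self.channel_mlp(self.channel_norm(x))
        return x


class DualBranchEnhancer(nn.Module):
    """Hypothetical stand-in for the dual-branch layout described in the abstract."""

    def __init__(self, dim=64, tokens=256):
        super().__init__()
        # Branch 1: spatio-temporal tokens from the current frame stacked with the
        # previous prediction; a strided conv stands in for the ConvNext-based encoder.
        self.encoder = nn.Conv2d(6, dim, kernel_size=8, stride=8)
        self.bottleneck_mixer = MixerBlock(tokens, dim)
        # Branch 2: tokens from the individual frame, mixed for temporal consistency.
        self.frame_tokenizer = nn.Conv2d(3, dim, kernel_size=8, stride=8)
        self.frame_mixer = MixerBlock(tokens, dim)
        # Common decoder merges both branches and predicts the enhanced frame.
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.GELU(),
            nn.ConvTranspose2d(dim, 3, kernel_size=8, stride=8))

    def forward(self, frame, prev_pred):
        b = frame.shape[0]
        f1 = self.encoder(torch.cat([frame, prev_pred], dim=1))  # (B, dim, H/8, W/8)
        f2 = self.frame_tokenizer(frame)
        hh, ww = f1.shape[-2:]
        t1 = self.bottleneck_mixer(f1.flatten(2).transpose(1, 2))
        t2 = self.frame_mixer(f2.flatten(2).transpose(1, 2))
        merged = torch.cat([t1, t2], dim=2).transpose(1, 2).reshape(b, -1, hh, ww)
        return frame + self.decoder(merged)       # residual enhancement


# Recurrent use over a stream: the last prediction is fed back for the next frame.
model = DualBranchEnhancer()
prev = torch.zeros(1, 3, 128, 128)                # no prediction exists before frame 0
for frame in torch.rand(5, 1, 3, 128, 128):       # toy 5-frame stream
    prev = model(frame, prev)                     # enhanced output reused at the next step
```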
Related papers
- SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity [15.872209884833977]
We propose a memory-efficient scheduling method to eliminate memory overhead and an online adjustment mechanism to minimize accuracy degradation.
SparseTem achieves speedup of 1.79x for EfficientDet and 4.72x for CRNN, with minimal accuracy drop and no additional memory overhead.
arXiv Detail & Related papers (2024-10-28T07:13:25Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Look More but Care Less in Video Recognition [57.96505328398205]
Action recognition methods typically sample a few frames to represent each video to avoid the enormous computation.
We propose Ample and Focal Network (AFNet), which is composed of two branches to utilize more frames but with less computation.
arXiv Detail & Related papers (2022-11-18T02:39:56Z)
- Deep Unsupervised Key Frame Extraction for Efficient Video Classification [63.25852915237032]
This work presents an unsupervised method to retrieve the key frames, which combines a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works; one is that it can determine the number of key frames automatically.
Furthermore, a Long Short-Term Memory network (LSTM) is added on the top of the CNN to further elevate the performance of classification.
arXiv Detail & Related papers (2022-11-12T20:45:35Z) - Distortion-Aware Network Pruning and Feature Reuse for Real-time Video
Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frame.
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
- Representation Recycling for Streaming Video Analysis [19.068248496174903]
StreamDEQ aims to infer frame-wise representations on videos with minimal per-frame computation.
We show that StreamDEQ is able to recover near-optimal representations in a few frames' time and maintain an up-to-date representation throughout the video duration.
arXiv Detail & Related papers (2022-04-28T13:35:14Z)
- Adaptive Focus for Efficient Video Recognition [29.615394426035074]
We propose a reinforcement learning based approach for efficient spatially adaptive video recognition (AdaFocus).
A lightweight ConvNet is first adopted to quickly process the full video sequence, and its features are used by a recurrent policy network to localize the most task-relevant regions.
During offline inference, once the informative patch sequence has been generated, the bulk of computation can be done in parallel, and is efficient on modern GPU devices.
arXiv Detail & Related papers (2021-05-07T13:24:47Z)
- Coarse-Fine Networks for Temporal Activity Detection in Videos [45.03545172714305]
We introduce 'Coarse-Fine Networks', a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.
We show that our method can outperform the state-of-the-arts for action detection in public datasets with a significantly reduced compute and memory footprint.
arXiv Detail & Related papers (2021-03-01T20:48:01Z)
- All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions that interpolate one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramidal style network in the temporal domain to complete the multi-frame task in one-shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)