Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval
- URL: http://arxiv.org/abs/2309.08167v1
- Date: Fri, 15 Sep 2023 05:31:53 GMT
- Authors: Rui Deng, Qian Wu, Yuke Li, Haoran Fu
- Abstract summary: We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimizing video inference efficiency has become increasingly important with
the growing demand for video analysis in various fields. Some existing methods
achieve high efficiency by explicitly discarding spatial or temporal information,
which poses challenges in fast-changing and fine-grained scenarios. To address
these issues, we propose an efficient video representation network with
Differentiable Resolution Compression and Alignment mechanism, which compresses
non-essential information in the early stage of the network to reduce
computational costs while maintaining consistent temporal correlations.
Specifically, we leverage a Differentiable Context-aware Compression Module to
encode the saliency and non-saliency frame features, refining and updating the
features into a high-low resolution video sequence. To process the new
sequence, we introduce a new Resolution-Align Transformer Layer to capture
global temporal correlations among frame features with different resolutions,
while reducing spatial computation costs quadratically by utilizing fewer
spatial tokens in low-resolution non-saliency frames. The entire network can be
end-to-end optimized via the integration of the differentiable compression
module. Experimental results show that our method achieves the best trade-off
between efficiency and performance on near-duplicate video retrieval and
competitive results on dynamic video classification compared to
state-of-the-art methods. Code: https://github.com/dun-research/DRCA
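The abstract's claim that spatial cost drops quadratically for low-resolution frames can be made concrete with a back-of-the-envelope count. This is a minimal sketch under stated assumptions, not the DRCA implementation. The names `ClipConfig`, `select_salient` and `spatial_attention_pairs` are hypothetical, as are the clip length, the 14×14 token grid, the number of salient frames and the 2× per-side downscale. It uses a hard top-k mask, whereas the paper's compression module is differentiable, and it counts only per-frame spatial self-attention pairs, ignoring the temporal (resolution-align) layers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClipConfig:
    """Hypothetical clip settings; none of these values come from the paper."""
    num_frames: int = 8    # frames per clip
    grid: int = 14         # tokens per side of a full-resolution frame (e.g. 224 px / 16 px patches)
    num_salient: int = 2   # frames kept at full resolution
    downscale: int = 2     # per-side downscaling factor for non-salient frames


def select_salient(scores: list[float], k: int) -> list[bool]:
    """Hard top-k mask over per-frame saliency scores (True = keep full resolution).

    Assumption: a non-differentiable stand-in for the paper's compression module.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(scores))]


def spatial_attention_pairs(cfg: ClipConfig, mask: list[bool]) -> tuple[int, int]:
    """Count query-key pairs of per-frame spatial self-attention.

    Returns (every frame at full resolution, high-low resolution sequence).
    """
    full_tokens = cfg.grid ** 2
    low_tokens = (cfg.grid // cfg.downscale) ** 2
    uniform = cfg.num_frames * full_tokens ** 2
    mixed = sum((full_tokens if salient else low_tokens) ** 2 for salient in mask)
    return uniform, mixed


if __name__ == "__main__":
    cfg = ClipConfig()
    scores = [0.9, 0.1, 0.3, 0.8, 0.2, 0.05, 0.4, 0.15]  # made-up saliency scores
    mask = select_salient(scores, cfg.num_salient)
    uniform, mixed = spatial_attention_pairs(cfg, mask)
    print(mask)
    print(uniform, mixed, round(uniform / mixed, 2))
```

With these assumed settings, each downscaled frame has 49 instead of 196 tokens (4× fewer) but 2,401 instead of 38,416 attention pairs (16× fewer), so the clip's spatial-attention pair count falls from 307,328 to 91,238.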
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) offers superior compression performance and high complexity efficiency.
It uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
(arXiv, 2024-10-03T15:40:58Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
A coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while a fine-scale module is more efficient, speeding up the estimation process.
(arXiv, 2022-07-14T09:17:00Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-adaptive heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC outperforms recent state-of-the-art learned video compression approaches.
(arXiv, 2022-07-11T02:31:31Z)
- Video Frame Interpolation Transformer
We propose a Transformer-based video frame interpolation framework that allows content-aware aggregation weights and models long-range dependencies through self-attention.
To avoid the high computational cost of global self-attention, we introduce local attention into video frame interpolation.
In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
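Why local attention avoids the cost of global self-attention can be seen by counting query-key pairs. This is a minimal sketch under stated assumptions, not VFIT's actual attention layout. It assumes non-overlapping 3D windows, and the function names and the frame, feature-map and window sizes are hypothetical.

```python
def global_attention_pairs(t: int, h: int, w: int) -> int:
    """Query-key pairs when every token in a T x H x W volume attends to every other token."""
    n = t * h * w
    return n * n


def local_attention_pairs(t: int, h: int, w: int, wt: int, wh: int, ww: int) -> int:
    """Query-key pairs when tokens attend only within non-overlapping wt x wh x ww windows."""
    if t % wt or h % wh or w % ww:
        raise ValueError("window size must divide the token volume")
    num_windows = (t // wt) * (h // wh) * (w // ww)
    window_size = wt * wh * ww
    # Each window contributes window_size**2 pairs.
    return num_windows * window_size ** 2


if __name__ == "__main__":
    # Hypothetical sizes: 4 input frames, a 64 x 64 feature map, 4 x 8 x 8 windows.
    full = global_attention_pairs(4, 64, 64)
    local = local_attention_pairs(4, 64, 64, 4, 8, 8)
    print(full, local, full // local)
```

With these assumed sizes, global attention over 16,384 tokens needs 268,435,456 pairs, while 4 × 8 × 8 windows need 4,194,304, a 64× reduction. For a fixed window, the count grows linearly rather than quadratically with the number of tokens.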
(arXiv, 2021-11-27T05:35:10Z)
- Self-Conditioned Probabilistic Learning of Video Rescaling
We propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.
We decrease the entropy of the information lost in downscaling by maximizing its probability conditioned on strong spatial-temporal prior information.
We extend the framework to a lossy video compression system, for which we propose a gradient estimator for non-differentiable industrial lossy codecs.
(arXiv, 2021-07-24T15:57:15Z)
- Multi-Density Attention Network for Loop Filtering in Video Compression
We propose an online-scaling-based multi-density attention network for loop filtering in video compression.
Experimental results show a 10.18% bit-rate reduction at the same video quality relative to the latest Versatile Video Coding (VVC) standard.
(arXiv, 2021-04-08T05:46:38Z)
- Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling
We present two joint optimization approaches based on invertible neural networks with coupling layers.
Our Long Short-Term Memory Video Rescaling Network (LSTM-VRN) leverages temporal information in the low-resolution video to form an explicit prediction of the missing high-frequency information for upscaling.
Our Multi-input Multi-output Video Rescaling Network (MIMO-VRN) proposes a new strategy for downscaling and upscaling a group of video frames simultaneously.
(arXiv, 2021-03-27T09:35:38Z)
- An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement
We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples.
In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information.
The proposed design allows our recurrent cells to efficiently propagate spatio-temporal information across frames and reduces the need for high-complexity networks.
(arXiv, 2020-12-24T00:03:29Z)
- Decomposition, Compression, and Synthesis (DCS)-based Video Coding: A Neural Exploration via Resolution-Adaptive Learning
We decompose the input video into spatial texture frames (STF) at its native spatial resolution and temporal motion frames (TMF) at a lower spatial resolution.
Then, we compress them together using any popular video coder.
Finally, we synthesize decoded STFs and TMFs for high-quality video reconstruction at the same resolution as its native input.
(arXiv, 2020-12-01T17:23:53Z)
- Content Adaptive and Error Propagation Aware Deep Video Compression
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
(arXiv, 2020-03-25T09:04:24Z)