Improved CNN-based Learning of Interpolation Filters for Low-Complexity
Inter Prediction in Video Coding
- URL: http://arxiv.org/abs/2106.08936v1
- Date: Wed, 16 Jun 2021 16:48:01 GMT
- Title: Improved CNN-based Learning of Interpolation Filters for Low-Complexity
Inter Prediction in Video Coding
- Authors: Luka Murn, Saverio Blasi, Alan F. Smeaton and Marta Mrak
- Abstract summary: This paper introduces a novel explainable neural network-based inter-prediction scheme.
A novel training framework enables each network branch to resemble a specific fractional shift.
When implemented in the context of the Versatile Video Coding (VVC) test model, 0.77%, 1.27% and 2.25% BD-rate savings can be achieved.
- Score: 5.46121027847413
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The versatility of recent machine learning approaches makes them ideal for
improvement of next generation video compression solutions. Unfortunately,
these approaches typically bring significant increases in computational
complexity and are difficult to interpret into explainable models, affecting
their potential for implementation within practical video coding applications.
This paper introduces a novel explainable neural network-based inter-prediction
scheme, to improve the interpolation of reference samples needed for fractional
precision motion compensation. The approach requires a single neural network to
be trained from which a full quarter-pixel interpolation filter set is derived,
as the network is easily interpretable due to its linear structure. A novel
training framework enables each network branch to resemble a specific
fractional shift. This practical solution makes it very efficient to use
alongside conventional video coding schemes. When implemented in the context of
the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and
2.25% BD-rate savings can be achieved on average for lower resolution sequences
under the random access, low-delay B and low-delay P configurations,
respectively, while the complexity of the learned interpolation schemes is
significantly reduced compared to the interpolation with full CNNs.
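The practical benefit of the linear structure described above is that each fractional-shift branch can be collapsed offline into a single FIR interpolation filter, so the codec only applies a small fixed filter at inference rather than running a full network. Below is a minimal sketch of that collapse, assuming small illustrative kernels; it is not the authors' implementation and the kernel values are placeholders rather than trained coefficients.

```python
# Minimal sketch (not the paper's code): with no non-linear activations,
# the cascade of convolutions in one fractional-shift branch collapses
# into a single equivalent FIR filter, applied like a conventional
# interpolation filter.
import numpy as np
from scipy.signal import convolve2d

def collapse_linear_branch(kernels):
    """Compose a list of 2-D convolution kernels (one linear branch)
    into a single equivalent interpolation filter."""
    equivalent = np.array([[1.0]])
    for k in kernels:
        equivalent = convolve2d(equivalent, k, mode="full")
    return equivalent

def interpolate(reference_block, filt):
    """Apply the collapsed filter to integer-pel reference samples to
    produce the prediction for one fractional (e.g. quarter-pel) shift."""
    return convolve2d(reference_block, filt, mode="same")

# Hypothetical example: two small kernels from one branch collapse into one
# filter, so inference cost is a single convolution per fractional shift.
branch_kernels = [np.random.randn(3, 3) * 0.1, np.random.randn(3, 3) * 0.1]
filt = collapse_linear_branch(branch_kernels)
pred = interpolate(np.random.randint(0, 1024, (16, 16)).astype(float), filt)
```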
Related papers
- Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task which can increase the frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z)
- Progressive Fourier Neural Representation for Sequential Video Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z)
- Neural Network based Inter bi-prediction Blending [8.815673539598816]
This paper presents a learning-based method to improve bi-prediction in video coding.
In this context, we introduce a simple neural network that further improves the blending operation.
Tests show a BD-rate improvement of -1.4% in the random access configuration for a network of fewer than 10k parameters; a minimal illustrative sketch of such a blending network is given after this list.
arXiv Detail & Related papers (2022-01-26T13:57:48Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation framework can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
- A Deep-Unfolded Reference-Based RPCA Network For Video Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA).
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
arXiv Detail & Related papers (2020-10-02T11:40:09Z)
- Interpreting CNN for Low Complexity Learned Sub-pixel Motion Compensation in Video Coding [16.381904711953947]
A novel neural network-based tool is presented which improves the interpolation of reference samples needed for fractional precision motion compensation.
When the approach is implemented in the Versatile Video Coding (VVC) test model, up to 4.5% BD-rate saving for individual sequences is achieved.
The complexity of the learned interpolation schemes is significantly reduced compared to the application of full neural networks.
arXiv Detail & Related papers (2020-06-11T13:10:20Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be too computationally demanding; a minimal sketch of the separated-filter idea is given after this list.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
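The inter bi-prediction blending entry above describes a compact network that refines the conventional average of two inter predictions. The following is a minimal sketch of that kind of blending network in PyTorch; the layer widths and the residual-over-average formulation are assumptions for illustration, not the paper's architecture, and are chosen only to stay below the quoted 10k-parameter budget.

```python
# Minimal sketch, not the paper's model: a tiny CNN that takes the two
# uni-directional inter predictions and outputs a refined blended block.
import torch
import torch.nn as nn

class BlendNet(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        # Roughly 2.8k parameters with channels=16, well under 10k.
        self.net = nn.Sequential(
            nn.Conv2d(2, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, pred0, pred1):
        # Start from the conventional average and learn a residual correction.
        avg = 0.5 * (pred0 + pred1)
        x = torch.cat([pred0, pred1], dim=1)
        return avg + self.net(x)

# Usage: blend two 64x64 luma prediction blocks.
p0 = torch.rand(1, 1, 64, 64)
p1 = torch.rand(1, 1, 64, 64)
blended = BlendNet()(p0, p1)
```

The last entry on separated filters relies on the standard observation that a k x k convolution can be approximated by a k x 1 convolution followed by a 1 x k convolution, reducing per-output multiplications from k*k to roughly 2k. Below is a minimal, generic sketch of that substitution in PyTorch; it is not the specific network transformation proposed in that paper.

```python
# Generic separated-filter sketch: replace one dense k x k convolution with
# a vertical (k x 1) and a horizontal (1 x k) convolution.
import torch.nn as nn

def separated_conv(in_ch: int, out_ch: int, k: int) -> nn.Sequential:
    """Approximate an in_ch -> out_ch, k x k convolution with two
    rank-1 (separated) convolutions: vertical then horizontal."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=(k, 1), padding=(k // 2, 0)),
        nn.Conv2d(out_ch, out_ch, kernel_size=(1, k), padding=(0, k // 2)),
    )

# For k = 7, the dense layer needs 49 multiplications per output sample and
# channel pair, while the separated version needs roughly 14.
dense = nn.Conv2d(32, 32, kernel_size=7, padding=3)
separated = separated_conv(32, 32, 7)
```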
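Both sketches above are hedged illustrations of the referenced techniques; layer sizes, names, and kernel dimensions are assumptions rather than values taken from the cited papers.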