Fast-OMRA: Fast Online Motion Resolution Adaptation for Neural B-Frame Coding
- URL: http://arxiv.org/abs/2410.21763v1
- Date: Tue, 29 Oct 2024 05:57:32 GMT
- Title: Fast-OMRA: Fast Online Motion Resolution Adaptation for Neural B-Frame Coding
- Authors: Sang NguyenQuang, Zong-Lin Gao, Kuan-Wei Ho, Xiem HoangVan, Wen-Hsiao Peng
- Abstract summary: Most learned B-frame codecs with hierarchical temporal prediction suffer from a domain shift caused by the discrepancy between the Group-of-Pictures (GOP) sizes used for training and testing.
One effective strategy to mitigate this domain shift issue is to downsample video frames for motion estimation.
This work introduces lightweight classifiers to determine the downsampling factor.
- Score: 5.815424522820603
- Abstract: Most learned B-frame codecs with hierarchical temporal prediction suffer from the domain shift issue caused by the discrepancy in the Group-of-Pictures (GOP) size used for training and testing. As such, the motion estimation network may fail to predict large motion properly. One effective strategy to mitigate this domain shift issue is to downsample video frames for motion estimation. However, finding the optimal downsampling factor involves a time-consuming rate-distortion optimization process. This work introduces lightweight classifiers to determine the downsampling factor. To strike a good rate-distortion-complexity trade-off, our classifiers observe simple state signals, including only the coding and reference frames, to predict the best downsampling factor. We present two variants that adopt binary and multi-class classifiers, respectively. The binary classifier adopts the Focal Loss for training, classifying between motion estimation at high and low resolutions. Our multi-class classifier is trained with novel soft labels incorporating the knowledge of the rate-distortion costs of different downsampling factors. Both variants operate as add-on modules without the need to re-train the B-frame codec. Experimental results confirm that they achieve comparable coding performance to the brute-force search methods while greatly reducing computational complexity.
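The abstract describes the two classifier variants only at a high level; the sketch below illustrates one plausible reading of them. It assumes a small convolutional backbone over the concatenated coding frame and two reference frames, a binary head trained with the Focal Loss, and a multi-class head trained with soft labels obtained from per-factor rate-distortion costs via a softmax over negative costs. The module names, backbone layout, number of reference frames, soft-label construction, and hyper-parameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the paper does not publish this code; the backbone,
# soft-label construction, and hyper-parameters below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResolutionClassifier(nn.Module):
    """Tiny CNN mapping (coding frame, two reference frames) to logits over
    candidate motion-estimation downsampling factors (hypothetical module)."""

    def __init__(self, num_factors: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(9, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_factors)

    def forward(self, x_cur, x_ref0, x_ref1):
        z = self.features(torch.cat([x_cur, x_ref0, x_ref1], dim=1))
        return self.head(z.flatten(1))


def focal_loss(logits, target, alpha=0.25, gamma=2.0):
    """Binary Focal Loss (Lin et al.) for the high- vs. low-resolution variant;
    expects one logit per sample and a {0, 1} target of the same shape."""
    ce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = torch.exp(-ce)  # probability assigned to the true class
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()


def rd_soft_labels(rd_costs, temperature=1.0):
    """One plausible way to turn per-factor rate-distortion costs into soft
    labels: lower cost -> higher probability (softmax over negative costs)."""
    return F.softmax(-rd_costs / temperature, dim=-1)


def soft_label_loss(logits, rd_costs):
    """Multi-class variant: cross-entropy against the RD-derived soft labels."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(rd_soft_labels(rd_costs) * log_p).sum(dim=-1).mean()
```

In such a setup, the encoder would simply take the argmax of the classifier's logits at coding time to pick the downsampling factor, replacing the per-frame brute-force rate-distortion search while leaving the B-frame codec itself untouched.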
Related papers
- SignSGD with Federated Voting [69.06621279967865]
SignSGD with majority voting (signSGD-MV) is an effective distributed learning algorithm that can significantly reduce communication costs by one-bit quantization.
We propose a novel signSGD with federated voting (signSGD-FV).
The idea of federated voting is to exploit learnable weights to perform weighted majority voting.
We demonstrate that the proposed signSGD-FV algorithm has a theoretical convergence guarantee even when edge devices use heterogeneous mini-batch sizes.
arXiv Detail & Related papers (2024-03-25T02:32:43Z)
- Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning model with cross-entropy loss according to estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample makes the cross-entropy loss vulnerable to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z)
- IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z)
- Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task that can increase the frame rate for a smoother visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z)
- Learning Quantization in LDPC Decoders [14.37550972719183]
We propose a floating-point surrogate model that imitates quantization effects as additions of uniform noise.
A deep learning-based method is then applied to optimize the message bitwidths.
We report an error-rate performance within 0.2 dB of floating-point decoding at an average message quantization bitwidth of 3.1 bits.
arXiv Detail & Related papers (2022-08-10T07:07:54Z)
- Learning strides in convolutional neural networks [34.20666933112202]
This work introduces DiffStride, the first downsampling layer with learnable strides.
Experiments on audio and image classification show the generality and effectiveness of our solution.
arXiv Detail & Related papers (2022-02-03T16:03:36Z)
- Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression [50.48504867843605]
We propose a search-free block motion estimation framework using a multi-stage convolutional neural network.
We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames.
arXiv Detail & Related papers (2021-10-05T03:38:43Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)