BLINC: Lightweight Bimodal Learning for Low-Complexity VVC Intra Coding
- URL: http://arxiv.org/abs/2201.07823v1
- Date: Wed, 19 Jan 2022 19:12:41 GMT
- Title: BLINC: Lightweight Bimodal Learning for Low-Complexity VVC Intra Coding
- Authors: Farhad Pakdaman, Mohammad Ali Adelimanesh, Mahmoud Reza Hashemi
- Abstract summary: Versatile Video Coding (VVC) achieves almost twice the coding efficiency of its predecessor, High Efficiency Video Coding (HEVC).
This paper proposes a novel machine learning approach that jointly and separately employs two modalities of features to simplify the intra coding decision.
- Score: 5.629161809575015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The latest video coding standard, Versatile Video Coding (VVC),
achieves almost twice the coding efficiency of its predecessor, High
Efficiency Video Coding (HEVC). However, achieving this efficiency (for intra
coding) requires 31x the computational complexity of HEVC, making VVC
challenging for low-power and real-time applications. This paper proposes a
novel machine learning approach that employs two modalities of features,
jointly and separately, to simplify the intra coding decision. First, a set of
features is extracted using the existing DCT core of VVC to assess the texture
characteristics of a block; these form the first modality of data and yield
high-quality features with almost no overhead. The distribution of intra modes
at the neighboring blocks forms the second modality, which provides
statistical information about the frame. Second, a two-step feature reduction
method is designed that shrinks the feature set so that a lightweight model
with a limited number of parameters can learn the intra mode decision task.
Third, three separate training strategies are proposed: (1) an offline
strategy using the first (single) modality of data, (2) an online strategy
using the second (single) modality, and (3) a mixed online-offline strategy
that uses bimodal learning. Finally, a low-complexity encoding algorithm is
proposed based on these learning strategies. Extensive experimental results
show that the proposed methods reduce encoding time by up to 24% with a
negligible loss of coding efficiency. Moreover, it is demonstrated how a
bimodal learning strategy can boost learning performance. Lastly, the proposed
method has a very low computational overhead (0.2%) and uses existing
components of a VVC encoder, which makes it much more practical than competing
solutions.
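
To make the first modality concrete, the sketch below shows how the 2-D DCT of a block (a transform a VVC encoder computes anyway) could be summarized into a few texture descriptors. The function name and the specific features are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch (assumed, not the paper's code): texture features derived
# from a block's 2-D DCT, reusing the transform core the encoder already has.
import numpy as np
from scipy.fft import dctn

def dct_texture_features(block):
    """Summarize directional AC energy of a luma block from its DCT coefficients."""
    coeffs = dctn(block.astype(np.float64), norm="ortho")
    coeffs[0, 0] = 0.0                      # drop DC; texture lives in the AC energy
    total = np.square(coeffs).sum() + 1e-12
    horiz = np.square(coeffs[0, 1:]).sum()  # horizontal-frequency energy (first row)
    vert = np.square(coeffs[1:, 0]).sum()   # vertical-frequency energy (first column)
    return np.array([total, horiz / total, vert / total])

block = np.random.randint(0, 256, size=(32, 32))
print(dct_texture_features(block))
```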
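The second modality is simpler: a statistic over the intra modes already chosen by neighboring blocks. A plausible encoding of that statistic, assuming VVC's 67 intra modes and a histogram representation (the exact neighbor set and representation are assumptions here):

```python
# Minimal sketch (assumed): the second modality as a normalized histogram of
# intra modes chosen by already-coded (causal) neighbor blocks.
import numpy as np

NUM_INTRA_MODES = 67  # VVC defines 67 intra modes: planar, DC, and 65 angular

def neighbor_mode_histogram(neighbor_modes):
    """Normalized intra-mode distribution over the causal neighbors."""
    hist = np.zeros(NUM_INTRA_MODES)
    for mode in neighbor_modes:
        hist[mode] += 1.0
    return hist / max(len(neighbor_modes), 1)

# e.g. left, above, and above-left neighbors chose planar (0), DC (1), and
# a diagonal angular mode (34)
print(neighbor_mode_histogram([0, 1, 34]).nonzero())
```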
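The abstract does not name the two steps of the feature reduction, so the following is only one plausible reading: a filter-style selection step followed by a linear projection, both off-the-shelf scikit-learn components rather than the paper's method:

```python
# Minimal sketch (an assumption; the paper's two steps are not specified in
# the abstract): mutual-information filtering, then a PCA projection.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def two_step_reduce(X, y, k_select=32, n_components=8):
    X_selected = SelectKBest(mutual_info_classif, k=k_select).fit_transform(X, y)
    return PCA(n_components=n_components).fit_transform(X_selected)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 70))     # raw bimodal feature vectors (synthetic)
y = rng.integers(0, 67, size=200)  # best intra mode labels (synthetic)
print(two_step_reduce(X, y).shape) # (200, 8)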
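Finally, the bimodal idea itself: concatenate the two (reduced) feature modalities and train a lightweight classifier whose top-ranked modes restrict the encoder's rate-distortion search. The model choice, layer sizes, and candidate count below are assumptions for illustration, not the paper's configuration:

```python
# Minimal sketch (assumed model and sizes): a small classifier over the joint
# bimodal feature vector, used to prune the intra-mode RDO search.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_texture = rng.normal(size=(500, 3))   # modality 1: DCT texture features
X_history = rng.random(size=(500, 67))  # modality 2: neighbor mode histograms
y = rng.integers(0, 67, size=500)       # best intra mode found by full RDO

X = np.hstack([X_texture, X_history])   # joint (bimodal) feature vector
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=200, random_state=0)
model.fit(X, y)                         # trained offline, online, or mixed

probs = model.predict_proba(X[:1])[0]
candidates = model.classes_[np.argsort(probs)[::-1][:3]]
print(candidates)                       # test only the 3 most likely modes
```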
Related papers
- A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z) - Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on a standard database demonstrate the superiority of the proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z) - Deep Learning-Based Intra Mode Derivation for Versatile Video Coding [65.96100964146062]
An intelligent intra mode derivation method is proposed in this paper, termed Deep Learning-based Intra Mode Derivation (DLIMD).
The architecture of DLIMD is developed to adapt to different quantization parameter settings and variable coding blocks including non-square ones.
The proposed method can achieve 2.28%, 1.74%, and 2.18% bit rate reduction on average for Y, U, and V components on the platform of Versatile Video Coding (VVC) test model.
arXiv Detail & Related papers (2022-04-08T13:23:59Z) - Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z) - EfficientFCN: Holistically-guided Decoding for Semantic Segmentation [49.27021844132522]
State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN).
We propose the EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution.
Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost.
arXiv Detail & Related papers (2020-08-24T14:48:23Z) - Large-scale Transfer Learning for Low-resource Spoken Language Understanding [31.013231069185387]
We propose an attention-based Spoken Language Understanding model together with three encoder enhancement strategies to overcome data sparsity challenge.
Cross-language transfer learning and multi-task strategies improve on the baseline by up to 4.52% and 3.89%, respectively.
arXiv Detail & Related papers (2020-08-13T03:43:05Z) - Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning.
We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations.
We leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z) - Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated for the low-delay causal settings and compared with H.265/HEVC, H.264/AVC and the other learnt video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)