Related papers: Neural B-Frame Coding: Tackling Domain Shift Issues with Lightweight Online Motion Resolution Adaptation

Neural B-Frame Coding: Tackling Domain Shift Issues with Lightweight Online Motion Resolution Adaptation

URL: http://arxiv.org/abs/2511.18724v1
Date: Mon, 24 Nov 2025 03:29:58 GMT
Title: Neural B-Frame Coding: Tackling Domain Shift Issues with Lightweight Online Motion Resolution Adaptation
Authors: Sang NguyenQuang, Xiem HoangVan, Wen-Hsiao Peng,
Abstract summary: A common solution is to turn large motion into small motion by downsampling video frames during motion estimation.<n>This work introduces lightweight classifiers to predict downsampling factors.<n>They leverage simple state signals from current and reference frames to balance rate-distortion performance with computational cost.
Score: 8.348269612691707
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learned B-frame codecs with hierarchical temporal prediction often encounter the domain-shift issue due to mismatches between the Group-of-Pictures (GOP) sizes for training and testing, leading to inaccurate motion estimates, particularly for large motion. A common solution is to turn large motion into small motion by downsampling video frames during motion estimation. However, determining the optimal downsampling factor typically requires costly rate-distortion optimization. This work introduces lightweight classifiers to predict downsampling factors. These classifiers leverage simple state signals from current and reference frames to balance rate-distortion performance with computational cost. Three variants are proposed: (1) a binary classifier (Bi-Class) trained with Focal Loss to choose between high and low resolutions, (2) a multi-class classifier (Mu-Class) trained with novel soft labels based on rate-distortion costs, and (3) a co-class approach (Co-Class) that combines the predictive capability of the multi-class classifier with the selective search of the binary classifier. All classifier methods can work seamlessly with existing B-frame codecs without requiring codec retraining. Experimental results show that they achieve coding performance comparable to exhaustive search methods while significantly reducing computational complexity. The code is available at: https://github.com/NYCU-MAPL/Fast-OMRA.git.

Related papers

GLiClass: Generalist Lightweight Model for Sequence Classification Tasks [49.2639069781367]
We propose GLiClass, a novel method that adapts the GLiNER architecture for sequence classification tasks.<n>Our approach achieves strong accuracy and efficiency comparable to embedding-based methods, while maintaining the flexibility needed for zero-shot and few-shot learning scenarios.
arXiv Detail & Related papers (2025-08-11T06:22:25Z)
Fast-OMRA: Fast Online Motion Resolution Adaptation for Neural B-Frame Coding [5.815424522820603]
Most learned B-frame codecs with hierarchical temporal prediction suffer from the domain shift issue caused by the discrepancy in the Group-of-Pictures (GOP) size used for training and test. One effective strategy to mitigate this domain shift issue is to downsample video frames for motion estimation. This work introduces lightweight classifiers to determine the downsampling factor.
arXiv Detail & Related papers (2024-10-29T05:57:32Z)
IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction. Previous learned approaches often directly extend neural P-frame codecs to B-frame relying on bi-directional optical-flow estimation. We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z)
Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture. It can model the feature space more comprehensively and reduce the dominance of head classes. The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
Prediction Calibration for Generalized Few-shot Semantic Segmentation [101.69940565204816]
Generalized Few-shot Semantic (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class. We build a cross-attention module that guides the classifier's final prediction using the fused multi-level features. Our PCN outperforms the state-the-art alternatives by large margins.
arXiv Detail & Related papers (2022-10-15T13:30:12Z)
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer [112.95747173442754]
A few-shot semantic segmentation model is typically composed of a CNN encoder, a CNN decoder and a simple classifier. Most existing methods meta-learn all three model components for fast adaptation to a new class. In this work we propose to simplify the meta-learning task by focusing solely on the simplest component, the classifier.
arXiv Detail & Related papers (2021-08-06T10:20:08Z)
End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches. Specifically, we generate a compoundtemporal representation (STR) through a recurrent information aggregation (RIA) module. We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding [5.46121027847413]
This paper introduces a novel explainable neural network-based inter-prediction scheme. A novel training framework enables each network branch to resemble a specific fractional shift. When implemented in the context of the Versatile Video Coding (VVC) test model, 0.77%, 1.27% and 2.25% BD-rate savings can be achieved.
arXiv Detail & Related papers (2021-06-16T16:48:01Z)
Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios. For dataset bias due to different samplers, we propose shifted batch normalization. Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
Temporal Bilinear Encoding Network of Audio-Visual Features at Low Sampling Rates [7.1273332508471725]
This paper aims to exploit audio-visual information in video classification with a 1 frame per second sampling rate. We propose Temporal Bilinear Networks (TBEN) for encoding both audio and visual long range temporal information.
arXiv Detail & Related papers (2020-12-18T14:59:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.