New VVC profiles targeting Feature Coding for Machines
- URL: http://arxiv.org/abs/2512.08227v1
- Date: Tue, 09 Dec 2025 04:13:07 GMT
- Title: New VVC profiles targeting Feature Coding for Machines
- Authors: Md Eimran Hossain Eimon, Ashan Perera, Juan Merlos, Velibor Adzic, Hari Kalva
- Abstract summary: Intermediate features are abstract, sparse, and task-specific, making perceptual fidelity irrelevant. In this paper, we investigate the use of Versatile Video Coding (VVC) for compressing such features under the MPEG-AI Feature Coding for Machines (FCM) standard. Based on these insights, we propose three lightweight essential VVC profiles: Fast, Faster, and Fastest.
- Score: 0.5437050212139086
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern video codecs have been extensively optimized to preserve perceptual quality, leveraging models of the human visual system. However, in split inference systems, where intermediate features from a neural network are transmitted instead of pixel data, these assumptions no longer apply. Intermediate features are abstract, sparse, and task-specific, making perceptual fidelity irrelevant. In this paper, we investigate the use of Versatile Video Coding (VVC) for compressing such features under the MPEG-AI Feature Coding for Machines (FCM) standard. We perform a tool-level analysis to understand the impact of individual coding components on compression efficiency and downstream vision task accuracy. Based on these insights, we propose three lightweight essential VVC profiles: Fast, Faster, and Fastest. The Fast profile provides a 2.96% BD-Rate gain while reducing encoding time by 21.8%. Faster achieves a 1.85% BD-Rate gain with a 51.5% speedup. Fastest reduces encoding time by 95.6% with only a 1.71% loss in BD-Rate.
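The profile comparisons above are reported in BD-Rate, the Bjøntegaard delta rate, which summarizes the average bitrate difference between a test codec and an anchor over their shared quality range. For reference, the sketch below shows a minimal, standard BD-Rate computation from four rate-quality points per codec; the function name, the cubic-fit-in-log-rate formulation, and the example numbers are illustrative assumptions and are not taken from the paper, whose FCM evaluation would typically place a task metric such as detection mAP on the quality axis rather than PSNR.

```python
import numpy as np

def bd_rate(rate_anchor, qual_anchor, rate_test, qual_test):
    """Bjontegaard delta rate (%): average bitrate difference of the test
    codec versus the anchor over their overlapping quality range.
    Quality can be PSNR or, in an FCM-style evaluation, a task metric."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(qual_anchor, dtype=float)
    qt = np.asarray(qual_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each RD curve.
    fit_a = np.polyfit(qa, log_ra, 3)
    fit_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the shared quality interval.
    lo, hi = max(qa.min(), qt.min()), min(qa.max(), qt.max())
    area_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    area_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)

    # Average log-rate gap, converted back to a percentage; negative means bitrate saving.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical rate (kbps) / quality points, ascending in quality:
print(bd_rate([100, 200, 400, 800], [30.0, 33.0, 36.0, 39.0],
              [ 95, 185, 370, 740], [30.2, 33.1, 36.2, 39.1]))
```

The same routine applies whether the quality axis is PSNR or a downstream accuracy metric; only the input points change.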
Related papers
- TeCoNeRV: Leveraging Temporal Coherence for Compressible Neural Representations for Videos [51.99176811574457]
Implicit Neural Representations (INRs) have recently demonstrated impressive performance for video compression. However, scaling to high-resolution videos while maintaining encoding efficiency remains a significant challenge. We address these fundamental limitations through three key contributions. Ours is the first hypernetwork approach to demonstrate results at 480p, 720p, and 1080p on UVG, HEVC, and MCL-JCV.
arXiv Detail & Related papers (2026-02-18T18:59:55Z) - Emerging Standards for Machine-to-Machine Video Coding [0.9368339942045111]
Video Coding for Machines (VCM) is designed to apply task-aware coding tools in the pixel domain. Feature Coding for Machines (FCM) is designed to compress intermediate neural features. FCM is capable of maintaining accuracy close to that of edge inference while significantly reducing compute.
arXiv Detail & Related papers (2025-12-11T02:27:49Z) - Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis. We propose a Compression Distortion Representation Embedding (CDRE) framework, which extracts a machine-perception-related distortion representation and embeds it into downstream models. Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in terms of execution time and number of parameters.
arXiv Detail & Related papers (2025-03-27T13:01:53Z) - AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding [55.320254859515714]
Multimodal Large Language Models (MLLMs) have revolutionized video understanding, yet are still limited by context length when processing long videos. We propose AdaReTaKe, a training-free method that flexibly reduces visual redundancy by allocating compression ratios among time and layers with theoretical guarantees. Experiments on VideoMME, MLVU, LongVideoBench, and LVBench datasets demonstrate that AdaReTaKe outperforms existing methods by 2.3% and 2.8% for 7B and 72B models, respectively.
arXiv Detail & Related papers (2025-03-16T16:14:52Z) - Towards Practical Real-Time Neural Video Compression [60.390180067626396]
We introduce a practical real-time neural video codec (NVC) designed to deliver a high compression ratio, low latency, and broad versatility. Experiments show our proposed DCVC-RT achieves an impressive average encoding/decoding speed of 125.2/112.8 frames per second for 1080p video, while saving an average of 21% in bitrate compared to H.266/VTM.
arXiv Detail & Related papers (2025-02-28T06:32:23Z) - Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos. We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders generation quality.
arXiv Detail & Related papers (2025-02-20T18:45:44Z) - Accelerating Learned Video Compression via Low-Resolution Representation Learning [18.399027308582596]
We introduce an efficiency-optimized framework for learned video compression that focuses on low-resolution representation learning.
Our method achieves performance levels on par with the low-delay P configuration of the H.266 reference software VTM.
arXiv Detail & Related papers (2024-07-23T12:02:57Z) - End-to-End Learnable Multi-Scale Feature Compression for VCM [8.037759667748768]
We propose a novel multi-scale feature compression method that enables end-to-end optimization of the extracted features and the design of lightweight encoders.
Our model outperforms previous approaches by at least 52% BD-rate reduction and has 5× to 27× less encoding time for object detection.
arXiv Detail & Related papers (2023-06-29T04:05:13Z) - Pruned Lightweight Encoders for Computer Vision [0.0]
We show that ASTC and JPEG XS encoding configurations can be used on a near-sensor edge device to ensure low latency.
We reduced the degradation in classification accuracy and segmentation mean intersection over union (mIoU) due to ASTC compression to 4.9-5.0 percentage points (pp) and 4.4-4.0 pp, respectively.
In terms of encoding speed, our ASTC encoder implementation is 2.3x faster than JPEG.
arXiv Detail & Related papers (2022-11-23T17:11:48Z) - AlphaVC: High-Performance and Efficient Learned Video Compression [4.807439168741098]
We introduce a conditional I-frame as the first frame in the GoP, which stabilizes the reconstructed quality and saves bit-rate.
Second, to efficiently improve the accuracy of inter prediction without increasing decoder complexity, we propose a pixel-to-feature motion prediction method at the encoder side.
Third, we propose a probability-based entropy skipping method, which not only brings performance gain, but also greatly reduces the runtime of entropy coding.
arXiv Detail & Related papers (2022-07-29T13:52:44Z) - ELF-VC: Efficient Learned Flexible-Rate Video Coding [61.10102916737163]
We propose several novel ideas for learned video compression which allow for improved performance for the low-latency mode.
We benchmark our method, which we call ELF-VC, on popular video test sets UVG and MCL-JCV.
Our approach runs at least 5x faster and has fewer parameters than all ML codecs which report these figures.
arXiv Detail & Related papers (2021-04-29T17:50:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.