New VVC profiles targeting Feature Coding for Machines
- URL: http://arxiv.org/abs/2512.08227v1
- Date: Tue, 09 Dec 2025 04:13:07 GMT
- Title: New VVC profiles targeting Feature Coding for Machines
- Authors: Md Eimran Hossain Eimon, Ashan Perera, Juan Merlos, Velibor Adzic, Hari Kalva
- Abstract summary: Intermediate features are abstract, sparse, and task-specific, making perceptual fidelity irrelevant. In this paper, we investigate the use of Versatile Video Coding (VVC) for compressing such features under the MPEG-AI Feature Coding for Machines (FCM) standard. Based on these insights, we propose three lightweight essential VVC profiles: Fast, Faster, and Fastest.
- Score: 0.5437050212139086
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern video codecs have been extensively optimized to preserve perceptual quality, leveraging models of the human visual system. However, in split inference systems, where intermediate features from a neural network are transmitted instead of pixel data, these assumptions no longer apply. Intermediate features are abstract, sparse, and task-specific, making perceptual fidelity irrelevant. In this paper, we investigate the use of Versatile Video Coding (VVC) for compressing such features under the MPEG-AI Feature Coding for Machines (FCM) standard. We perform a tool-level analysis to understand the impact of individual coding components on compression efficiency and downstream vision task accuracy. Based on these insights, we propose three lightweight essential VVC profiles: Fast, Faster, and Fastest. The Fast profile provides a 2.96% BD-Rate gain while reducing encoding time by 21.8%. Faster achieves a 1.85% BD-Rate gain with a 51.5% speedup. Fastest reduces encoding time by 95.6% with only a 1.71% loss in BD-Rate.
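The profile comparisons above are reported in BD-Rate, the Bjøntegaard delta rate, which summarizes the average bitrate difference between a test codec and an anchor over their shared quality range. For reference, the sketch below shows a minimal, standard BD-Rate computation from four rate-quality points per codec; the function name, the cubic-fit-in-log-rate formulation, and the example numbers are illustrative assumptions and are not taken from the paper, whose FCM evaluation would typically place a task metric such as detection mAP on the quality axis rather than PSNR.

```python
import numpy as np

def bd_rate(rate_anchor, qual_anchor, rate_test, qual_test):
    """Bjontegaard delta rate (%): average bitrate difference of the test
    codec versus the anchor over their overlapping quality range.
    Quality can be PSNR or, in an FCM-style evaluation, a task metric."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(qual_anchor, dtype=float)
    qt = np.asarray(qual_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each RD curve.
    fit_a = np.polyfit(qa, log_ra, 3)
    fit_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the shared quality interval.
    lo, hi = max(qa.min(), qt.min()), min(qa.max(), qt.max())
    area_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    area_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)

    # Average log-rate gap, converted back to a percentage; negative means bitrate saving.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical rate (kbps) / quality points, ascending in quality:
print(bd_rate([100, 200, 400, 800], [30.0, 33.0, 36.0, 39.0],
              [ 95, 185, 370, 740], [30.2, 33.1, 36.2, 39.1]))
```

The same routine applies whether the quality axis is PSNR or a downstream accuracy metric; only the input points change.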
Related papers
- TeCoNeRV: Leveraging Temporal Coherence for Compressible Neural Representations for Videos [51.99176811574457]
Implicit Neural Representations (INRs) have recently demonstrated impressive performance for video compression. However, scaling to high-resolution videos while maintaining encoding efficiency remains a significant challenge. We address these fundamental limitations through three key contributions. Ours is the first hypernetwork approach to demonstrate results at 480p, 720p, and 1080p on UVG, HEVC, and MCL-JCV.
arXiv Detail & Related papers (2026-02-18T18:59:55Z) - Emerging Standards for Machine-to-Machine Video Coding [0.9368339942045111]
Video Coding for Machines (VCM) is designed to apply task-aware coding tools in the pixel domain. Feature Coding for Machines (FCM) is designed to compress intermediate neural features. FCM is capable of maintaining accuracy close to that of edge inference while significantly reducing compute.
arXiv Detail & Related papers (2025-12-11T02:27:49Z) - Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis. We propose a Compression Distortion Representation Embedding (CDRE) framework, which extracts a machine-perception-related distortion representation and embeds it into downstream models. Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in terms of execution time and number of parameters.
arXiv Detail & Related papers (2025-03-27T13:01:53Z) - AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding [55.320254859515714]
Multimodal Large Language Models (MLLMs) have revolutionized video understanding, yet are still limited by context length when processing long videos. We propose AdaReTaKe, a training-free method that flexibly reduces visual redundancy by allocating compression ratios among time and layers with theoretical guarantees. Experiments on VideoMME, MLVU, LongVideoBench, and LVBench datasets demonstrate that AdaReTaKe outperforms existing methods by 2.3% and 2.8% for 7B and 72B models, respectively.
arXiv Detail & Related papers (2025-03-16T16:14:52Z) - Towards Practical Real-Time Neural Video Compression [60.390180067626396]
We introduce a practical real-time neural video codec (NVC) designed to deliver a high compression ratio, low latency, and broad versatility. Experiments show our proposed DCVC-RT achieves an impressive average encoding/decoding speed of 125.2/112.8 frames per second for 1080p video, while saving an average of 21% in bitrate compared to H.266/VTM.
arXiv Detail & Related papers (2025-02-28T06:32:23Z) - Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos. We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders generation quality.
arXiv Detail & Related papers (2025-02-20T18:45:44Z) - Accelerating Learned Video Compression via Low-Resolution Representation Learning [18.399027308582596]
We introduce an efficiency-optimized framework for learned video compression that focuses on low-resolution representation learning.
Our method achieves performance levels on par with the low-delay P configuration of the H.266 reference software VTM.
arXiv Detail & Related papers (2024-07-23T12:02:57Z) - End-to-End Learnable Multi-Scale Feature Compression for VCM [8.037759667748768]
We propose a novel multi-scale feature compression method that enables end-to-end optimization of the extracted features and the design of lightweight encoders.
Our model outperforms previous approaches by at least 52% BD-rate reduction and has 5× to 27× less encoding time for object detection.
arXiv Detail & Related papers (2023-06-29T04:05:13Z) - Pruned Lightweight Encoders for Computer Vision [0.0]
We show that ASTC and JPEG XS encoding configurations can be used on a near-sensor edge device to ensure low latency.
We reduced the degradation in classification accuracy and segmentation mean intersection over union (mIoU) due to ASTC compression to 4.9-5.0 percentage points (pp) and 4.4-4.0 pp, respectively.
In terms of encoding speed, our ASTC encoder implementation is 2.3x faster than JPEG.
arXiv Detail & Related papers (2022-11-23T17:11:48Z) - AlphaVC: High-Performance and Efficient Learned Video Compression [4.807439168741098]
We introduce a conditional I-frame as the first frame in the GoP, which stabilizes the reconstructed quality and saves bit-rate.
Second, to efficiently improve the accuracy of inter prediction without increasing decoder complexity, we propose a pixel-to-feature motion prediction method at the encoder side.
Third, we propose a probability-based entropy skipping method, which not only brings performance gain, but also greatly reduces the runtime of entropy coding.
arXiv Detail & Related papers (2022-07-29T13:52:44Z) - ELF-VC: Efficient Learned Flexible-Rate Video Coding [61.10102916737163]
We propose several novel ideas for learned video compression which allow for improved performance for the low-latency mode.
We benchmark our method, which we call ELF-VC, on popular video test sets UVG and MCL-JCV.
Our approach runs at least 5x faster and has fewer parameters than all ML codecs which report these figures.
arXiv Detail & Related papers (2021-04-29T17:50:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.