Related papers: Efficient Feature Compression for Machines with Global Statistics Preservation

URL: http://arxiv.org/abs/2512.09235v1
Date: Wed, 10 Dec 2025 01:51:34 GMT
Title: Efficient Feature Compression for Machines with Global Statistics Preservation
Authors: Md Eimran Hossain Eimon, Hyomin Choi, Fabien Racapé, Mateen Ulhaq, Velibor Adzic, Hari Kalva, Borko Furht
Abstract summary: In this paper, we employ Z-score normalization to efficiently recover the compressed feature data at the decoder side. Our method supersedes the existing scaling method used by the current standard under development. Experiments show that our proposed method achieves a 17.09% reduction in bitrate on average across different tasks and up to 65.69% for object tracking.
Score: 5.113857098394778
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The split-inference paradigm divides an artificial intelligence (AI) model into two parts. This necessitates the transfer of intermediate feature data between the two halves. Here, effective compression of the feature data becomes vital. In this paper, we employ Z-score normalization to efficiently recover the compressed feature data at the decoder side. To examine the efficacy of our method, we integrate it into the latest Feature Coding for Machines (FCM) codec standard under development by the Moving Picture Experts Group (MPEG). Our method supersedes the existing scaling method used by the current standard under development. It both reduces the overhead bits and improves the end-task accuracy. To further reduce the overhead in certain circumstances, we also propose a simplified method. Experiments show that our proposed method achieves a 17.09% reduction in bitrate on average across different tasks, and up to 65.69% for object tracking, without sacrificing task accuracy.
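
Illustrative sketch (not from the paper): the abstract gives no implementation details, so the Python below shows only one plausible reading of "Z-score normalization to recover the compressed feature data at the decoder side". It assumes the encoder sends the original mean and standard deviation as side information and the decoder rescales the decoded features so that those global statistics are restored. The names zscore, lossy_codec, encode, and decode are hypothetical, and the coarse quantizer merely stands in for a real codec such as FCM.

```python
import random
import statistics


def zscore(values):
    """Return (normalized values, mean, std) using population statistics."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant input
    return [(v - mean) / std for v in values], mean, std


def lossy_codec(values, step=0.5):
    """Hypothetical stand-in for a real codec: coarse uniform quantization."""
    return [round(v / step) * step for v in values]


def encode(features):
    """Normalize, compress lossily, and keep (mean, std) as side information."""
    normalized, mean, std = zscore(features)
    return lossy_codec(normalized), (mean, std)


def decode(payload, side_info):
    """Re-normalize the decoded data, then restore the original statistics."""
    mean, std = side_info
    renormalized, _, _ = zscore(payload)  # undo the statistical drift from coding
    return [v * std + mean for v in renormalized]


random.seed(0)
features = [random.gauss(3.0, 2.0) for _ in range(1000)]
payload, side_info = encode(features)
restored = decode(payload, side_info)
```

In this sketch the restored features match the original mean and standard deviation up to floating-point error, even though individual values still carry quantization error. The side information is only two scalars per tensor, which is an assumption of the sketch rather than a detail taken from the paper.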

Related papers

Float8@2bits: Entropy Coding Enables Data-Free Model Compression [4.775539058503235]
We introduce EntQuant, the first framework to unite the advantages of different post-training compression regimes. Our method decouples numerical precision from storage cost via entropy coding, compressing a 70B parameter model in less than 30 minutes. We demonstrate that EntQuant not only achieves state-of-the-art results on standard evaluation sets and models, but also retains functional performance on more complex benchmarks.
arXiv Detail & Related papers (2026-01-30T10:08:15Z)
SkipCat: Rank-Maximized Low-Rank Compression of Large Language Models via Shared Projection and Block Skipping [6.789200833454491]
Large language models (LLMs) have achieved remarkable performance across a wide range of tasks. Low-rank compression is a promising approach to address this issue, as it reduces both computational and memory costs. We propose SkipCat, a novel low-rank compression framework that enables the use of higher ranks while achieving the same compression rates.
arXiv Detail & Related papers (2025-12-15T16:25:55Z)
Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost. This work rethinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation. Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z)
Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence [131.41894248194995]
We propose context-oriented decomposition adaptation (CorDA), a novel method that initializes adapters in a task-aware manner. Thanks to the task awareness, our method enables two optional adaptation modes: knowledge-preserved mode (KPM) and instruction-previewed mode (IPM).
arXiv Detail & Related papers (2025-06-16T07:55:14Z)
Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization [31.210700220124192]
We show that one way to reduce the effect of compression on a given task loss is to perform rate-distortion optimization (RDO) using the distance between features. We simplify the RDO formulation to make the distortion term computable using block-based encoders. Simulations with transformed HEVC across multiple feature extractors and downstream networks show up to 17% bit-rate savings for the same task accuracy.
arXiv Detail & Related papers (2025-04-03T02:11:26Z)
Efficient Token Compression for Vision Transformer with Spatial Information Preserved [59.79302182800274]
Token compression is essential for reducing the computational and memory requirements of transformer models. We propose an efficient and hardware-compatible token compression method called Prune and Merge.
arXiv Detail & Related papers (2025-03-30T14:23:18Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism [1.7170348600689374]
We propose a novel compression methodology that dynamically determines the rank of each layer using a soft thresholding mechanism. We have successfully applied the proposed technique to attention-based architectures, including BERT for discriminative tasks and GPT2 and TinyLlama for generative tasks. Our experiments demonstrate that the proposed technique achieves a speed-up of 1.33X to 1.72X in the encoder/decoder with a 50% reduction in total parameters.
arXiv Detail & Related papers (2024-11-15T19:29:51Z)
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture. We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck. We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z)
End-to-end Learning of Compressible Features [35.40108701875527]
Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high dimensional and expensive to store. We propose a learned method that jointly optimizes for compressibility along with the task objective.
arXiv Detail & Related papers (2020-07-23T05:17:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.