Machines Serve Human: A Novel Variable Human-machine Collaborative Compression Framework
- URL: http://arxiv.org/abs/2511.08915v1
- Date: Thu, 13 Nov 2025 01:17:26 GMT
- Title: Machines Serve Human: A Novel Variable Human-machine Collaborative Compression Framework
- Authors: Zifu Zhang, Shengxi Li, Xiancheng Sun, Mai Xu, Zhengyuan Liu, Jingyuan Xia
- Abstract summary: We set out the first successful attempt at a collaborative compression method based on machine-vision-oriented compression. A plug-and-play variable bit-rate strategy is also developed for machine vision tasks. We propose to progressively aggregate the semantics from the machine-vision compression, whilst seamlessly tailoring a diffusion prior to restore high-fidelity details for human vision.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-machine collaborative compression has been receiving increasing research attention for reducing image/video data, serving as the basis for both human perception and machine intelligence. Existing collaborative methods are predominantly built upon the de facto human-vision compression pipeline, and suffer from high complexity and bit-rates when aggregating machine-vision compression. Indeed, machine vision focuses solely on the core regions within the image/video, requiring much less information than is compressed for human vision. In this paper, we thus set out the first successful attempt at a collaborative compression method based on machine-vision-oriented compression, instead of the human-vision pipeline. In other words, machine vision serves as the basis for human vision within collaborative compression. A plug-and-play variable bit-rate strategy is also developed for machine vision tasks. We then propose to progressively aggregate the semantics from the machine-vision compression, whilst seamlessly tailoring a diffusion prior to restore high-fidelity details for human vision; the method is thus named diffusion-prior based feature compression for human and machine visions (Diff-FCHM). Experimental results verify the consistently superior performance of Diff-FCHM on both machine-vision and human-vision compression, with remarkable margins. Our code will be released upon acceptance.
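The machine-vision-first idea in the abstract can be caricatured in a few lines. The sketch below is purely illustrative, not the authors' Diff-FCHM implementation: a coarse, heavily quantized feature stands in for the compact machine-vision bitstream, and a separate upsampling branch stands in for the diffusion prior that restores a full-resolution view for humans. All function names are hypothetical.

```python
import numpy as np

def machine_encode(img, stride=4, levels=8):
    """Toy machine-vision-oriented encoder: keep only a coarse,
    quantized feature map (a stand-in for task semantics)."""
    feat = img[::stride, ::stride]                    # spatial reduction
    return np.round(feat * (levels - 1)) / (levels - 1)  # coarse quantization

def human_restore(feat, stride=4):
    """Toy stand-in for the diffusion-prior branch: upsample the
    machine feature back to full resolution (a real system would
    condition a generative prior on it instead)."""
    return np.kron(feat, np.ones((stride, stride)))

rng = np.random.default_rng(0)
img = rng.random((32, 32))

feat = machine_encode(img)    # what the machine task consumes
recon = human_restore(feat)   # what the human viewer sees

# The machine branch carries far fewer samples than the full image:
print(feat.size, img.size)    # 64 1024
```

The point of the sketch is only the asymmetry the paper exploits: the machine-vision representation is much smaller than what human-vision compression would transmit, so building the human branch on top of it avoids encoding the same information twice.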
Related papers
- Progressive Learned Image Compression for Machine Perception [27.208988763458958]
We propose PICM-Net, a novel progressive learned image compression method for machine perception based on trit-plane coding. Our approach enables efficient and adaptive progressive transmission while maintaining high performance on the downstream classification task.
arXiv Detail & Related papers (2025-12-23T05:45:38Z)
- Embodied Image Compression [105.9462341161654]
This paper introduces, for the first time, the scientific problem of Embodied Image Compression. We establish a standardized benchmark, EmbodiedComp, to facilitate systematic evaluation under ultra-low bit-rate conditions in a closed-loop setting. We demonstrate that existing Vision-Language-Action models fail to reliably perform even simple manipulation tasks when compressed below the embodied threshold.
arXiv Detail & Related papers (2025-12-12T14:49:34Z)
- Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis. We propose a Compression Distortion Embedding (CDRE) framework, which extracts a machine-perception-related distortion representation and embeds it into downstream models. Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in execution time and number of parameters.
arXiv Detail & Related papers (2025-03-27T13:01:53Z)
- Guided Diffusion for the Extension of Machine Vision to Human Visual Perception [0.0]
We propose a method for extending machine vision to human visual perception using guided diffusion. Guided diffusion acts as a bridge between machine vision and human perception, enabling transitions between them without any additional overhead.
arXiv Detail & Related papers (2025-03-23T03:04:26Z)
- Hierarchical Semantic Compression for Consistent Image Semantic Restoration [62.97519327310638]
We propose a novel hierarchical semantic compression (HSC) framework that operates purely within intrinsic semantic spaces from generative models. Experimental results demonstrate that the proposed HSC framework achieves state-of-the-art performance on subjective quality and consistency for human vision.
arXiv Detail & Related papers (2025-02-24T03:20:44Z)
- Machine Perceptual Quality: Evaluating the Impact of Severe Lossy Compression on Audio and Image Models [1.2584276673531931]
We evaluate how different approaches to lossy compression affect machine perception tasks.
Severe lossy compression can remain feasible for machine perception tasks in some settings. However, lossy compression of pre-training data can degrade downstream machine perception.
arXiv Detail & Related papers (2024-01-15T20:47:24Z)
- Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision [39.50768518548343]
We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision.
Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers.
Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
arXiv Detail & Related papers (2023-12-25T05:57:23Z)
- Cross Modal Compression: Towards Human-comprehensible Semantic Compression [73.89616626853913]
Cross modal compression is a semantic compression framework for visual data.
We show that our proposed CMC can achieve encouraging reconstructed results with an ultrahigh compression ratio.
arXiv Detail & Related papers (2022-09-06T15:31:11Z)
- Preprocessing Enhanced Image Compression for Machine Vision [14.895698385236937]
We propose a preprocessing enhanced image compression method for machine vision tasks.
Our framework is built upon the traditional non-differential codecs.
Experimental results show our method achieves a better tradeoff between coding efficiency and the performance of downstream machine vision tasks, saving about 20% in bit-rate.
arXiv Detail & Related papers (2022-06-12T03:36:38Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging, to some extent, the separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [127.65410486227007]
Video coding, which targets compressing and reconstructing whole frames, and feature compression, which only preserves and transmits the most critical information, stand at two ends of the scale.
Recent trends in video compression, e.g., deep-learning-based coding tools, end-to-end image/video coding, and the MPEG-7 compact feature descriptor standards, promote sustained and rapid development in their respective directions.
In this paper, thanks to booming AI technology, e.g. prediction and generation models, we carry out exploration in the new area, Video Coding for Machines (VCM), arising from the emerging MPEG
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
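Several of the entries above rely on progressive or scalable coding; the PICM-Net entry in particular builds on trit-plane coding. As a rough, self-contained illustration (not that paper's actual codec), non-negative integer latents can be decomposed into base-3 digit planes and transmitted most-significant first, so that any prefix of the planes yields a coarser reconstruction at a lower rate. The function names below are hypothetical.

```python
import numpy as np

def to_trit_planes(latent, n_planes=4):
    """Decompose non-negative integer latents into base-3 digit
    planes, most significant first (toy trit-plane decomposition)."""
    return [(latent // 3**p) % 3 for p in reversed(range(n_planes))]

def from_trit_planes(planes, n_planes=4):
    """Reconstruct from however many planes have been received;
    fewer planes -> coarser latents -> lower quality, lower rate."""
    recon = np.zeros_like(planes[0])
    for i, plane in enumerate(planes):
        recon += plane * 3**(n_planes - 1 - i)
    return recon

latent = np.array([0, 5, 26, 80])        # each value fits in 4 trits (< 81)
planes = to_trit_planes(latent)

full = from_trit_planes(planes)          # all 4 planes: lossless
coarse = from_trit_planes(planes[:2], 4) # first 2 planes only
print(full.tolist())    # [0, 5, 26, 80]
print(coarse.tolist())  # [0, 0, 18, 72] -- low-order trits dropped
```

Sending the most significant planes first is what makes the stream progressive: a receiver can stop after any plane and still decode a usable, if coarse, latent for the downstream task.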
This list is automatically generated from the titles and abstracts of the papers on this site.