Prompt-ICM: A Unified Framework towards Image Coding for Machines with
Task-driven Prompts
- URL: http://arxiv.org/abs/2305.02578v1
- Date: Thu, 4 May 2023 06:21:10 GMT
- Title: Prompt-ICM: A Unified Framework towards Image Coding for Machines with
Task-driven Prompts
- Authors: Ruoyu Feng, Jinming Liu, Xin Jin, Xiaohan Pan, Heming Sun, Zhibo Chen
- Abstract summary: Image coding for machines (ICM) aims to compress images to support downstream AI analysis instead of human perception.
Inspired by recent advances in transferring large-scale pre-trained models to downstream tasks via prompting, we explore a new ICM framework, Prompt-ICM.
Our method is composed of two core designs: a) compression prompts, which are implemented as importance maps predicted by an information selector, and used to achieve different content-weighted bit allocations during compression according to different downstream tasks.
- Score: 27.119835579428816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image coding for machines (ICM) aims to compress images to support downstream
AI analysis instead of human perception. For ICM, developing a unified codec to
reduce information redundancy while empowering the compressed features to
support various vision tasks is very important, which inevitably faces two core
challenges: 1) How should the compression strategy be adjusted based on the
downstream tasks? 2) How to well adapt the compressed features to different
downstream tasks? Inspired by recent advances in transferring large-scale
pre-trained models to downstream tasks via prompting, in this work, we explore
a new ICM framework, termed Prompt-ICM, which addresses both challenges by
carefully learning task-driven prompts to coordinate the compression process
with downstream analysis. Specifically, our method consists of two core designs:
a) compression prompts, which are implemented as importance maps predicted by
an information selector, and used to achieve different content-weighted bit
allocations during compression according to different downstream tasks; b)
task-adaptive prompts, which are instantiated as a few learnable parameters
specifically for tuning compressed features for the specific intelligent task.
Extensive experiments demonstrate that with a single feature codec and a few
extra parameters, our proposed framework could efficiently support different
kinds of intelligent tasks with much higher coding efficiency.
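The two core designs above can be sketched in toy form. Everything in this snippet is an illustrative assumption, not the paper's implementation: the function names, the linear mapping from importance to quantization step, and the prompt size are all hypothetical.

```python
import numpy as np

def compression_prompt_allocate(latent, importance, q_base=8.0, q_min=1.0):
    """Content-weighted bit allocation (a toy sketch of the 'compression
    prompt' idea): regions with higher task importance get a finer
    quantization step, i.e. more bits."""
    # Importance in [0, 1]: 1 = most relevant to the downstream task.
    step = q_min + (1.0 - importance) * (q_base - q_min)
    quantized = np.round(latent / step) * step
    return quantized, step

def task_adaptive_prompt(features, prompt):
    """Toy sketch of a 'task-adaptive prompt': a small set of learnable
    parameters concatenated to the compressed features before the task
    head. Here `prompt` stands in for those learned parameters."""
    return np.concatenate([prompt, features], axis=0)

rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 4))
importance = np.zeros((4, 4))
importance[1:3, 1:3] = 1.0  # task-salient region predicted by the selector
q_latent, step = compression_prompt_allocate(latent, importance)
# The salient region is quantized with the finest step (least distortion),
# the background with the coarsest (fewest bits).
features = q_latent.reshape(-1)
prompt = np.zeros(4)  # e.g. 4 learnable scalars tuned per task
adapted = task_adaptive_prompt(features, prompt)
```

The point of the sketch is only the division of labor: the importance map shapes *where* bits go during compression, while the small prompt vector adapts the shared compressed features to one specific task without retraining the codec.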
Related papers
- RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression [68.31184784672227]
In modern applications such as autonomous driving, an overwhelming majority of videos serve as input for AI systems performing tasks.
It is therefore useful to optimize the encoder for a downstream task instead of for image quality.
Here, we address this challenge by controlling the Quantization Parameters (QPs) at the macro-block level to optimize the downstream task.
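The paper trains a block-level RL agent to choose the QPs; the snippet below is only a hand-written heuristic illustrating the control interface it acts on (per-macroblock task saliency mapped to a QP), with all names and constants hypothetical.

```python
import numpy as np

def blockwise_qp(task_saliency, qp_base=32, qp_delta=8, qp_min=0, qp_max=51):
    """Toy block-level task-aware rate control: map a per-macroblock
    task-saliency score in [0, 1] to a QP. Salient blocks get a lower QP
    (higher quality, more bits); the 0-51 range follows H.264/H.265."""
    qp = qp_base + np.round((0.5 - task_saliency) * 2 * qp_delta)
    return np.clip(qp, qp_min, qp_max).astype(int)

# Saliency could come from the downstream task model, e.g. a detector.
saliency = np.array([[1.0, 0.0],
                     [0.5, 0.25]])
qp = blockwise_qp(saliency)
```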
arXiv Detail & Related papers (2025-01-21T15:36:08Z)
- TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning [11.167198972934736]
Large language models (LLMs) such as GPT-4 have led to a surge in the size of prompts required for optimal performance.
We propose a novel and efficient reinforcement learning (RL) based task-aware prompt compression method.
We demonstrate that our RL-guided compression method improves the task performance by 8% - 189% over state-of-the-art compression techniques.
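TACO-RL learns which tokens to keep via reinforcement learning; the sketch below only illustrates the compression step itself, with the per-token importance scores given explicitly instead of learned. All names are hypothetical.

```python
def compress_prompt(tokens, scores, keep_ratio=0.5):
    """Toy task-aware prompt compression: keep only the highest-scoring
    tokens, preserving their original order."""
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the top-k scores, then re-sorted to restore word order.
    keep = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [tokens[i] for i in keep]

tokens = ["Please", "carefully", "summarize", "the", "quarterly", "report"]
scores = [0.1, 0.2, 0.9, 0.3, 0.8, 0.9]  # stand-ins for learned scores
compressed = compress_prompt(tokens, scores, keep_ratio=0.5)
# → ["summarize", "quarterly", "report"]
```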
arXiv Detail & Related papers (2024-09-19T18:11:59Z)
- Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs [47.7670923159071]
We present a new image compression paradigm to achieve "intelligent coding for machines" by leveraging the common sense of Large Multimodal Models (LMMs).
We dub our method "SDComp", for "Semantically Disentangled Compression", and compare it with state-of-the-art codecs on a wide variety of vision tasks.
arXiv Detail & Related papers (2024-08-16T07:23:18Z)
- Rate-Distortion-Cognition Controllable Versatile Neural Image Compression [47.72668401825835]
We propose a rate-distortion-cognition controllable versatile image compression method.
Our method yields satisfactory ICM performance and flexible rate-distortion-cognition control.
arXiv Detail & Related papers (2024-07-16T13:17:51Z)
- CMC-Bench: Towards a New Paradigm of Visual Signal Compression [85.1839779884282]
We introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression.
At ultra-low bitrates, this paper shows that the combination of some I2T and T2I models has surpassed the most advanced visual signal protocols.
arXiv Detail & Related papers (2024-06-13T17:41:37Z)
- MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the regions corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image based on the above information.
It can achieve optimal consistency and perception results while saving 50% bitrate, which has strong potential applications in the next generation of storage and communication.
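The four-component pipeline described above can be sketched as a dataflow; every function here is a stub with hypothetical names and payloads, meant only to show what each stage produces and consumes.

```python
from dataclasses import dataclass

@dataclass
class Bitstream:
    semantics: str    # LMM encoder output: a textual semantic description
    region_map: list  # map encoder output: where each semantic concept lives
    pixels: bytes     # image encoder output: extremely compressed residual

def lmm_encoder(image):
    """Stub: extract the semantic information of the image."""
    return "a red car on a wet street"

def map_encoder(image, semantics):
    """Stub: locate regions matching the semantics, as (x, y, w, h) boxes."""
    return [(10, 20, 64, 48)]

def image_encoder(image):
    """Stub: produce a tiny pixel-domain bitstream (here: 8 raw bytes)."""
    return bytes(image[:8])

def decoder(stream: Bitstream):
    """Stub: reconstruct the image guided by semantics, map, and residual."""
    return {"caption": stream.semantics, "boxes": stream.region_map,
            "residual_len": len(stream.pixels)}

image = bytes(range(64))  # placeholder input image
semantics = lmm_encoder(image)
stream = Bitstream(semantics, map_encoder(image, semantics),
                   image_encoder(image))
recon = decoder(stream)
```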
arXiv Detail & Related papers (2024-02-26T17:11:11Z)
- Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Video Coding for Machines: A Paradigm of Collaborative Compression and
Intelligent Analytics [127.65410486227007]
Video coding, which targets to compress and reconstruct the whole frame, and feature compression, which only preserves and transmits the most critical information, stand at two ends of the scale.
Recent endeavors in imminent trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, promote the sustainable and fast development in their own directions.
In this paper, thanks to booming AI technology, e.g. prediction and generation models, we carry out exploration in the new area, Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts.
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.