A New Image Codec Paradigm for Human and Machine Uses
- URL: http://arxiv.org/abs/2112.10071v1
- Date: Sun, 19 Dec 2021 06:17:38 GMT
- Title: A New Image Codec Paradigm for Human and Machine Uses
- Authors: Sien Chen, Jian Jin, Lili Meng, Weisi Lin, Zhuo Chen, Tsui-Shan Chang,
Zhengguang Li, Huaxiang Zhang
- Abstract summary: A new scalable image paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image predictor is designed and trained to achieve general-quality image reconstruction from the 16-bit gray-scale profile and signal features.
- Score: 53.48873918537017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of the AI of Things (AIoT), huge amounts of visual data, e.g.,
images and videos, are produced in our daily work and life. These visual data
are not only used for human viewing or understanding but also for machine
analysis or decision-making, e.g., intelligent surveillance, automated
vehicles, and many other smart city applications. To this end, a new image
codec paradigm for both human and machine uses is proposed in this work.
Firstly, the high-level instance segmentation map and the low-level signal
features are extracted with neural networks. Then, the instance segmentation
map is further represented as a profile with the proposed 16-bit gray-scale
representation. After that, both 16-bit gray-scale profile and signal features
are encoded with a lossless codec. Meanwhile, an image predictor is designed
and trained to achieve the general-quality image reconstruction with the 16-bit
gray-scale profile and signal features. Finally, the residual map between the
original image and the predicted one is compressed with a lossy codec, used for
high-quality image reconstruction. With such designs, on the one hand, we can
achieve scalable image compression to meet different requirements of human
consumption; on the other hand, we can directly perform several machine vision
tasks at the decoder side with the decoded 16-bit gray-scale profile, e.g.,
object classification, detection, and segmentation. Experimental results show
that the proposed codec achieves results comparable to those of most learning-based
codecs and outperforms the traditional codecs (e.g., BPG and JPEG2000) in terms
of PSNR and MS-SSIM for image reconstruction. At the same time, it outperforms
the existing codecs in terms of the mAP for object detection and segmentation.
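For intuition, the following Python sketch gives one possible reading of the pipeline described above. It is a minimal sketch, not the authors' implementation: the bit layout inside the 16-bit gray-scale profile and every named component (seg_net, feat_net, predictor, lossless_codec, lossy_codec) are illustrative assumptions.

```python
import numpy as np

# Hypothetical 16-bit gray-scale profile: the abstract does not give the exact
# bit layout, so this sketch assumes the upper 8 bits hold the class id and the
# lower 8 bits hold the per-image instance id.
def pack_profile(class_map: np.ndarray, instance_map: np.ndarray) -> np.ndarray:
    """Pack per-pixel class ids and instance ids into one uint16 image."""
    return (class_map.astype(np.uint16) << 8) | (instance_map.astype(np.uint16) & 0xFF)

def unpack_profile(profile: np.ndarray):
    """Recover the class-id map and instance-id map from the 16-bit profile."""
    return (profile >> 8).astype(np.uint8), (profile & 0xFF).astype(np.uint8)

# Hypothetical end-to-end flow; seg_net, feat_net, predictor, lossless_codec and
# lossy_codec are stand-ins for the networks and codecs named in the abstract.
def encode(image, seg_net, feat_net, predictor, lossless_codec, lossy_codec):
    class_map, instance_map = seg_net(image)          # high-level semantics
    features = feat_net(image)                        # low-level signal features
    profile = pack_profile(class_map, instance_map)   # 16-bit gray-scale profile
    base_stream = lossless_codec.encode(profile, features)
    predicted = predictor(profile, features)          # general-quality prediction
    residual_stream = lossy_codec.encode(image - predicted)
    return base_stream, residual_stream               # scalable two-layer bitstream

def decode(base_stream, residual_stream, predictor, lossless_codec, lossy_codec):
    profile, features = lossless_codec.decode(base_stream)
    # Machine-vision tasks (classification, detection, segmentation) can run
    # directly on the decoded profile, without full image reconstruction.
    predicted = predictor(profile, features)          # general-quality image
    if residual_stream is None:                       # base layer only
        return predicted
    return predicted + lossy_codec.decode(residual_stream)  # high-quality image
```

Decoding only the base stream yields the general-quality, machine-oriented layer; adding the decoded residual recovers the high-quality reconstruction, which is what makes the bitstream scalable.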
Related papers
- Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs [47.7670923159071]
We present a new image compression paradigm to achieve "intelligently coding for machine" by cleverly leveraging the common sense of Large Multimodal Models (LMMs).
We dub our method "SDComp" for "Semantically Disentangled Compression" and compare it with state-of-the-art codecs on a wide variety of vision tasks.
arXiv Detail & Related papers (2024-08-16T07:23:18Z) - MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image based on the above information.
It can achieve optimal consistency and perception results while saving about 50% bitrate, which gives it strong potential for applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z)
- MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
We propose a preprocessing enhanced image compression method for machine vision tasks.
Our framework is built upon traditional non-differentiable codecs.
Experimental results show that our method achieves a better trade-off between coding bitrate and the performance of downstream machine vision tasks, saving about 20% bitrate.
arXiv Detail & Related papers (2022-06-12T03:36:38Z)
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
- Image coding for machines: an end-to-end learned approach [23.92748892163087]
In this paper, we propose an image codec for machines which is neural network (NN) based and end-to-end learned.
Our results show that our NN-based codec outperforms the state-of-the-art Versatile Video Coding (VVC) standard on the object detection and instance segmentation tasks.
To the best of our knowledge, this is the first end-to-end learned machine-targeted image codec.
arXiv Detail & Related papers (2021-08-23T07:54:42Z)
- Image Compression with Encoder-Decoder Matched Semantic Segmentation [15.536056887418676]
Layered image compression is a promising direction.
Some works transmit the semantic segmentation map together with the compressed image data.
We propose a new layered image compression framework with encoder-decoder matched semantic segmentation (EDMS).
The proposed EDMS framework achieves up to 35.31% BD-rate reduction over the HEVC-based (BPG) codec, along with encoding-time savings.
arXiv Detail & Related papers (2021-01-24T04:11:05Z)
- How to Exploit the Transferability of Learned Image Compression to Conventional Codecs [25.622863999901874]
We show how learned image coding can be used as a surrogate to optimize an image for encoding.
Our approach can remodel an image for a conventional codec to adjust for the MS-SSIM distortion, with over 20% rate improvement and no decoding overhead (see the sketch below).
arXiv Detail & Related papers (2020-12-03T12:34:51Z)
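As a rough illustration of this surrogate idea, the sketch below remodels an image by gradient descent through a differentiable learned codec before it is handed to a conventional codec. This is a hedged reading of the summary above, not the paper's actual algorithm: the `surrogate` callable and the rate-distortion weighting are placeholders, and `pytorch_msssim` is an assumed dependency for a differentiable MS-SSIM.

```python
import torch
from pytorch_msssim import ms_ssim  # assumed dependency for differentiable MS-SSIM

def remodel_image(image, surrogate, steps=100, lr=1e-3, lam=0.01):
    """Remodel an image via a differentiable learned-codec surrogate.

    `surrogate(x)` is a placeholder returning (reconstruction, estimated_rate);
    the remodeled image would afterwards be encoded with a conventional codec.
    """
    x = image.clone().requires_grad_(True)            # optimize the pixels directly
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        recon, rate = surrogate(x)                    # differentiable proxy of the codec
        distortion = 1.0 - ms_ssim(recon, image, data_range=1.0)  # stay faithful to input
        loss = distortion + lam * rate                # rate-distortion objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach().clamp(0.0, 1.0)                 # feed this to BPG/HEVC/JPEG etc.
```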
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generative network to reconstruct video frames with the guidance of the learned motion pattern.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)