Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss
- URL: http://arxiv.org/abs/2402.08267v2
- Date: Sat, 28 Sep 2024 14:05:27 GMT
- Title: Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss
- Authors: Kei Iino, Shunsuke Akamatsu, Hiroshi Watanabe, Shohei Enomoto, Akira Sakamoto, Takeharu Eda
- Abstract summary: Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision.
We propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance.
- Score: 2.9687381456164004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision. Hence, in ICM, it is important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM; optimization of the compression model based on task loss, and Region of Interest (ROI) based bit allocation. These approaches provide the encoder with the recognition capability. However, optimization with task loss becomes difficult when the recognition model is deep, and ROI-based methods often involve extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, compared to the conventional training method. © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
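The objective described in the abstract can be sketched with a toy linear codec: a rate-distortion term plus an auxiliary loss applied directly to the encoder output, so gradients need not flow through a deep recognition model. The dimensions, the L1 rate proxy, the feature-matching form of the auxiliary loss, and the weights `lam` and `beta` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoder" and "decoder" standing in for a learned codec.
W_enc = rng.normal(size=(8, 16))   # image (16-dim) -> latent (8-dim)
W_dec = rng.normal(size=(16, 8))   # latent -> reconstruction

def total_loss(x, target_feat, lam=0.01, beta=0.1):
    y = W_enc @ x                   # latent code produced by the encoder
    x_hat = W_dec @ y               # reconstructed image
    distortion = np.mean((x - x_hat) ** 2)
    rate = np.mean(np.abs(y))       # crude rate proxy (L1 of the latent)
    # Auxiliary loss on the encoder output: match the latent to features
    # from a frozen recognition model, instead of backpropagating a task
    # loss through a deep recognition network.
    aux = np.mean((y - target_feat) ** 2)
    return rate + lam * distortion + beta * aux

x = rng.normal(size=16)             # stand-in input image
target_feat = rng.normal(size=8)    # stand-in recognition-model features
total = total_loss(x, target_feat)
```

Setting `beta=0` recovers a plain rate-distortion objective, which is the baseline the paper's method is compared against.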
Related papers
- Cross-Encoder Rediscovers a Semantic Variant of BM25 [20.670511323837626]
We investigate a Cross-Encoder variant of MiniLM to determine which relevance features it computes and where they are stored.
We find that it employs a semantic variant of the traditional BM25 in an interpretable manner, featuring localized components.
arXiv Detail & Related papers (2025-02-07T04:08:57Z) - UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z) - Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression [9.742764207747697]
We propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method.
In the first stage, a self-encoder learns a prior from the high-quality input image.
In the second stage, the prior is generated through an LDM conditioned on the decoded image of an existing learning-based image compression algorithm.
arXiv Detail & Related papers (2024-06-06T11:13:44Z) - Bridging the gap between image coding for machines and humans [20.017766644567036]
In many use cases, such as surveillance, it is important that the visual quality is not drastically deteriorated by the compression process.
Recent works on using neural network (NN) based ICM codecs have shown significant coding gains against traditional methods.
We propose an effective decoder finetuning scheme based on adversarial training to significantly enhance the visual quality of ICM.
arXiv Detail & Related papers (2024-01-19T14:49:56Z) - Image Coding for Machines with Object Region Learning [0.0]
We propose an image compression model that learns object regions.
Our model does not require additional information as input, such as an ROI-map, and does not use a task loss.
arXiv Detail & Related papers (2023-08-27T01:54:03Z) - SdAE: Self-distillated Masked Autoencoder [95.3684955370897]
This paper proposes SdAE, a self-distillated masked autoencoder network.
With only 300 epochs pre-training, a vanilla ViT-Base model achieves an 84.1% fine-tuning accuracy on ImageNet-1k classification.
arXiv Detail & Related papers (2022-07-31T15:07:25Z) - Contrastive Masked Autoencoders are Stronger Vision Learners [114.16568579208216]
Contrastive Masked Autoencoders (CMAE) is a new self-supervised pre-training method for learning more comprehensive and capable vision representations.
CMAE achieves the state-of-the-art performance on highly competitive benchmarks of image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-07-27T14:04:22Z) - Recognition-Aware Learned Image Compression [0.5801044612920815]
We propose a recognition-aware learned compression method, which optimizes a rate-distortion loss alongside a task-specific loss.
Our method achieves 26% higher recognition accuracy at equivalent bitrates compared to traditional codecs such as BPG.
arXiv Detail & Related papers (2022-02-01T03:33:51Z) - Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z) - Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.