Improving Image Coding for Machines through Optimizing Encoder via
Auxiliary Loss
- URL: http://arxiv.org/abs/2402.08267v1
- Date: Tue, 13 Feb 2024 07:45:25 GMT
- Authors: Kei Iino, Shunsuke Akamatsu, Hiroshi Watanabe, Shohei Enomoto, Akira
Sakamoto, Takeharu Eda
- Abstract summary: We propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance.
Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, compared to the conventional training method.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image coding for machines (ICM) aims to compress images for machine analysis
using recognition models rather than human vision. Hence, in ICM, it is
important for the encoder to recognize and compress the information necessary
for the machine recognition task. There are two main approaches in learned ICM:
optimization of the compression model based on task loss, and Region of
Interest (ROI)-based bit allocation. These approaches give the encoder
recognition capability. However, optimization with task loss becomes
difficult when the recognition model is deep, and ROI-based methods often
involve extra overhead during evaluation. In this study, we propose a novel
training method for learned ICM models that applies auxiliary loss to the
encoder to improve its recognition capability and rate-distortion performance.
Our method achieves Bjontegaard Delta rate improvements of 27.7% and 20.3% in
object detection and semantic segmentation tasks, compared to the conventional
training method.
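The combined objective described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name, the trade-off weights, and the exact form of the auxiliary term are assumptions.

```python
def icm_training_loss(rate, distortion, encoder_aux_loss,
                      lmbda=0.01, beta=0.1):
    """Hypothetical ICM training objective (illustrative only).

    rate             -- estimated bits per pixel of the compressed latent
    distortion       -- reconstruction error of the decoded image
    encoder_aux_loss -- auxiliary recognition loss computed on the
                        encoder's output (e.g. from a lightweight
                        recognition head), per the abstract's proposal
    lmbda, beta      -- illustrative trade-off weights
    """
    # Standard rate-distortion trade-off plus the auxiliary encoder term.
    return rate + lmbda * distortion + beta * encoder_aux_loss
```

The key idea the abstract describes is that the auxiliary term is applied to the encoder directly rather than back-propagated through a deep recognition model, which is what avoids the optimization difficulty of direct task-loss training.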
Related papers
- Rate-Distortion-Cognition Controllable Versatile Neural Image Compression [47.72668401825835]
We propose a rate-distortion-cognition controllable versatile image compression method.
Our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition control.
arXiv Detail & Related papers (2024-07-16T13:17:51Z) - LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression [8.80655789773014]
We propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method.
In the first stage, a self-encoder learns a prior from the high-quality input image.
In the second stage, the prior is generated through an LDM conditioned on the decoded image of an existing learning-based image compression algorithm.
arXiv Detail & Related papers (2024-06-06T11:13:44Z) - Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural
Image Compression [62.888755394395716]
This study presents an enhanced neural compression method designed for optimal visual fidelity.
We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss.
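Of the four terms above, the Charbonnier loss has a standard closed form (a smooth L1 variant). A minimal sketch of such a weighted ensemble, with illustrative weights and scalar stand-ins for the learned perceptual, style, and adversarial terms, might look like:

```python
import math

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier loss, a smooth L1 variant: mean of sqrt((x-y)^2 + eps^2)."""
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)

def ensemble_loss(pred, target, perceptual, style, adversarial,
                  w=(1.0, 0.1, 0.05, 0.01)):
    """Weighted sum of the four terms; the weights here are illustrative,
    not taken from the paper."""
    return (w[0] * charbonnier(pred, target)
            + w[1] * perceptual + w[2] * style + w[3] * adversarial)
```

In practice the perceptual, style, and adversarial terms would come from feature-matching networks and a discriminator rather than scalars; the sketch only shows how the terms combine.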
Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
arXiv Detail & Related papers (2024-01-25T08:11:27Z) - Bridging the gap between image coding for machines and humans [20.017766644567036]
In many use cases, such as surveillance, it is important that the visual quality is not drastically deteriorated by the compression process.
Recent works on using neural network (NN) based ICM codecs have shown significant coding gains against traditional methods.
We propose an effective decoder finetuning scheme based on adversarial training to significantly enhance the visual quality of ICM.
arXiv Detail & Related papers (2024-01-19T14:49:56Z) - Image Coding for Machines with Object Region Learning [0.0]
We propose an image compression model that learns object regions.
Our model does not require additional information as input, such as an ROI-map, and does not use task-loss.
arXiv Detail & Related papers (2023-08-27T01:54:03Z) - Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder focuses on reconstructing the informative patches according to gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z) - Asymmetric Learned Image Compression with Multi-Scale Residual Block,
Importance Map, and Post-Quantization Filtering [15.056672221375104]
Deep learning-based image compression has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC.
Many leading learned schemes cannot maintain a good trade-off between performance and complexity.
We propose an efficient and effective image coding framework, which achieves similar R-D performance with lower complexity than the state of the art.
arXiv Detail & Related papers (2022-06-21T09:34:29Z) - Recognition-Aware Learned Image Compression [0.5801044612920815]
We propose a recognition-aware learned compression method, which optimizes a rate-distortion loss alongside a task-specific loss.
Our method achieves 26% higher recognition accuracy at equivalent bitrates compared to traditional codecs such as BPG.
arXiv Detail & Related papers (2022-02-01T03:33:51Z) - High Quality Segmentation for Ultra High-resolution Images [72.97958314291648]
We propose the Continuous Refinement Model for the ultra high-resolution segmentation refinement task.
Our proposed method is fast and effective on image segmentation refinement.
arXiv Detail & Related papers (2021-11-29T11:53:06Z) - Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information it contains and is not responsible for any consequences of its use.