Image Coding for Machines with Object Region Learning
- URL: http://arxiv.org/abs/2308.13984v1
- Date: Sun, 27 Aug 2023 01:54:03 GMT
- Title: Image Coding for Machines with Object Region Learning
- Authors: Takahiro Shindo, Taiju Watanabe, Kein Yamada, Hiroshi Watanabe
- Abstract summary: We propose an image compression model that learns object regions.
Our model does not require additional information as input, such as an ROI-map, and does not use task-loss.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compression technology is essential for efficient image transmission and
storage. With the rapid advances in deep learning, images are beginning to be
used for image recognition as well as for human vision. For this reason,
research has been conducted on image coding for image recognition, and this
field is called Image Coding for Machines (ICM). There are two main approaches
in ICM: the ROI-based approach and the task-loss-based approach. The former
approach has the problem of requiring an ROI-map as input in addition to the
input image. The latter approach has the problems of difficulty in learning the
task-loss, and lack of robustness because the specific image recognition model
is used to compute the loss function. To solve these problems, we propose an
image compression model that learns object regions. Our model does not require
additional information as input, such as an ROI-map, and does not use
task-loss. Therefore, it is possible to compress images for various image
recognition models. In the experiments, we demonstrate the versatility of the
proposed method by using three different image recognition models and three
different datasets. In addition, we verify the effectiveness of our model by
comparing it with previous methods.
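The two families of ICM training objectives can be compared through the usual learned-compression loss L = R + λ·D.

- A task-loss-based method replaces D with the loss of one specific recognition model. That is the source of the robustness concern raised in the abstract.
- The sketch below shows one hypothetical way to concentrate bits on objects without an ROI-map input and without a recognition model. Distortion is measured against an object-only target built from annotation masks, and those masks are needed only at training time.

The function name, the mask source and the bits estimate are assumptions made for illustration. This is not the authors' published formulation.

```python
import numpy as np

def object_region_rd_loss(x, x_hat, mask, bits, lam):
    """Rate-distortion loss whose distortion target keeps only object regions.

    Hypothetical helper for illustration; not the paper's published loss.
    x:     original image, float array of shape (H, W, C)
    x_hat: decoded image, same shape as x
    mask:  (H, W) binary object mask, 1 inside annotated objects; used only
           to build the training target, never fed to the codec
    bits:  estimated bitstream length for the image (e.g. from an entropy model)
    lam:   weight trading distortion against rate
    """
    h, w = mask.shape
    rate_bpp = bits / (h * w)                  # rate term in bits per pixel
    target = x * mask[..., None]               # background pixels zeroed out
    distortion = np.mean((x_hat - target) ** 2)
    return rate_bpp + lam * distortion
```

Because the mask enters only the loss, a codec trained this way would still take just the image as input at inference time.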
Related papers
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
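The "parameter-inverted" pairing can be illustrated with a toy assignment:

1. Build an image pyramid by repeated 2×2 average pooling.
2. Give the highest-resolution level to the smallest model and the lowest-resolution level to the largest.

The model names and parameter counts are placeholders invented for this sketch. PIIP's actual branches, cross-branch interactions and feature fusion are not modeled.

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [full-res, 1/2, 1/4, ...] by repeated 2x2 average pooling."""
    pyramid = [image]
    for _ in range(levels - 1):
        img = pyramid[-1]
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # crop to even size
        img = img[:h, :w]
        pooled = img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:]).mean(axis=(1, 3))
        pyramid.append(pooled)
    return pyramid

def parameter_inverted_assignment(pyramid, models):
    """Pair levels with models so parameter count grows as resolution shrinks.

    models: list of (name, num_params) tuples (hypothetical placeholders).
    """
    by_size = sorted(models, key=lambda m: m[1])  # smallest model first
    by_res = sorted(pyramid, key=lambda a: a.shape[0] * a.shape[1], reverse=True)
    return [(name, level.shape) for (name, _), level in zip(by_size, by_res)]
```

The point of the inversion is that the most expensive inputs (high resolution) go to the cheapest models, which keeps compute balanced across branches.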
Date: 2024-06-06T17:59:10Z
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
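Correlating images within a batch via attention can be sketched generically. Treat the batch of per-image descriptors as a sequence and apply scaled dot-product self-attention across the batch axis, so each image's descriptor is refined using the others.

This is a single-head illustration in which the projection matrices stand in for learned weights. It is not CricaVPR's architecture, and the function name and dimensions are assumptions.

```python
import numpy as np

def cross_image_attention(features, w_q, w_k, w_v):
    """Single-head self-attention over the batch axis.

    features: (B, D) array, one global descriptor per image in the batch.
    w_q, w_k, w_v: (D, D) projection matrices (learned in practice).
    Returns refined features (B, D) and the (B, B) attention weights.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])        # image-to-image similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over the batch
    return features + weights @ v, weights        # residual connection
```

In this sketch, the refined descriptor of one image depends on which other images share its batch.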
Date: 2024-02-29T15:05:11Z
- Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss [3.1457219084519004]
We propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance.
Our method achieves Bjøntegaard delta rate (BD-rate) improvements of 27.7% and 20.3% in object detection and semantic segmentation tasks, respectively, compared with the conventional training method.
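BD-rate figures measure the average bitrate difference between two rate-distortion curves at equal quality. Negative values mean the tested method needs fewer bits.

Below is a minimal sketch of the classic Bjøntegaard calculation. It fits log-rate as a cubic polynomial of quality and integrates over the quality range both curves cover.

- The function name is hypothetical.
- For ICM, the quality axis would be a task metric such as mAP rather than PSNR.
- The exact fitting variant used in the cited paper is not stated. Some tools use piecewise cubic interpolation instead of a single polynomial fit.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjøntegaard delta rate in percent (negative = test needs fewer bits).

    Each curve is given as matching lists of bitrates and quality scores
    (e.g. PSNR, or mAP for machine tasks), with at least four points each.
    """
    # Fit log-rate as a cubic polynomial of quality for each curve.
    p_a = np.polyfit(quality_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(quality_test, np.log(rate_test), 3)
    # Integrate only over the quality interval covered by both curves.
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100
```

For example, a test curve that needs 0.8× the anchor's bitrate at every quality point yields a BD-rate of about -20%.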
Date: 2024-02-13T07:45:25Z
- Detecting Generated Images by Real Images Only [64.12501227493765]
Existing generated image detection methods detect visual artifacts in generated images or learn discriminative features from both real and generated images by massive training.
This paper approaches the generated image detection problem from a new perspective: Start from real images.
The method finds the commonality of real images and maps them to a dense subspace in feature space, so that generated images, regardless of their generative model, are projected outside that subspace.
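The idea of mapping real images to a dense region of feature space, and flagging what falls outside it, can be illustrated in a deliberately simplified linear form:

1. Fit principal components to features of real images only.
2. Score a new sample by its distance from that subspace.

PCA is a swapped-in stand-in for the paper's learned mapping. The feature vectors, the subspace size `k` and any decision threshold are assumptions.

```python
import numpy as np

def fit_real_subspace(real_features, k):
    """Fit a k-dimensional PCA subspace to features of real images only."""
    mean = real_features.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(real_features - mean, full_matrices=False)
    return mean, vt[:k]

def outside_score(features, mean, basis):
    """Distance from each feature vector to the real-image subspace."""
    centered = features - mean
    projected = centered @ basis.T @ basis
    return np.linalg.norm(centered - projected, axis=1)
```

A sample would be flagged as generated when its score exceeds a threshold chosen on held-out real images.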
Date: 2023-11-02T03:09:37Z
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
We propose to alleviate the impact of the unbalanced (long-tailed) distribution of interaction-object pair categories via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset whose distribution is consistent with that of real images.
Date: 2023-08-04T10:28:48Z
- Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions [11.031841470875571]
Image retargeting aims to alter the size of an image while taking its contents into account.
Labeled datasets are unavailable for training deep learning models on image retargeting tasks.
Regular convolutional neural networks cannot generate images of different sizes at inference time.
Date: 2023-06-12T19:17:44Z
- 3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation [4.415086501328683]
We address object pose estimation in a reasonable new setting: 3D shape is exploited during training, while testing remains purely image-based.
We propose a novel contrastive knowledge distillation framework that effectively transfers 3D-augmented image representation from a multi-modal model to an image-based model.
Experiments show state-of-the-art results, outperforming existing category-agnostic image-based methods by a large margin.
Date: 2022-06-02T16:46:18Z
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent images in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
Date: 2021-06-11T09:02:30Z
- Cross-modal Image Retrieval with Deep Mutual Information Maximization [14.778158582349137]
We study cross-modal image retrieval, in which the input consists of a source image plus text describing certain modifications to this image, and the goal is to retrieve the desired image.
Our method narrows the modality gap between text and images by maximizing the mutual information between their representations, which are not exactly semantically identical.
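Mutual-information maximization between two modalities is commonly implemented through a contrastive lower bound such as InfoNCE. Matched image/text embedding pairs in a batch are pulled together, while the mismatched pairs in the same batch act as negatives.

The sketch shows that generic loss only. It is not necessarily the estimator used in the cited paper, and the temperature value and the L2 normalization are illustrative choices.

```python
import numpy as np

def info_nce(image_emb, text_emb, temperature=0.1):
    """InfoNCE loss for a batch where row i of each array is a matched pair.

    image_emb, text_emb: (B, D) arrays. Minimizing this loss maximizes a
    lower bound on the mutual information between the two representations.
    """
    # L2-normalize so dot products are cosine similarities.
    a = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    b = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (B, B) pair scores
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # matched pairs on the diagonal
```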
Date: 2021-03-10T13:08:09Z
- Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework based on minimizing a loss function that includes a "projected version" of the Generalized Stein Unbiased Risk Estimator (GSURE), with the latent image parameterized by a CNN.
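GSURE generalizes Stein's Unbiased Risk Estimate (SURE), which estimates a denoiser's mean-squared error from the noisy observation alone. The sketch below implements plain SURE for additive Gaussian noise with known σ, using a Monte Carlo estimate of the divergence term.

It does not model the projection and general forward operator that distinguish GSURE, or the CNN parameterization. `denoiser` is any hypothetical callable, and `eps` is an assumed finite-difference step.

```python
import numpy as np

def mc_sure(denoiser, y, sigma, eps=1e-3, rng=None):
    """Monte Carlo SURE estimate of the per-pixel MSE of denoiser(y).

    Assumes y = x + n with n ~ N(0, sigma^2 I) and the clean x unknown.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = y.size
    fy = denoiser(y)
    # Divergence estimate b^T (f(y + eps*b) - f(y)) / eps with a +/-1 probe b.
    b = rng.choice([-1.0, 1.0], size=y.shape)
    div = np.sum(b * (denoiser(y + eps * b) - fy)) / eps
    return np.sum((fy - y) ** 2) / n - sigma ** 2 + 2 * sigma ** 2 * div / n
```

For a linear shrinkage denoiser f(y) = 0.8·y with unit-variance signal and σ = 0.5, both the SURE estimate and the true MSE are close to 0.2.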
Date: 2021-02-04T08:52:46Z
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
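One data-driven way to encourage background invariance is to composite the same foreground onto different backgrounds. The two views of a positive pair then share the object but not its surroundings.

The sketch assumes a soft foreground mask, for example from a saliency estimator, is already available. The function names and inputs are hypothetical, and this is a simplified illustration rather than the paper's full pipeline.

```python
import numpy as np

def paste_foreground(image, mask, background):
    """Alpha-composite the foreground of `image` onto `background`.

    image, background: (H, W, C) float arrays of the same shape.
    mask: (H, W) foreground weights in [0, 1] (e.g. a saliency map).
    """
    alpha = mask[..., None]  # broadcast the mask over channels
    return alpha * image + (1.0 - alpha) * background

def background_swapped_pair(image, mask, backgrounds, rng):
    """Two views of one object pasted onto two different random backgrounds."""
    i, j = rng.choice(len(backgrounds), size=2, replace=False)
    return (paste_foreground(image, mask, backgrounds[i]),
            paste_foreground(image, mask, backgrounds[j]))
```

A contrastive model trained on such pairs cannot match the views using background cues, since those differ between the two views.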
Date: 2020-04-14T16:29:42Z