Image Coding for Machines with Object Region Learning
- URL: http://arxiv.org/abs/2308.13984v1
- Date: Sun, 27 Aug 2023 01:54:03 GMT
- Title: Image Coding for Machines with Object Region Learning
- Authors: Takahiro Shindo, Taiju Watanabe, Kein Yamada, Hiroshi Watanabe
- Abstract summary: We propose an image compression model that learns object regions.
Our model does not require additional information as input, such as an ROI-map, and does not use task-loss.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compression technology is essential for efficient image transmission and
storage. With the rapid advances in deep learning, images are increasingly
consumed by image recognition models as well as viewed by humans. For this reason,
research has been conducted on image coding for image recognition, and this
field is called Image Coding for Machines (ICM). There are two main approaches
in ICM: the ROI-based approach and the task-loss-based approach. The former
has the drawback of requiring an ROI-map as input in addition to the input
image. The latter is difficult to train, because optimizing the task-loss is
hard, and lacks robustness, because a specific image recognition model is used
to compute the loss function. To solve these problems, we propose an
image compression model that learns object regions. Our model does not require
additional information as input, such as an ROI-map, and does not use
task-loss. Therefore, it is possible to compress images for various image
recognition models. In the experiments, we demonstrate the versatility of the
proposed method by using three different image recognition models and three
different datasets. In addition, we verify the effectiveness of our model by
comparing it with previous methods.
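The abstract's core idea, replacing an external ROI-map or a task-loss with an object-region map learned by the compression model itself, can be sketched as a region-weighted rate-distortion objective. The following is an illustrative reconstruction, not the paper's actual loss: the function name, the weighting form `alpha + (1 - alpha) * mask`, and all parameter values are assumptions.

```python
import numpy as np

def region_weighted_rd_loss(x, x_hat, mask, rate_bits, lam=0.01, alpha=0.2):
    """Rate-distortion loss with a learned object-region weighting.

    x, x_hat  : (H, W) arrays, original and reconstructed image.
    mask      : (H, W) array in [0, 1], object-region map predicted by the
                model itself (not supplied as an external ROI-map input).
    rate_bits : estimated bitrate of the compressed representation.
    alpha     : minimum weight for background pixels, so the whole image
                still receives some reconstruction pressure.
    """
    # Pixels inside predicted object regions are weighted up to 1.0;
    # background pixels are weighted down to alpha.
    weight = alpha + (1.0 - alpha) * mask
    distortion = np.mean(weight * (x - x_hat) ** 2)
    # Classic rate-distortion trade-off: R + lambda * D.
    return rate_bits + lam * distortion
```

Because the mask is produced by the model rather than supplied as input, training needs no ROI annotations, and because no recognition network appears in the loss, the compressed images are not tied to any single downstream model.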
Related papers
- A Large-scale AI-generated Image Inpainting Benchmark [11.216906046169683]
We propose a methodology for creating high-quality inpainting datasets and apply it to create DiQuID.
DiQuID comprises over 95,000 inpainted images generated from 78,000 original images sourced from MS-COCO, RAISE, and OpenImages.
We provide comprehensive benchmarking results using state-of-the-art forgery detection methods, demonstrating the dataset's effectiveness in evaluating and improving detection algorithms.
arXiv Detail & Related papers (2025-02-10T15:56:28Z)
- MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training [62.843316348659165]
Deep learning-based image matching algorithms have dramatically outperformed humans in rapidly and accurately finding large numbers of correspondences.
We propose a large-scale pre-training framework that utilizes synthetic cross-modal training signals to train models to recognize and match fundamental structures across images.
Our key finding is that the matching model trained with our framework achieves remarkable generalizability across more than eight unseen cross-modality registration tasks.
arXiv Detail & Related papers (2025-01-13T18:37:36Z)
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
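The batch-wise correlation described above can be illustrated with plain dot-product self-attention over per-image descriptors. This is a minimal sketch under assumed shapes; the actual CricaVPR model uses learned projections and multi-head attention rather than raw features.

```python
import numpy as np

def cross_image_batch_attention(feats):
    """Correlate all images in a batch via scaled dot-product attention.

    feats : (B, D) array, one global descriptor per image in the batch.
    Returns a (B, D) array where each descriptor has been refined using
    information from every other image in the batch.
    """
    d = feats.shape[1]
    scores = feats @ feats.T / np.sqrt(d)        # (B, B) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over the batch
    return attn @ feats                          # batch-correlated descriptors
```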
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions [11.031841470875571]
Image retargeting aims to alter the size of an image with attention to its contents.
Labeled datasets are unavailable for training deep learning models on image retargeting tasks.
Regular convolutional neural networks cannot generate images of different sizes at inference time.
arXiv Detail & Related papers (2023-06-12T19:17:44Z)
- 3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation [4.415086501328683]
We address the problem in a practical new setting: 3D shape is exploited during training, while testing remains purely image-based.
We propose a novel contrastive knowledge distillation framework that effectively transfers 3D-augmented image representation from a multi-modal model to an image-based model.
We experimentally report state-of-the-art results compared with existing category-agnostic image-based methods by a large margin.
arXiv Detail & Related papers (2022-06-02T16:46:18Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in low dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Cross-modal Image Retrieval with Deep Mutual Information Maximization [14.778158582349137]
We study cross-modal image retrieval, where the inputs consist of a source image plus text describing the modifications that lead to the desired image.
Our method narrows the gap between the text and image modalities by maximizing mutual information between their representations, which are not exactly semantically identical.
arXiv Detail & Related papers (2021-03-10T13:08:09Z)
- Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework based on minimizing a loss function that includes a "projected version" of the Generalized Stein's Unbiased Risk Estimator (GSURE) and a parameterization of the latent image by a CNN.
arXiv Detail & Related papers (2021-02-04T08:52:46Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.