Related papers: Image Coding for Machines with Object Region Learning

Image Coding for Machines with Object Region Learning

URL: http://arxiv.org/abs/2308.13984v1
Date: Sun, 27 Aug 2023 01:54:03 GMT
Title: Image Coding for Machines with Object Region Learning
Authors: Takahiro Shindo, Taiju Watanabe, Kein Yamada, Hiroshi Watanabe
Abstract summary: We propose an image compression model that learns object regions. Our model does not require additional information as input, such as an ROI-map, and does not use task-loss.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Compression technology is essential for efficient image transmission and storage. With the rapid advances in deep learning, images are beginning to be used for image recognition as well as for human vision. For this reason, research has been conducted on image coding for image recognition, and this field is called Image Coding for Machines (ICM). There are two main approaches in ICM: the ROI-based approach and the task-loss-based approach. The former approach has the problem of requiring an ROI-map as input in addition to the input image. The latter approach has the problems of difficulty in learning the task-loss, and lack of robustness because the specific image recognition model is used to compute the loss function. To solve these problems, we propose an image compression model that learns object regions. Our model does not require additional information as input, such as an ROI-map, and does not use task-loss. Therefore, it is possible to compress images for various image recognition models. In the experiments, we demonstrate the versatility of the proposed method by using three different image recognition models and three different datasets. In addition, we verify the effectiveness of our model by comparing it with previous methods.

Related papers

Explicit Residual-Based Scalable Image Coding for Humans and Machines [0.0]
scalable image compression methods serve both machine and human vision.<n>In this paper, we enhance the coding efficiency and interpretability of ICMH framework by integrating an explicit residual compression mechanism.<n>We propose two complementary methods: Feature Residual-based Residual-based Coding (FR-ICMH) and Pixel Residual-based Residual-based Coding (PR-ICMH)
arXiv Detail & Related papers (2025-06-24T04:01:53Z)
A Large-scale AI-generated Image Inpainting Benchmark [11.216906046169683]
We propose a methodology for creating high-quality inpainting datasets and apply it to create DiQuID. DiQuID comprises over 95,000 inpainted images generated from 78,000 original images sourced from MS-COCO, RAISE, and OpenImages. We provide comprehensive benchmarking results using state-of-the-art forgery detection methods, demonstrating the dataset's effectiveness in evaluating and improving detection algorithms.
arXiv Detail & Related papers (2025-02-10T15:56:28Z)
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training [62.843316348659165]
Deep learning-based image matching algorithms have dramatically outperformed humans in rapidly and accurately finding large amounts of correspondences. We propose a large-scale pre-training framework that utilizes synthetic cross-modal training signals to train models to recognize and match fundamental structures across images. Our key finding is that the matching model trained with our framework achieves remarkable generalizability across more than eight unseen cross-modality registration tasks.
arXiv Detail & Related papers (2025-01-13T18:37:36Z)
Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as the Inverted Image Pyramid Networks (PIIP) Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid. PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition. Our method uses the attention mechanism to correlate multiple images within a batch. Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss [2.9687381456164004]
Image coding for machines (ICM) aims to compress images for machine analysis using recognition models rather than human vision. We propose a novel training method for learned ICM models that applies auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance.
arXiv Detail & Related papers (2024-02-13T07:45:25Z)
Detecting Generated Images by Real Images Only [64.12501227493765]
Existing generated image detection methods detect visual artifacts in generated images or learn discriminative features from both real and generated images by massive training. This paper approaches the generated image detection problem from a new perspective: Start from real images. By finding the commonality of real images and mapping them to a dense subspace in feature space, the goal is that generated images, regardless of their generative model, are then projected outside the subspace.
arXiv Detail & Related papers (2023-11-02T03:09:37Z)
Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects. In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Leaning (VIL) A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions [11.031841470875571]
Image aims to alter the size of the image with attention to the contents. Labeled datasets are unavailable for training deep learning models in the image tasks. Regular convolutional neural networks cannot generate images of different sizes in inference time.
arXiv Detail & Related papers (2023-06-12T19:17:44Z)
3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation [4.415086501328683]
We deal with the problem in a reasonable new setting, namely 3D shape is exploited in the training process, and the testing is still purely image-based. We propose a novel contrastive knowledge distillation framework that effectively transfers 3D-augmented image representation from a multi-modal model to an image-based model. We experimentally report state-of-the-art results compared with existing category-agnostic image-based methods by a large margin.
arXiv Detail & Related papers (2022-06-02T16:46:18Z)
AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures. Our experiments demonstrate that the method is able to represent the image in low dimensional space. Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
Cross-modal Image Retrieval with Deep Mutual Information Maximization [14.778158582349137]
We study the cross-modal image retrieval, where the inputs contain a source image plus some text that describes certain modifications to this image and the desired image. Our method narrows the modality gap between the text modality and the image modality by maximizing mutual information between their not exactly semantically identical representation.
arXiv Detail & Related papers (2021-03-10T13:08:09Z)
Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution. We propose a new image restoration framework that is based on minimizing a loss function that includes a "projected-version" of the Generalized SteinUnbiased Risk Estimator (GSURE) and parameterization of the latent image by a CNN.
arXiv Detail & Related papers (2021-02-04T08:52:46Z)
Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning. Current contrastive models are ineffective at localizing the foreground object. We propose a data-driven approach for learning in variance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.