Supervised Image Translation from Visible to Infrared Domain for Object Detection
- URL: http://arxiv.org/abs/2408.01843v1
- Date: Sat, 3 Aug 2024 18:51:04 GMT
- Title: Supervised Image Translation from Visible to Infrared Domain for Object Detection
- Authors: Prahlad Anand, Qiranul Saadiyean, Aniruddh Sikdar, Nalini N, Suresh Sundaram,
- Abstract summary: This study aims to learn a translation from visible to infrared imagery, bridging the domain gap between the two modalities.
We adopt a two-stage training strategy with a Generative Adversarial Network and an object detection model.
Images so generated are used to train standard object detection frameworks including Yolov5, Mask and Faster RCNN.
- Score: 1.7851018240619703
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study aims to learn a translation from visible to infrared imagery, bridging the domain gap between the two modalities so as to improve accuracy on downstream tasks including object detection. Previous approaches attempt to perform bi-domain feature fusion through iterative optimization or end-to-end deep convolutional networks. However, we pose the problem as similar to that of image translation, adopting a two-stage training strategy with a Generative Adversarial Network and an object detection model. The translation model learns a conversion that preserves the structural detail of visible images while preserving the texture and other characteristics of infrared images. Images so generated are used to train standard object detection frameworks including Yolov5, Mask and Faster RCNN. We also investigate the usefulness of integrating a super-resolution step into our pipeline to further improve model accuracy, and achieve an improvement of as high as 5.3% mAP.
Related papers
- DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention [12.36906630199689]
We construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model.
Our goal is to utilize a hierarchical progressive network to capture forged artifacts at different scales for detection and localization.
arXiv Detail & Related papers (2024-06-03T16:13:33Z) - Cross-domain and Cross-dimension Learning for Image-to-Graph
Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image
Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - A Dual Attentive Generative Adversarial Network for Remote Sensing Image
Change Detection [6.906936669510404]
We propose a dual attentive generative adversarial network for achieving very high-resolution remote sensing image change detection tasks.
The DAGAN framework has better performance with 85.01% mean IoU and 91.48% mean F1 score than advanced methods on the LEVIR dataset.
arXiv Detail & Related papers (2023-10-03T08:26:27Z) - LadleNet: A Two-Stage UNet for Infrared Image to Visible Image Translation Guided by Semantic Segmentation [5.125530969984795]
We propose an improved algorithm for image translation based on U-net called LadleNet.
LadleNet+ replaces the Handle module in LadleNet with a pre-trained DeepLabv3+ network, enabling the model to have a more powerful capability in constructing semantic space.
Compared to existing methods, LadleNet and LadleNet+ achieved an average improvement of 12.4% and 15.2% in SSIM metrics, and 37.9% and 50.6% in MS-SSIM metrics, respectively.
arXiv Detail & Related papers (2023-08-12T16:14:44Z) - Remote Sensing Image Change Detection with Graph Interaction [1.8579693774597708]
We propose a bitemporal image graph Interaction network for remote sensing change detection, namely BGINet-CD.
Our model demonstrates superior performance compared to other state-of-the-art methods (SOTA) on the GZ CD dataset.
arXiv Detail & Related papers (2023-07-05T03:32:49Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - Progressively Guided Alternate Refinement Network for RGB-D Salient
Object Detection [63.18846475183332]
We aim to develop an efficient and compact deep network for RGB-D salient object detection.
We propose a progressively guided alternate refinement network to refine it.
Our model outperforms existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2020-08-17T02:55:06Z) - Saliency Enhancement using Gradient Domain Edges Merging [65.90255950853674]
We develop a method to merge the edges with the saliency maps to improve the performance of the saliency.
This leads to our proposed saliency enhancement using edges (SEE) with an average improvement of at least 3.4 times higher on the DUT-OMRON dataset.
The SEE algorithm is split into 2 parts, SEE-Pre for preprocessing and SEE-Post pour postprocessing.
arXiv Detail & Related papers (2020-02-11T14:04:56Z) - Supervised and Unsupervised Learning of Parameterized Color Enhancement [112.88623543850224]
We tackle the problem of color enhancement as an image translation task using both supervised and unsupervised learning.
We achieve state-of-the-art results compared to both supervised (paired data) and unsupervised (unpaired data) image enhancement methods on the MIT-Adobe FiveK benchmark.
We show the generalization capability of our method, by applying it on photos from the early 20th century and to dark video frames.
arXiv Detail & Related papers (2019-12-30T13:57:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.