Edge-guided Multi-domain RGB-to-TIR image Translation for Training
Vision Tasks with Challenging Labels
- URL: http://arxiv.org/abs/2301.12689v1
- Date: Mon, 30 Jan 2023 06:44:38 GMT
- Title: Edge-guided Multi-domain RGB-to-TIR image Translation for Training
Vision Tasks with Challenging Labels
- Authors: Dong-Guw Lee, Myung-Hwan Jeon, Younggun Cho and Ayoung Kim
- Abstract summary: The insufficient number of annotated thermal infrared (TIR) image datasets hinders TIR image-based deep learning networks from achieving performance comparable to that of RGB-based networks.
We propose a modified multi-domain RGB-to-TIR image translation model focused on edge preservation to employ annotated RGB images with challenging labels.
Our model enables the supervised learning of deep TIR image-based optical flow estimation and object detection, reducing end-point error by 56.5% on average and achieving a best object detection mAP of 23.9%, respectively.
- Score: 12.701191873813583
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The insufficient number of annotated thermal infrared (TIR) image datasets
not only hinders TIR image-based deep learning networks from achieving
performance comparable to that of RGB-based networks, but also limits the
supervised learning of TIR image-based tasks with challenging labels. As a
remedy, we propose a modified multi-domain RGB-to-TIR image translation model
focused on edge preservation to employ annotated RGB images with challenging
labels. Our proposed method not only preserves key details in the original
image but also leverages the optimal TIR style code to portray accurate TIR
characteristics in the translated image, when applied to both synthetic and
real-world RGB images. Using our translation model, we have enabled the
supervised learning of deep TIR image-based optical flow estimation and object
detection, reducing end-point error in deep TIR optical flow estimation by
56.5\% on average and achieving a best object detection mAP of 23.9\%,
respectively. Our code and supplementary materials are available at
https://github.com/rpmsnu/sRGB-TIR.
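The end-point error (EPE) cited above is the standard optical flow metric: the Euclidean distance between predicted and ground-truth flow vectors, averaged over all pixels. A minimal NumPy sketch of the metric (the `(H, W, 2)` array layout is a common convention, not something specified by the paper):

```python
import numpy as np

def endpoint_error(pred_flow: np.ndarray, gt_flow: np.ndarray) -> float:
    """Mean end-point error between two (H, W, 2) flow fields.

    Each pixel stores a (u, v) displacement; the EPE is the Euclidean
    distance between predicted and ground-truth displacements,
    averaged over all pixels.
    """
    assert pred_flow.shape == gt_flow.shape and pred_flow.shape[-1] == 2
    diff = pred_flow - gt_flow
    per_pixel = np.sqrt((diff ** 2).sum(axis=-1))  # (H, W) distances
    return float(per_pixel.mean())

# Example: a uniform 3-pixel horizontal error at every pixel gives EPE = 3.0
gt = np.zeros((4, 4, 2))
pred = gt.copy()
pred[..., 0] += 3.0
print(endpoint_error(pred, gt))  # 3.0
```

A "56.5% reduction in EPE" then simply means the translated-data-trained network's mean EPE is 56.5% lower than the baseline's on the same evaluation set.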
Related papers
- Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation [0.536022165180739]
We propose a novel image-to-image translation framework, Pix2Next, to generate high-quality Near-Infrared (NIR) images from RGB inputs.
A multi-scale PatchGAN discriminator ensures realistic image generation at various detail levels, while carefully designed loss functions couple global context understanding with local feature preservation.
The proposed approach enables the scaling up of NIR datasets without additional data acquisition or annotation efforts, potentially accelerating advancements in NIR-based computer vision applications.
arXiv Detail & Related papers (2024-09-25T07:51:47Z)
- HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information [12.376615603048279]
HalluciDet is an IR-RGB image translation model for object detection.
We empirically compare our approach against state-of-the-art methods for image translation and for fine-tuning on IR.
arXiv Detail & Related papers (2023-10-07T03:00:33Z)
- Enhancing Low-Light Images Using Infrared-Encoded Images [81.8710581927427]
Previous works mainly focus on low-light images captured in the visible spectrum using pixel-wise losses.
We propose a novel approach to increase the visibility of images captured under low-light environments by removing the in-camera infrared (IR) cut-off filter.
arXiv Detail & Related papers (2023-07-09T08:29:19Z)
- Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval [84.11127588805138]
Composed Image Retrieval (CIR) combines a query image with text to describe their intended target.
Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.
We propose Zero-Shot Composed Image Retrieval (ZS-CIR), whose goal is to build a CIR model without requiring labeled triplets for training.
arXiv Detail & Related papers (2023-02-06T19:40:04Z)
- Thermal Infrared Image Inpainting via Edge-Aware Guidance [8.630992878659084]
In this paper, we propose a novel task -- Thermal Infrared Image Inpainting.
We adopt an edge generator to complete the Canny edges of broken TIR images.
The completed edges are projected to the normalization weights and biases to enhance edge awareness of the model.
Experiments demonstrate that our method outperforms state-of-the-art image inpainting approaches on FLIR thermal dataset.
arXiv Detail & Related papers (2022-10-28T09:06:54Z)
- Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection [10.460296317901662]
We find that detection in aerial RGB-IR images suffers from weak cross-modal misalignment problems.
We propose a Translation-Scale-Rotation Alignment (TSRA) module to address the problem.
A two-stream feature alignment detector (TSFADet) based on the TSRA module is constructed for RGB-IR object detection in aerial images.
arXiv Detail & Related papers (2022-09-28T03:06:18Z)
- RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation [49.28588927121722]
We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolutions by solving stereo matching correspondences.
We introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels.
To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera.
arXiv Detail & Related papers (2022-06-14T17:59:59Z)
- Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z)
- Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision [76.41657124981549]
This paper presents a joint learning model for image alignment and RAW-to-sRGB mapping.
Experiments show that our method performs favorably against state-of-the-art methods on the ZRR and SR-RAW datasets.
arXiv Detail & Related papers (2021-08-18T12:41:36Z)
- Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild [48.44194221801609]
We propose a new lightweight and end-to-end learning-based framework to tackle this challenge.
We progressively spread the differences between input RGB images and re-projected RGB images from recovered HS images via effective camera spectral response function estimation.
Our method significantly outperforms state-of-the-art unsupervised methods and even exceeds the latest supervised method under some settings.
arXiv Detail & Related papers (2021-08-15T05:19:44Z)
- Thermal Infrared Image Colorization for Nighttime Driving Scenes with Top-Down Guided Attention [14.527765677864913]
We propose a toP-down attEntion And gRadient aLignment based GAN, referred to as PearlGAN.
A top-down guided attention module and an elaborate attentional loss are first designed to reduce the semantic encoding ambiguity during translation.
In addition, pixel-level annotation is carried out on a subset of FLIR and KAIST datasets to evaluate the semantic preservation performance of multiple translation methods.
arXiv Detail & Related papers (2021-04-29T14:35:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.