ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime
Infrared to Daytime Visible Video Translation
- URL: http://arxiv.org/abs/2204.12367v1
- Date: Tue, 26 Apr 2022 15:08:15 GMT
- Title: ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime
Infrared to Daytime Visible Video Translation
- Authors: Zhenjie Yu, Kai Chen, Shuang Li, Bingfeng Han, Chi Harold Liu and
Shuigen Wang
- Abstract summary: Unpaired nighttime infrared and daytime visible videos are far more abundant than paired ones captured at the same time.
We propose ROMA, a tailored framework coupled with our introduced cRoss-domain regiOn siMilarity mAtching technique, to bridge these large gaps.
We provide a new and challenging dataset to encourage further research on unpaired nighttime infrared to daytime visible video translation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infrared cameras are often used to enhance night vision, since
visible-light cameras perform poorly without sufficient illumination. However,
infrared data suffers from inadequate color contrast and limited
representation ability owing to its intrinsic heat-based imaging principle.
This makes the information hard for humans to capture and analyze, and hinders
its application. Although the domain gaps between unpaired nighttime infrared
and daytime visible videos are even larger than those between paired videos
captured at the same time, establishing an effective translation mapping would
greatly benefit various fields. In that setting, the structural knowledge
within nighttime infrared videos and the semantic information contained in the
translated daytime visible pairs can be exploited simultaneously. To this end,
we propose ROMA, a tailored framework coupled with our introduced cRoss-domain
regiOn siMilarity mAtching technique, to bridge these large gaps.
Specifically, ROMA efficiently translates unpaired nighttime infrared videos
into fine-grained daytime visible ones while maintaining spatiotemporal
consistency by matching cross-domain region similarity. Furthermore, we design
a multiscale region-wise discriminator to distinguish the details of
synthesized visible results from real references. Extensive experiments and
evaluations on specific applications indicate that ROMA outperforms
state-of-the-art methods. Moreover, we provide a new and challenging dataset,
named InfraredCity, to encourage further research on unpaired nighttime
infrared to daytime visible video translation. It consists of 9 long video
clips covering City, Highway and Monitor scenarios, totaling 603,142 frames,
20 times more than the recently released daytime infrared-to-visible dataset
IRVI.
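The abstract does not spell out the exact form of the region similarity
matching objective, but one plausible reading is: sample the same spatial
regions in the infrared input's feature map and in the translated visible
output's feature map, build each image's internal region-to-region
cosine-similarity matrix, and penalize the difference between the two
matrices, so that the translated frame preserves the source's structural
layout. The sketch below illustrates that idea; all function names, the random
sampling scheme, and the L1 penalty are assumptions for illustration, not the
paper's implementation.

```python
import numpy as np

def region_similarities(feat, num_regions=64, rng=None):
    """Cosine similarities between sampled region features.

    feat: (C, H, W) feature map. Samples `num_regions` spatial locations and
    returns a (num_regions, num_regions) matrix describing the image's
    internal region-to-region structure.
    """
    rng = np.random.default_rng(rng)
    c, h, w = feat.shape
    idx = rng.choice(h * w, size=num_regions, replace=False)
    regions = feat.reshape(c, -1)[:, idx].T  # (R, C), one row per region
    regions /= np.linalg.norm(regions, axis=1, keepdims=True) + 1e-8
    return regions @ regions.T               # (R, R) cosine similarities

def similarity_matching_loss(feat_ir, feat_vis, num_regions=64, seed=0):
    """L1 distance between the region-similarity structures of the infrared
    input and its translated visible output. The same seed is used for both
    maps so the SAME spatial locations are compared across domains."""
    s_ir = region_similarities(feat_ir, num_regions, rng=seed)
    s_vis = region_similarities(feat_vis, num_regions, rng=seed)
    return np.abs(s_ir - s_vis).mean()
```

When the two feature maps share the same region structure the loss is zero,
regardless of their absolute values, which is what makes a structure-matching
objective usable across modalities with very different appearance statistics.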
Related papers
- CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors [6.8163437709379835]
Adversarial patches are widely used to evaluate the robustness of object detection systems in real-world scenarios.
We propose CDUPatch, a universal cross-modal patch attack against visible-infrared object detectors across scales, views, and scenarios.
By learning an optimal color distribution on the adversarial patch, we can manipulate its thermal response and generate an adversarial infrared texture.
arXiv Detail & Related papers (2025-04-15T05:46:00Z)
- Multi-Domain Biometric Recognition using Body Embeddings [51.36007967653781]
We show that body embeddings perform better than face embeddings in medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains.
We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset.
We also show that finetuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores.
arXiv Detail & Related papers (2025-03-13T22:38:18Z) - TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation [25.542902579879367]
This paper proposes a novel diffusion method, dubbed Temporally Consistent Patch Diffusion Models (TC-PDM).
Our method faithfully preserves the semantic structure of generated visible images.
Experiment shows that TC-PDM outperforms state-of-the-art methods by 35.3% in FVD for infrared-to-visible video translation and by 6.1% in AP50 for day-to-night object detection.
arXiv Detail & Related papers (2024-08-26T12:43:48Z) - ThermalNeRF: Thermal Radiance Fields [32.881758519242155]
We propose a unified framework for scene reconstruction from a set of LWIR and RGB images.
We calibrate the RGB and infrared cameras with respect to each other, as a preprocessing step.
We show that our method is capable of thermal super-resolution, as well as visually removing obstacles to reveal objects occluded in either the RGB or thermal channels.
arXiv Detail & Related papers (2024-07-22T02:51:29Z) - NiteDR: Nighttime Image De-Raining with Cross-View Sensor Cooperative Learning for Dynamic Driving Scenes [49.92839157944134]
In nighttime driving scenes, insufficient and uneven lighting shrouds the scenes in darkness, resulting in degraded image quality and visibility.
We develop an image de-raining framework tailored for rainy nighttime driving scenes.
It aims to remove rain artifacts, enrich scene representation, and restore useful information.
arXiv Detail & Related papers (2024-02-28T09:02:33Z) - Non-Contact NIR PPG Sensing through Large Sequence Signal Regression [0.0]
Non-Contact sensing is an emerging technology with applications across many industries from driver monitoring in vehicles to patient monitoring in healthcare.
Current state-of-the-art approaches focus on RGB video, but these struggle under varying or noisy lighting conditions and are almost entirely infeasible in the dark. Near-infrared (NIR) video, however, does not suffer from these constraints.
This paper aims to demonstrate the effectiveness of an alternative Convolution Attention Network (CAN) architecture for regressing a photoplethysmography (PPG) signal from a sequence of NIR frames.
arXiv Detail & Related papers (2023-11-20T13:34:51Z) - Boosting Night-time Scene Parsing with Learnable Frequency [53.05778451012621]
Night-Time Scene Parsing (NTSP) is essential to many vision applications, especially for autonomous driving.
Most of the existing methods are proposed for day-time scene parsing.
We show that our method performs favorably against the state-of-the-art methods on the NightCity, NightCity+ and BDD100K-night datasets.
arXiv Detail & Related papers (2022-08-30T13:09:59Z) - Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in
VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on frequency domain find that the GAN forged images have obvious grid-like visual artifacts in the frequency spectrum compared to the real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z) - I2V-GAN: Unpaired Infrared-to-Visible Video Translation [14.156053075519207]
We propose an infrared-to-visible (I2V) video translation method I2V-GAN to generate visible light videos by given unpaired infrared videos.
Our model capitalizes on three types of constraints: 1) an adversarial constraint to generate synthetic frames that are similar to the real ones, 2) cyclic consistency with an introduced perceptual loss for effective content conversion, and 3) similarity constraints across and within domains.
Experiments validate that I2V-GAN is superior to the compared SOTA methods in the translation of I2V videos with higher fluency and finer semantic details.
arXiv Detail & Related papers (2021-08-02T14:04:19Z) - An Integrated Enhancement Solution for 24-hour Colorful Imaging [51.782600936647235]
Current industry practice for 24-hour outdoor imaging is to use a silicon camera supplemented with near-infrared (NIR) illumination.
This will result in color images with poor contrast at daytime and absence of chrominance at nighttime.
We propose a novel and integrated enhancement solution that produces clear color images, whether at abundant sunlight daytime or extremely low-light nighttime.
arXiv Detail & Related papers (2020-05-10T05:11:34Z) - Drone-based RGB-Infrared Cross-Modality Vehicle Detection via
Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z) - Translating multispectral imagery to nighttime imagery via conditional
generative adversarial networks [24.28488767429697]
This study explores the potential of conditional Generative Adversarial Networks (cGAN) in translating multispectral imagery to nighttime imagery.
A popular cGAN framework, pix2pix, was adopted and modified to facilitate this translation.
With the additional social media data, the generated nighttime imagery can be very similar to the ground-truth imagery.
arXiv Detail & Related papers (2019-12-28T03:20:29Z)