I2V-GAN: Unpaired Infrared-to-Visible Video Translation
- URL: http://arxiv.org/abs/2108.00913v2
- Date: Wed, 4 Aug 2021 05:24:30 GMT
- Title: I2V-GAN: Unpaired Infrared-to-Visible Video Translation
- Authors: Shuang Li, Bingfeng Han, Zhenjie Yu, Chi Harold Liu, Kai Chen, Shuigen
Wang
- Abstract summary: We propose an infrared-to-visible (I2V) video translation method, I2V-GAN, to generate visible light videos from given unpaired infrared videos.
Our model capitalizes on three types of constraints: 1) an adversarial constraint to generate synthetic frames that are similar to the real ones, 2) cyclic consistency with an introduced perceptual loss for effective content conversion, and 3) similarity constraints across and within domains.
Experiments validate that I2V-GAN is superior to the compared SOTA methods in I2V video translation, with higher fluency and finer semantic details.
- Score: 14.156053075519207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human vision is often adversely affected by complex environmental factors,
especially in night vision scenarios. Thus, infrared cameras are often
leveraged to enhance visibility by detecting infrared radiation in the
surrounding environment, but the resulting infrared videos are undesirable due
to their lack of detailed semantic information. In such cases, an effective
video-to-video translation method from the infrared domain to its visible light
counterpart is strongly needed, one that overcomes the intrinsically large gap
between the infrared and visible fields. To address this challenging problem,
we propose an infrared-to-visible (I2V) video translation method, I2V-GAN,
which generates fine-grained and spatio-temporally consistent visible light
videos from given unpaired infrared videos. Technically, our model capitalizes
on three types of constraints: 1) an adversarial constraint to generate
synthetic frames that are similar to the real ones, 2) cyclic consistency with
an introduced perceptual loss for effective content conversion as well as style
preservation, and 3) similarity constraints across and within domains to
enhance content and motion consistency in both spatial and temporal spaces at a
fine-grained level. Furthermore, the currently available public infrared and
visible light datasets are mainly intended for object detection or tracking,
and some are composed of discontinuous images that are not suitable for video
tasks. Thus, we provide a new dataset for I2V video translation, named IRVI.
Specifically, it contains 12 consecutive video clips of vehicle and monitoring
scenes, and both the infrared and visible light videos can be split into
24,352 frames. Comprehensive experiments validate that I2V-GAN outperforms the
compared SOTA methods in I2V video translation, achieving higher fluency and
finer semantic details. The code and IRVI dataset are available at
https://github.com/BIT-DA/I2V-GAN.
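To make the three constraints above concrete, the sketch below shows one way the overall generator objective could be assembled. It is a minimal illustration assuming CycleGAN-style generators and a discriminator (G_ir2vis, G_vis2ir, D_vis), a VGG-style feature extractor (vgg_feat), and loss weights (lambda_cyc, lambda_perc, lambda_sim); these names and the exact loss forms are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of the three I2V-GAN constraints named in the abstract:
# adversarial, cyclic consistency with a perceptual term, and a simple
# cross-domain similarity term. All module names and weights are
# illustrative assumptions; see the official repository for the real code.
import torch
import torch.nn.functional as F

def i2v_generator_loss(real_ir, G_ir2vis, G_vis2ir, D_vis, vgg_feat,
                       lambda_cyc=10.0, lambda_perc=1.0, lambda_sim=1.0):
    fake_vis = G_ir2vis(real_ir)      # infrared -> visible translation
    rec_ir = G_vis2ir(fake_vis)       # cycle back to the infrared domain

    # 1) adversarial constraint: translated frames should look real to D_vis
    #    (least-squares GAN form, used here as an assumption)
    pred = D_vis(fake_vis)
    adv = F.mse_loss(pred, torch.ones_like(pred))

    # 2) cyclic consistency plus a perceptual loss on deep features of the
    #    reconstruction, for content conversion and style preservation
    cyc = F.l1_loss(rec_ir, real_ir)
    perc = F.l1_loss(vgg_feat(rec_ir), vgg_feat(real_ir))

    # 3) a stand-in for the cross-/intra-domain similarity constraints:
    #    deep features of the translated frame should stay close to those
    #    of the source frame, preserving content at a fine-grained level
    sim = F.l1_loss(vgg_feat(fake_vis), vgg_feat(real_ir))

    return adv + lambda_cyc * cyc + lambda_perc * perc + lambda_sim * sim
```

In the full method, the symmetric visible-to-infrared direction and the temporal terms over consecutive frames would be added analogously.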
Related papers
- DiffV2IR: Visible-to-Infrared Diffusion Model via Vision-Language Understanding [43.85632218045282]
We introduce DiffV2IR, a novel framework for image translation comprising two key elements: a Progressive Learning Module (PLM) and a Vision-Language Understanding Module (VLUM)
PLM features an adaptive diffusion model architecture that leverages multi-stage knowledge learning for the infrared transition from full-range to target wavelengths.
VLUM incorporates unified Vision-Language Understanding. We also collected a large infrared dataset, IR-500K, which includes 500,000 infrared images compiled from various scenes and objects under diverse environmental conditions.
arXiv Detail & Related papers (2025-03-24T17:58:09Z) - VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation [62.64811405314847]
VidCRAFT3 is a novel framework for precise image-to-video generation.
It enables control over camera motion, object motion, and lighting direction simultaneously.
It produces high-quality video content, outperforming state-of-the-art methods in control granularity and visual coherence.
arXiv Detail & Related papers (2025-02-11T13:11:59Z) - Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection.
Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
arXiv Detail & Related papers (2025-01-25T06:21:06Z) - CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain [7.007302908953179]
Infrared (IR) imaging offers advantages in several fields due to its unique ability to capture content in extreme light conditions.
As an alternative, visible light can be used to synthesize IR images, but this causes a loss of fidelity in image details and introduces inconsistencies due to a lack of contextual awareness of the scene.
arXiv Detail & Related papers (2024-11-25T12:23:14Z) - ThermalNeRF: Thermal Radiance Fields [32.881758519242155]
We propose a unified framework for scene reconstruction from a set of LWIR and RGB images.
We calibrate the RGB and infrared cameras with respect to each other, as a preprocessing step.
We show that our method is capable of thermal super-resolution, as well as visually removing obstacles to reveal objects occluded in either the RGB or thermal channels.
arXiv Detail & Related papers (2024-07-22T02:51:29Z) - BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets.
arXiv Detail & Related papers (2024-07-03T22:41:49Z) - Raformer: Redundancy-Aware Transformer for Video Wire Inpainting [77.41727407673066]
Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series.
Wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks.
We introduce a new VWI dataset with a novel mask generation strategy, namely Wire Removal Video dataset 2 (WRV2) and Pseudo Wire-Shaped (PWS) Masks.
WRV2 dataset comprises over 4,000 videos with an average length of 80 frames, designed to facilitate the development and efficacy of inpainting models.
arXiv Detail & Related papers (2024-04-24T11:02:13Z) - NiteDR: Nighttime Image De-Raining with Cross-View Sensor Cooperative Learning for Dynamic Driving Scenes [49.92839157944134]
In nighttime driving scenes, insufficient and uneven lighting shrouds the scenes in darkness, resulting in degraded image quality and visibility.
We develop an image de-raining framework tailored for rainy nighttime driving scenes.
It aims to remove rain artifacts, enrich scene representation, and restore useful information.
arXiv Detail & Related papers (2024-02-28T09:02:33Z) - You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement [50.37253008333166]
The Low-Light Image Enhancement (LLIE) task aims to restore the details and visual information of corrupted low-light images.
We propose a novel trainable color space, named Horizontal/Vertical-Intensity (HVI).
It not only decouples brightness and color from RGB channels to mitigate the instability during enhancement but also adapts to low-light images in different illumination ranges due to the trainable parameters.
arXiv Detail & Related papers (2024-02-08T16:47:43Z) - Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark [38.11468156313255]
Seeing-in-the-dark is one of the most important and challenging computer vision tasks.
In this paper, we try to robustify NIR2RGB translation by designing the optimal spectrum of auxiliary illumination in the wide-band VIS-NIR range.
arXiv Detail & Related papers (2023-03-21T07:27:37Z) - Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on frequency domain find that the GAN forged images have obvious grid-like visual artifacts in the frequency spectrum compared to the real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z) - Robust Environment Perception for Automated Driving: A Unified Learning Pipeline for Visual-Infrared Object Detection [2.478658210785]
In this paper, we exploit both visual and thermal perception units for robust object detection purposes.
arXiv Detail & Related papers (2022-06-08T15:02:58Z) - ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation [33.96130720406588]
Unpaired nighttime infrared and daytime visible videos are far more abundant than paired ones captured at the same time.
We propose a tailored framework, ROMA, coupled with our introduced cRoss-domain regiOn siMilarity mAtching technique, to bridge the huge domain gaps.
We provide a new and challenging dataset encouraging further research for unpaired nighttime infrared and daytime visible video translation.
arXiv Detail & Related papers (2022-04-26T15:08:15Z) - Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle dataset collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)