Test-Time Adaptation for Nighttime Color-Thermal Semantic Segmentation
- URL: http://arxiv.org/abs/2307.04470v2
- Date: Thu, 30 Nov 2023 03:04:02 GMT
- Title: Test-Time Adaptation for Nighttime Color-Thermal Semantic Segmentation
- Authors: Yexin Liu, Weiming Zhang, Guoyang Zhao, Jinjing Zhu, Athanasios
Vasilakos, and Lin Wang
- Abstract summary: We propose the first test-time adaptation framework, dubbed Night-TTA, to address the key problems of nighttime RGB-T semantic segmentation.
Our method achieves state-of-the-art (SoTA) performance with a 13.07% boost in mIoU.
- Score: 17.546960391700985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to understand scenes in adverse visual conditions, e.g.,
nighttime, has sparked active research on RGB-Thermal (RGB-T) semantic
segmentation. However, it is essentially hampered by two critical problems: 1)
the day-night gap of RGB images is larger than that of thermal images, and 2)
the class-wise performance of RGB images at night is not consistently higher or
lower than that of thermal images. To address these problems, we propose the
first test-time adaptation (TTA) framework, dubbed Night-TTA, for nighttime
RGB-T semantic segmentation without access to the source (daytime) data during
adaptation. Our method comprises three key technical parts. First, as one
modality (e.g., RGB) suffers from a larger domain gap than the other (e.g.,
thermal), Imaging Heterogeneity Refinement (IHR) employs an interaction branch
alongside the RGB and thermal branches to prevent cross-modal discrepancy and
performance degradation. Second, Class-Aware Refinement (CAR) is introduced to
obtain reliable ensemble logits based on pixel-level distribution aggregation
of the three branches. Third, we design a specific learning scheme for our TTA
framework, which enables the ensemble logits and the three student logits to
learn collaboratively, improving the quality of predictions during the testing
phase of Night-TTA. Extensive experiments show that our method achieves
state-of-the-art (SoTA) performance with a 13.07% boost in mIoU.
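One plausible reading of CAR's pixel-level distribution aggregation is a per-pixel, confidence-weighted average of the three branches' softmax distributions. The sketch below uses inverse prediction entropy as an illustrative confidence measure; the function names and weighting rule are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax over the class axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def entropy(p, eps=1e-8):
    """Shannon entropy over the class axis; returns a (H, W) map."""
    return -(p * np.log(p + eps)).sum(axis=0)

def car_style_ensemble(logits_rgb, logits_thermal, logits_inter):
    """Per-pixel confidence-weighted ensemble of three branch predictions.

    Illustrative only: each branch's softmax distribution is weighted by
    the inverse of its per-pixel entropy (more confident branches count
    more); the actual CAR aggregation rule in Night-TTA may differ.
    Inputs are (C, H, W) logit maps; output is a fused (C, H, W)
    probability map.
    """
    probs = [softmax(l) for l in (logits_rgb, logits_thermal, logits_inter)]
    conf = np.stack([1.0 / (entropy(p) + 1.0) for p in probs])   # (3, H, W)
    weights = conf / conf.sum(axis=0, keepdims=True)             # sum to 1
    fused = sum(w[None] * p for w, p in zip(weights, probs))     # (C, H, W)
    return fused
```

Because the weights form a convex combination at every pixel, the fused output remains a valid per-pixel class distribution, which is what makes it usable as an ensemble "teacher" signal for the student branches during test-time learning.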
Related papers
- Alignment-Free RGBT Salient Object Detection: Semantics-guided Asymmetric Correlation Network and A Unified Benchmark [15.435695491233982]
RGB and Thermal (RGBT) Salient Object Detection (SOD) aims to achieve high-quality saliency prediction.
Existing methods are tailored to manually aligned image pairs, and such manual alignment is labor-intensive.
We make the first attempt to address RGBT SOD for initially captured RGB and thermal image pairs without manual alignment.
arXiv Detail & Related papers (2024-06-03T01:01:58Z)
- Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM [62.23809541385653]
We study why ternary-type opacity is well-suited and desired for the task at hand.
We propose a simple yet novel visual odometry scheme that uses a hybrid combination of volumetric and warping-based image renderings.
arXiv Detail & Related papers (2023-12-20T18:03:17Z)
- Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation [19.41334573257174]
Traditional methods mostly use RGB images, which are heavily affected by lighting conditions, e.g., darkness.
Recent studies show that thermal images are robust in nighttime scenarios, serving as a compensating modality for segmentation.
This work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic segmentation.
arXiv Detail & Related papers (2023-06-17T14:28:08Z)
- Complementary Random Masking for RGB-Thermal Semantic Segmentation [63.93784265195356]
RGB-thermal semantic segmentation is a potential solution to achieve reliable semantic scene understanding in adverse weather and lighting conditions.
This paper proposes 1) a complementary random masking strategy of RGB-T images and 2) self-distillation loss between clean and masked input modalities.
We achieve state-of-the-art performance over three RGB-T semantic segmentation benchmarks.
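The complementary masking idea above can be sketched as a pair of binary patch masks where every patch hidden in the RGB image stays visible in the thermal image, and vice versa. This is a minimal illustration under assumed parameters (patch size, mask ratio), not the paper's implementation.

```python
import numpy as np

def complementary_patch_masks(h, w, patch=16, mask_ratio=0.5, seed=None):
    """Generate complementary binary patch masks for an RGB-T image pair.

    Hypothetical sketch: patches are dropped at random in the RGB stream,
    and the thermal stream receives the exact complement, so the pair
    jointly covers the full scene. All names and defaults here are
    illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    gh, gw = h // patch, w // patch
    # 1 = keep, 0 = mask out, decided per patch for the RGB stream
    keep_rgb = (rng.random((gh, gw)) > mask_ratio).astype(np.float32)
    keep_thermal = 1.0 - keep_rgb                     # exact complement
    # Upsample the patch grid to pixel resolution
    m_rgb = np.kron(keep_rgb, np.ones((patch, patch), np.float32))
    m_thermal = np.kron(keep_thermal, np.ones((patch, patch), np.float32))
    return m_rgb, m_thermal
```

Multiplying each modality by its mask forces the network to reconstruct a consistent prediction from partial, non-overlapping views, which is the premise behind the self-distillation loss between clean and masked inputs.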
arXiv Detail & Related papers (2023-03-30T13:57:21Z)
- Does Thermal Really Always Matter for RGB-T Salient Object Detection? [153.17156598262656]
This paper proposes a network named TNet to solve the RGB-T salient object detection (SOD) task.
We introduce a global illumination estimation module to predict the global illuminance score of the image.
On the other hand, we introduce a two-stage localization and complementation module in the decoding phase to transfer object localization cue and internal integrity cue in thermal features to the RGB modality.
arXiv Detail & Related papers (2022-10-09T13:50:12Z)
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects in an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z) - RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation [49.28588927121722]
We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images with very different resolutions by solving for stereo matching correspondences.
We introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels.
To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera.
arXiv Detail & Related papers (2022-06-14T17:59:59Z) - Temporal Aggregation for Adaptive RGBT Tracking [14.00078027541162]
We propose an RGBT tracker that takes temporal clues into account for robust appearance model learning.
Unlike most existing RGBT trackers, which perform tracking using only spatial information, our method further incorporates temporal information.
arXiv Detail & Related papers (2022-01-22T02:31:56Z) - Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD
Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- HeatNet: Bridging the Day-Night Domain Gap in Semantic Segmentation with Thermal Images [26.749261270690425]
Real-world driving scenarios entail adverse environmental conditions such as nighttime illumination or glare.
We propose a multimodal semantic segmentation model that can be applied during daytime and nighttime.
Besides RGB images, we leverage thermal images, making our network significantly more robust.
arXiv Detail & Related papers (2020-03-10T11:36:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.