SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for
Autonomous Driving
- URL: http://arxiv.org/abs/2103.03150v1
- Date: Thu, 4 Mar 2021 16:42:49 GMT
- Title: SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for
Autonomous Driving
- Authors: Farzeen Munir, Shoaib Azam and Moongu Jeon
- Abstract summary: We propose a deep neural network, the Self-Supervised Thermal Network (SSTN), which learns feature embeddings via contrastive learning to maximize the mutual information between the visible and infrared spectral domains.
The proposed method is extensively evaluated on the two publicly available datasets: the FLIR-ADAS dataset and the KAIST Multi-Spectral dataset.
- Score: 6.810856082577402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The sensing and perception of the environment play a decisive role
in the safe and secure operation of autonomous vehicles. This perception of the
surroundings is similar to human visual representation: the human brain
perceives the environment through different sensory channels and develops a
view-invariant representation model. In the same spirit, different
exteroceptive sensors are deployed on autonomous vehicles to perceive the
environment; the most common are the camera, LiDAR, and radar. Although these
sensors have demonstrated their benefits in the visible spectrum, they have
limited operational capability in adverse conditions, for instance at night,
which may lead to fatal accidents. In this work, we explore thermal object
detection to model a view-invariant representation by employing a
self-supervised contrastive learning approach. For this purpose, we propose a
deep neural network, the Self-Supervised Thermal Network (SSTN), which learns
feature embeddings that maximize the mutual information between the visible and
infrared spectral domains via contrastive learning, and then employs these
learned representations for thermal object detection using a multi-scale
encoder-decoder transformer network. The proposed method is extensively
evaluated on two publicly available datasets: the FLIR-ADAS dataset and the
KAIST Multi-Spectral dataset. The experimental results illustrate the efficacy
of the proposed method.
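The contrastive objective described in the abstract is commonly realised as an InfoNCE-style loss over paired cross-modal embeddings. The following is a minimal, dependency-free sketch of that idea; the function name, embedding format, and temperature value are illustrative assumptions, not the authors' actual implementation:

```python
import math

def info_nce_loss(visible, thermal, temperature=0.1):
    """InfoNCE-style contrastive loss over paired embeddings.

    visible[i] and thermal[i] are a positive (same-scene) pair; every other
    cross-modal combination in the batch acts as a negative.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(a):
        n = math.sqrt(dot(a, a))
        return [x / n for x in a]

    v = [normalize(e) for e in visible]
    t = [normalize(e) for e in thermal]
    total = 0.0
    for i in range(len(v)):
        # Cosine similarities of visible[i] to every thermal embedding.
        logits = [dot(v[i], t_j) / temperature for t_j in t]
        m = max(logits)  # stabilise the log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -(logits[i] - log_denom)  # cross-entropy on the positive pair
    return total / len(v)
```

Minimising this loss pulls embeddings of the same scene in the two spectra together while pushing mismatched pairs apart, which is the mutual-information-maximising behaviour the abstract describes.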
Related papers
- OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising [49.86409475232849]
Trajectory prediction is fundamental in computer vision and autonomous driving.
Existing approaches in this field often assume precise and complete observational data.
We present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique.
arXiv Detail & Related papers (2024-04-02T18:30:29Z)
- Generating Human-Centric Visual Cues for Human-Object Interaction Detection via Large Vision-Language Models [59.611697856666304]
Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions.
We propose three prompts with VLM to generate human-centric visual cues within an image from multiple perspectives of humans.
We develop a transformer-based multimodal fusion module with multitower architecture to integrate visual cue features into the instance and interaction decoders.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z)
- ARTSeg: Employing Attention for Thermal images Semantic Segmentation [6.060020806741279]
We have designed an attention-based Recurrent Convolution Network (RCNN) encoder-decoder architecture named ARTSeg for thermal semantic segmentation.
The efficacy of the proposed method is evaluated on the available public dataset, showing better performance than other state-of-the-art methods in mean intersection over union (IoU).
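The mean IoU metric referenced above is the standard measure for semantic segmentation. A minimal sketch over flattened per-pixel class labels (illustrative only, not the paper's evaluation code):

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are flat lists of per-pixel integer class labels.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```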
arXiv Detail & Related papers (2021-11-30T10:17:28Z)
- Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework at KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
- Channel Boosting Feature Ensemble for Radar-based Object Detection [6.810856082577402]
Radar-based object detection is explored as a counterpart sensor modality that can be deployed and used in adverse weather conditions.
The proposed method's efficacy is extensively evaluated using the COCO evaluation metric.
arXiv Detail & Related papers (2021-01-10T12:20:58Z)
- All-Weather Object Recognition Using Radar and Infrared Sensing [1.7513645771137178]
This thesis explores new sensing developments based on long wave polarised infrared (IR) imagery and imaging radar to recognise objects.
First, we developed a methodology based on Stokes parameters using polarised infrared data to recognise vehicles using deep neural networks.
Second, we explored the potential of using only the power spectrum captured by low-THz radar sensors to perform object recognition in a controlled scenario.
Last, we created a new large-scale dataset in the "wild" with many different weather scenarios showing radar robustness to detect vehicles in adverse weather.
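The Stokes-parameter methodology mentioned in this entry follows a standard polarimetric formulation. A minimal per-pixel sketch over four polarised intensity images (an illustrative assumption about the preprocessing, not the thesis's actual pipeline):

```python
import math

def stokes_parameters(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarised intensity images.

    Inputs are flat lists of per-pixel intensities at polariser angles
    0, 45, 90, and 135 degrees.
    """
    # S0 = I0 + I90: total intensity
    s0 = [a + c for a, c in zip(i0, i90)]
    # S1 = I0 - I90 and S2 = I45 - I135: linear polarisation components
    s1 = [a - c for a, c in zip(i0, i90)]
    s2 = [b - d for b, d in zip(i45, i135)]
    # Degree of linear polarisation, a common derived feature for recognition
    dolp = [math.sqrt(x * x + y * y) / t if t else 0.0
            for x, y, t in zip(s1, s2, s0)]
    return s0, s1, s2, dolp
```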
arXiv Detail & Related papers (2020-10-30T14:16:39Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos)
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Exploring Thermal Images for Object Detection in Underexposure Regions for Autonomous Driving [67.69430435482127]
Underexposure regions are vital to construct a complete perception of the surroundings for safe autonomous driving.
The availability of thermal cameras has provided an essential alternate to explore regions where other optical sensors lack in capturing interpretable signals.
This work proposes a domain adaptation framework which employs a style transfer technique for transfer learning from visible spectrum images to thermal images.
arXiv Detail & Related papers (2020-06-01T09:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.