A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware
- URL: http://arxiv.org/abs/2505.11066v1
- Date: Fri, 16 May 2025 10:02:22 GMT
- Title: A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware
- Authors: Rui Wang, Shichun Yang, Yuyi Chen, Zhuoyang Li, Zexiang Tong, Jianyi Xu, Jiayi Lu, Xinjie Feng, Yaoguang Cao
- Abstract summary: Road terrains play a crucial role in ensuring the driving safety of autonomous vehicles (AVs). Existing sensors of AVs, including cameras and Lidars, are susceptible to variations in lighting and weather conditions. We propose an illumination-aware multi-modal fusion network (IMF), which leverages both exteroceptive and proprioceptive perception.
- Score: 4.964908292792731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Road terrains play a crucial role in ensuring the driving safety of autonomous vehicles (AVs). However, existing sensors of AVs, including cameras and Lidars, are susceptible to variations in lighting and weather conditions, making it challenging to achieve real-time perception of road conditions. In this paper, we propose an illumination-aware multi-modal fusion network (IMF), which leverages both exteroceptive and proprioceptive perception and optimizes the fusion process based on illumination features. We introduce an illumination-perception sub-network to accurately estimate illumination features. Moreover, we design a multi-modal fusion network which is able to dynamically adjust weights of different modalities according to illumination features. We enhance the optimization process by pre-training of the illumination-perception sub-network and incorporating illumination loss as one of the training constraints. Extensive experiments demonstrate that the IMF shows a superior performance compared to state-of-the-art methods. The comparison results with single modality perception methods highlight the comprehensive advantages of multi-modal fusion in accurately perceiving road terrains under varying lighting conditions. Our dataset is available at: https://github.com/lindawang2016/IMF.
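The abstract describes dynamically adjusting modality weights according to estimated illumination features. A minimal sketch of that idea, assuming a scalar illumination estimate and two feature streams (exteroceptive camera features and proprioceptive features); the weighting coefficients and the linear gating are hypothetical, not the authors' architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def illumination_weights(illum, w, b):
    """Map a scalar illumination estimate in [0, 1] to per-modality fusion weights."""
    logits = w * illum + b  # one logit per modality
    return softmax(logits)

def fuse(camera_feat, proprio_feat, illum):
    # Hypothetical gating: bright scenes favor the camera branch,
    # dark scenes shift weight toward proprioceptive sensing.
    w = np.array([2.0, -2.0])
    b = np.array([0.0, 0.0])
    alpha = illumination_weights(illum, w, b)
    return alpha[0] * camera_feat + alpha[1] * proprio_feat

bright = fuse(np.ones(4), np.zeros(4), illum=1.0)  # camera-dominated
dark = fuse(np.ones(4), np.zeros(4), illum=0.0)    # weights even out
```

In the paper the illumination features come from a dedicated illumination-perception sub-network and the fusion weights are learned end-to-end; the sketch only illustrates the gating mechanism.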
Related papers
- Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement [41.66776033752888]
Most low-light image enhancement methods rely on pre-trained model priors, low-light inputs, or both. We propose VLM-IMI, a novel framework that leverages large vision-language models with iterative and manual instructions. VLM-IMI incorporates textual descriptions of the desired normal-light content as enhancement cues, enabling semantically informed restoration.
arXiv Detail & Related papers (2025-07-24T03:35:20Z)
- SAIGFormer: A Spatially-Adaptive Illumination-Guided Network for Low-Light Image Enhancement [58.79901582809091]
Recent Transformer-based low-light enhancement methods have made promising progress in recovering global illumination. We present a Spatially-Adaptive Illumination-Guided Transformer framework that enables accurate illumination restoration.
arXiv Detail & Related papers (2025-07-21T11:38:56Z)
- Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
We propose Multi-Modality Driven LoRA (MMD-LoRA) for Adverse Condition Depth Estimation. It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL). It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
arXiv Detail & Related papers (2024-12-28T14:23:58Z)
- ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement [10.957431540794836]
Inadequate illumination can lead to significant information loss and poor image quality, impacting applications such as surveillance. Current enhancement techniques often use specific datasets to enhance low-light images, but still present challenges when adapting to diverse real-world conditions. The Adaptive Light Enhancement Network (ALEN) is introduced; its main approach is a classification mechanism that determines whether local or global illumination enhancement is required.
arXiv Detail & Related papers (2024-07-29T05:19:23Z)
- Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving [45.97279394690308]
LightDiff is a framework designed to enhance the low-light image quality for autonomous driving applications.
It incorporates a novel multi-condition adapter that adaptively controls the input weights from different modalities, including depth maps, RGB images, and text captions.
It can significantly improve the performance of several state-of-the-art 3D detectors in night-time conditions while achieving high visual quality scores.
arXiv Detail & Related papers (2024-04-07T04:10:06Z)
- Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images [49.75771095302775]
We propose an Adaptive Multi-scale Fusion network (AMFusion) with infrared and visible images.
First, we separately fuse spatial and semantic features from infrared and visible images, where the former are used for the adjustment of light distribution.
Second, we utilize detection features extracted by a pre-trained backbone to guide the fusion of semantic features.
Third, we propose a new illumination loss that constrains the fused image to normal light intensity.
arXiv Detail & Related papers (2024-03-02T03:52:07Z)
- NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination [48.42173911185454]
Inverse rendering methods aim to estimate geometry, materials and illumination from multi-view RGB images.
We propose an end-to-end inverse rendering pipeline that decomposes materials and illumination from multi-view images.
arXiv Detail & Related papers (2023-03-29T12:05:19Z)
- Sparse Needlets for Lighting Estimation with Spherical Transport Loss [89.52531416604774]
NeedleLight is a new lighting estimation model that represents illumination with needlets and allows lighting estimation in both frequency domain and spatial domain jointly.
Extensive experiments show that NeedleLight achieves superior lighting estimation consistently across multiple evaluation metrics as compared with state-of-the-art methods.
arXiv Detail & Related papers (2021-06-24T15:19:42Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
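The TransFuser entry above integrates image and LiDAR representations with attention. A minimal single-head self-attention sketch over concatenated tokens from the two modalities; token counts, dimensions, and random projections are illustrative only, not the paper's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k):
    """Single-head scaled dot-product self-attention over one token set."""
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((tokens.shape[1], d_k)) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # each row sums to 1
    return attn @ V

# 4 image tokens and 4 LiDAR tokens, each 8-dimensional; concatenating them
# lets every image token attend to every LiDAR token and vice versa.
image_tokens = np.random.default_rng(1).standard_normal((4, 8))
lidar_tokens = np.random.default_rng(2).standard_normal((4, 8))
fused = self_attention(np.vstack([image_tokens, lidar_tokens]), d_k=8)
```

The cross-modal mixing falls out of treating both modalities as one token sequence; the actual model stacks such layers with learned projections at multiple resolutions.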
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.