TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation
- URL: http://arxiv.org/abs/2508.08038v2
- Date: Mon, 18 Aug 2025 12:37:28 GMT
- Title: TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation
- Authors: Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille
- Abstract summary: TRIDE is a radar-camera fusion algorithm that enhances text feature extraction by incorporating radar point information. Our method, benchmarked on the nuScenes dataset, demonstrates performance gains over the state-of-the-art.
- Score: 7.90238039959534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation, essential for autonomous driving, seeks to interpret the 3D environment surrounding vehicles. The development of radar sensors, known for their cost-efficiency and robustness, has spurred interest in radar-camera fusion-based solutions. However, existing algorithms fuse features from these modalities without accounting for weather conditions, despite radars being known to be more robust than cameras under adverse weather. Additionally, while Vision-Language models have seen rapid advancement, utilizing language descriptions alongside other modalities for depth estimation remains an open challenge. This paper first introduces a text-generation strategy along with feature extraction and fusion techniques that can assist monocular depth estimation pipelines, leading to improved accuracy across different algorithms on the KITTI dataset. Building on this, we propose TRIDE, a radar-camera fusion algorithm that enhances text feature extraction by incorporating radar point information. To address the impact of weather on sensor performance, we introduce a weather-aware fusion block that adaptively adjusts radar weighting based on current weather conditions. Our method, benchmarked on the nuScenes dataset, demonstrates performance gains over the state-of-the-art, achieving a 12.87% improvement in MAE and a 9.08% improvement in RMSE. Code: https://github.com/harborsarah/TRIDE
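The abstract describes the weather-aware fusion block only at a high level: it adaptively adjusts how much weight radar features receive based on the current weather. As a minimal illustrative sketch (not the authors' implementation; the weather descriptor, gate parameters, and feature shapes below are all hypothetical), such adaptive weighting could be expressed as a learned sigmoid gate followed by a convex combination of the two modalities:

```python
import math

def weather_gate(weather_vec, weights, bias):
    """Map a weather descriptor to a radar weight in (0, 1) via a sigmoid.

    Values near 1 mean the fusion trusts radar more (e.g. rain or fog,
    where cameras degrade); values near 0 lean on camera features.
    """
    logit = sum(w * x for w, x in zip(weights, weather_vec)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def fuse(radar_feat, camera_feat, weight):
    """Convex combination of radar and camera feature vectors."""
    return [weight * r + (1.0 - weight) * c
            for r, c in zip(radar_feat, camera_feat)]

# Hypothetical 4-dim weather descriptor: [clear, rain, fog, night]
WEIGHTS = [-2.0, 2.0, 2.0, 1.0]  # stand-in for learned gate parameters
BIAS = 0.0

radar = [1.0] * 8    # stand-in radar feature vector
camera = [0.0] * 8   # stand-in camera feature vector

w_clear = weather_gate([1.0, 0.0, 0.0, 0.0], WEIGHTS, BIAS)
w_rain = weather_gate([0.0, 1.0, 0.0, 0.0], WEIGHTS, BIAS)
fused_rain = fuse(radar, camera, w_rain)
```

In this toy setup the rain condition yields a higher radar weight than the clear condition, mirroring the paper's motivation that radar is more reliable than cameras in adverse weather; in TRIDE itself the gating would be learned end-to-end rather than hand-set.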
Related papers
- TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion [54.46664104437454]
We propose TacoDepth, an efficient and accurate Radar-Camera depth estimation model with one-stage fusion. Specifically, the graph-based Radar structure extractor and the pyramid-based Radar fusion module are designed. Compared with the previous state-of-the-art approach, TacoDepth improves depth accuracy and processing speed by 12.8% and 91.8%.
arXiv Detail & Related papers (2025-04-16T05:25:04Z) - RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance. Radar suffers from noise and positional ambiguity. We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z) - TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection [6.163747364795787]
We present TransRAD, a novel 3D radar object detection model. We propose Location-Aware NMS to mitigate the common issue of duplicate bounding boxes in deep radar object detection. Results demonstrate that TransRAD outperforms state-of-the-art methods in both 2D and 3D radar detection tasks.
arXiv Detail & Related papers (2025-01-29T20:21:41Z) - GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling [7.90238039959534]
Existing algorithms process radar data by projecting 3D points onto the image plane for pixel-level feature extraction.
We propose GET-UP, leveraging attention-enhanced Graph Neural Networks (GNN) to exchange and aggregate both 2D and 3D information from radar data.
We benchmark our proposed GET-UP on the nuScenes dataset, achieving state-of-the-art performance with a 15.3% and 14.7% improvement in MAE and RMSE over the previously best-performing model.
arXiv Detail & Related papers (2024-09-02T14:15:09Z) - CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation [6.9404362058736995]
This paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation.
The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements.
For the final depth estimation, we innovate a confidence-aware gated fusion mechanism to integrate radar and image features effectively.
arXiv Detail & Related papers (2024-06-30T13:39:29Z) - Radar-Lidar Fusion for Object Detection by Designing Effective Convolution Networks [18.17057711053028]
We propose a dual-branch framework to integrate radar and Lidar data for enhanced object detection.
The results show that it surpasses state-of-the-art methods by 1.89% and 2.61% in favorable and adverse weather conditions, respectively.
arXiv Detail & Related papers (2023-10-30T10:18:40Z) - Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline.
Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
arXiv Detail & Related papers (2023-07-31T09:53:50Z) - RadarFormer: Lightweight and Accurate Real-Time Radar Object Detection Model [13.214257841152033]
Radar-centric datasets receive comparatively little attention in the development of deep learning techniques for radar perception.
We propose a transformer-based model, named RadarFormer, that utilizes state-of-the-art developments in vision deep learning.
Our model also introduces a channel-chirp-time merging module that reduces the size and complexity of our models by more than 10 times without compromising accuracy.
arXiv Detail & Related papers (2023-04-17T17:07:35Z) - R4Dyn: Exploring Radar for Self-Supervised Monocular Depth Estimation of Dynamic Scenes [69.6715406227469]
Self-supervised monocular depth estimation in driving scenarios has achieved comparable performance to supervised approaches.
We present R4Dyn, a novel set of techniques to use cost-efficient radar data on top of a self-supervised depth estimation framework.
arXiv Detail & Related papers (2021-08-10T17:57:03Z) - LiRaNet: End-to-End Trajectory Prediction using Spatio-Temporal Radar Fusion [52.59664614744447]
We present LiRaNet, a novel end-to-end trajectory prediction method which utilizes radar sensor information along with widely used lidar and high definition (HD) maps.
Automotive radar provides rich, complementary information, allowing for longer-range vehicle detection as well as instantaneous velocity measurements.
arXiv Detail & Related papers (2020-10-02T00:13:00Z) - RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors for perception.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.