Scaling Foundation Models for Radar Scene Understanding
- URL: http://arxiv.org/abs/2511.21105v1
- Date: Wed, 26 Nov 2025 06:41:00 GMT
- Title: Scaling Foundation Models for Radar Scene Understanding
- Authors: Pushkal Mishra, Kshitiz Bansal, Dinesh Bharadia
- Abstract summary: Radar sensors provide reliable perception across adverse weather, lighting, and long-range conditions. Recent advances in foundation models have transformed visual and language understanding, yet their integration with radar sensing remains largely underexplored. We introduce RadarFM: a radar foundation model that learns unified scene-level representations through structured spatial language supervision.
- Score: 8.23171791313388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Radar sensors provide reliable perception across adverse weather, lighting, and long-range conditions. Recent advances in foundation models have transformed visual and language understanding, yet their integration with radar sensing remains largely underexplored. Existing radar approaches are fragmented and task-specific; each downstream task employs distinct architectures and training objectives, preventing transfer across tasks. In this work, we introduce RadarFM: a radar foundation model that learns unified scene-level representations through structured spatial language supervision. We make two key contributions: (1) a structured caption framework that encodes vehicle distributions in native radar coordinates, and (2) a hash-aware contrastive learning objective that quantifies continuous scene similarity rather than binary matching, enabling fine-grained spatial reasoning. Leveraging the CARLA simulator, we generate large-scale, well-annotated radar datasets across diverse driving scenarios. We also propose localization-aware metrics that assess spatial accuracy beyond traditional detection measures.
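The hash-aware objective described above replaces the usual binary (one-hot) contrastive targets with soft targets derived from a continuous scene-similarity score. A minimal PyTorch sketch of that idea follows; the function name, the temperature, and the way `scene_sim` would be computed from the structured captions are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a "hash-aware" contrastive objective: instead of
# one-hot (binary match) targets, each radar/caption pair in the batch gets
# a soft target proportional to a continuous scene-similarity score.
import torch
import torch.nn.functional as F

def hash_aware_contrastive_loss(radar_emb, text_emb, scene_sim, tau=0.07):
    """radar_emb, text_emb: (B, D) L2-normalized embeddings.
    scene_sim: (B, B) continuous similarity in [0, 1] between scenes,
    with scene_sim[i, i] == 1 for the true pairing."""
    logits = radar_emb @ text_emb.t() / tau                   # (B, B) logits
    targets_r2t = scene_sim / scene_sim.sum(dim=1, keepdim=True)
    targets_t2r = scene_sim.t() / scene_sim.t().sum(dim=1, keepdim=True)
    loss_r2t = -(targets_r2t * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2r = -(targets_t2r * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_r2t + loss_t2r)
```

With `scene_sim` equal to the identity matrix this reduces to the standard symmetric InfoNCE loss, which makes the continuous-similarity generalization easy to see.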
Related papers
- RadarGen: Automotive Radar Point Cloud Generation from Cameras [64.69976771710057]
We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird's-eye-view form. We show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data.
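As a point of reference for the bird's-eye-view representation mentioned above, here is a minimal sketch of rasterizing radar points into a BEV grid; the grid extent, resolution, and channel choices (occupancy plus mean RCS) are illustrative guesses rather than RadarGen's actual configuration.

```python
# Toy BEV rasterization of radar points; all parameters are assumptions.
import numpy as np

def radar_to_bev(points, rcs, extent=50.0, cells=256):
    """points: (N, 2) x/y in meters; rcs: (N,) reflectivity per point."""
    bev = np.zeros((2, cells, cells), dtype=np.float32)  # [occupancy, mean RCS]
    counts = np.zeros((cells, cells), dtype=np.float32)
    ij = ((points + extent) / (2 * extent) * cells).astype(int)
    ok = (ij >= 0).all(axis=1) & (ij < cells).all(axis=1)
    for (i, j), r in zip(ij[ok], rcs[ok]):
        bev[0, i, j] = 1.0
        bev[1, i, j] += r
        counts[i, j] += 1
    bev[1] /= np.maximum(counts, 1)                      # average RCS per cell
    return bev
```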
arXiv Detail & Related papers (2025-12-19T18:57:33Z)
- Radar Tracker: Moving Instance Tracking in Sparse and Noisy Radar Point Clouds [25.36192517603375]
We address moving instance tracking in sparse radar point clouds to enhance scene interpretation. We propose a learning-based radar tracker incorporating temporal offset predictions to enable direct center-based association. Our approach shows improved performance on the moving instance tracking benchmark of the RadarScenes dataset.
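A toy version of the center-based association described above, assuming per-instance temporal offset predictions are already available; the real tracker learns these offsets end to end, and the matching threshold here is arbitrary.

```python
# Greedy nearest-center association using predicted temporal offsets.
import numpy as np

def associate(prev_centers, curr_centers, pred_offsets, max_dist=2.0):
    """prev_centers: (M, 2); curr_centers, pred_offsets: (N, 2)."""
    if len(prev_centers) == 0:
        return {}
    projected = curr_centers - pred_offsets        # where each instance "was"
    matches, used = {}, set()
    for n, p in enumerate(projected):
        d = np.linalg.norm(prev_centers - p, axis=1)
        d[list(used)] = np.inf                     # each track matched once
        m = int(np.argmin(d))
        if d[m] < max_dist:
            matches[n] = m
            used.add(m)
    return matches                                 # curr index -> prev index
```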
arXiv Detail & Related papers (2025-07-04T09:57:28Z)
- TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion [54.46664104437454]
We propose TacoDepth, an efficient and accurate Radar-Camera depth estimation model with one-stage fusion. Specifically, it introduces a graph-based Radar structure extractor and a pyramid-based Radar fusion module. Compared with the previous state-of-the-art approach, TacoDepth improves depth accuracy and processing speed by 12.8% and 91.8%, respectively.
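A loose sketch of what a graph-based radar structure extractor can look like, using a k-NN graph and EdgeConv-style aggregation; the layer sizes, k, and input features are invented, and only the general idea follows the summary.

```python
# k-NN graph over radar points with edge-feature aggregation (illustrative).
import torch
import torch.nn as nn

class RadarGraphExtractor(nn.Module):
    def __init__(self, in_dim=4, out_dim=64, k=8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, pts):                      # pts: (N, in_dim), xyz + RCS
        d = torch.cdist(pts[:, :3], pts[:, :3])  # pairwise point distances
        idx = d.topk(self.k + 1, largest=False).indices[:, 1:]  # k neighbors
        nbrs = pts[idx]                          # (N, k, in_dim)
        center = pts.unsqueeze(1).expand_as(nbrs)
        edges = torch.cat([center, nbrs - center], dim=-1)
        return self.mlp(edges).max(dim=1).values # (N, out_dim) per-point feature
```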
arXiv Detail & Related papers (2025-04-16T05:25:04Z)
- RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence [10.115852646162843]
We present Radar-LLM, the first framework that leverages large language models (LLMs) for human motion understanding using millimeter-wave radar as the sensing modality. To address data scarcity, we introduce a physics-aware synthesis pipeline that generates realistic radar-text pairs from motion-text datasets. Radar-LLM achieves state-of-the-art performance across both synthetic and real-world benchmarks, enabling accurate translation of millimeter-wave signals to natural language descriptions.
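A heavily simplified sketch of what one physics-aware synthesis step might do with motion-capture keypoints: thin returns with range (mmWave point clouds get sparser farther away) and attach a Doppler value from radial velocity. Every constant and the dropout model below are illustrative assumptions, not the paper's pipeline.

```python
# Toy radar-frame synthesis from two consecutive keypoint frames.
import numpy as np

def synth_radar_frame(joints, joints_prev, dt=0.1, sensor=np.zeros(3)):
    """joints, joints_prev: (J, 3) keypoint positions at t and t - dt."""
    rng = np.linalg.norm(joints - sensor, axis=1)
    keep = np.random.rand(len(joints)) < np.clip(1.5 / rng, 0, 1)  # range dropout
    vel = (joints - joints_prev) / dt
    radial = ((joints - sensor) / rng[:, None] * vel).sum(axis=1)  # Doppler (m/s)
    noise = np.random.normal(0, 0.02, joints.shape)                # sensor jitter
    return np.hstack([(joints + noise)[keep], radial[keep, None]]) # (K, 4) points
```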
arXiv Detail & Related papers (2025-04-14T04:18:25Z)
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
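A condensed sketch of the rendering idea: query an implicit field for reflectance along a ray and weight returns with a simple two-way spreading term from the radar equation. The network, sampling scheme, and physics below are far simpler than the paper's sensor model.

```python
# Minimal implicit-field radar renderer (illustrative only).
import torch
import torch.nn as nn

field = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())

def render_ray(origin, direction, n_samples=64, near=1.0, far=50.0):
    t = torch.linspace(near, far, n_samples)            # sample ranges (m)
    xyz = origin + t[:, None] * direction               # points along the ray
    sigma = field(xyz).squeeze(-1)                      # predicted reflectance
    alpha = 1 - torch.exp(-sigma * (far - near) / n_samples)
    trans = torch.cumprod(torch.cat([alpha.new_ones(1), 1 - alpha[:-1]]), dim=0)
    power = trans * alpha / t**4                        # 1/r^2 spreading, two-way
    return t, power                                     # per-range echo power
```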
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- RaLF: Flow-based Global and Metric Radar Localization in LiDAR Maps [8.625083692154414]
We propose RaLF, a novel deep neural network-based approach for localizing radar scans in a LiDAR map of the environment.
RaLF is composed of radar and LiDAR feature encoders, a place recognition head that generates global descriptors, and a metric localization head that predicts the 3-DoF transformation between the radar scan and the map.
We extensively evaluate our approach on multiple real-world driving datasets and show that RaLF achieves state-of-the-art performance for both place recognition and metric localization.
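A bare-bones sketch of the described structure, with an encoder per modality, a place-recognition head emitting a global descriptor, and a metric head regressing a 3-DoF transform; all layer shapes are placeholders, and the flow-based cross-modal matching that gives RaLF its name is omitted.

```python
# Two-head localization model matching the summary's description.
import torch
import torch.nn as nn

class RaLFSketch(nn.Module):
    def __init__(self, desc_dim=256):
        super().__init__()
        self.radar_enc = nn.Sequential(nn.Conv2d(1, 32, 5, 2, 2), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d(8), nn.Flatten())
        self.lidar_enc = nn.Sequential(nn.Conv2d(1, 32, 5, 2, 2), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d(8), nn.Flatten())
        self.place_head = nn.Linear(32 * 64, desc_dim)   # global descriptor
        self.metric_head = nn.Linear(2 * 32 * 64, 3)     # (dx, dy, dyaw)

    def forward(self, radar_bev, lidar_bev):
        r, l = self.radar_enc(radar_bev), self.lidar_enc(lidar_bev)
        desc = nn.functional.normalize(self.place_head(r), dim=-1)
        pose = self.metric_head(torch.cat([r, l], dim=-1))
        return desc, pose
```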
arXiv Detail & Related papers (2023-09-18T15:37:01Z)
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects.
We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
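Schematically, bi-directional fusion as the summary describes it might look like the following: radar BEV features are first enriched with LiDAR detail, then the enriched radar features are fused back into the LiDAR branch. Module shapes are invented.

```python
# Two-step bi-directional BEV feature exchange (illustrative).
import torch
import torch.nn as nn

class BiFusionSketch(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.l2r = nn.Conv2d(2 * c, c, 1)   # LiDAR -> radar enrichment
        self.r2l = nn.Conv2d(2 * c, c, 1)   # radar -> LiDAR fusion

    def forward(self, lidar_bev, radar_bev):       # both (B, c, H, W)
        radar_enriched = radar_bev + self.l2r(torch.cat([radar_bev, lidar_bev], 1))
        lidar_fused = lidar_bev + self.r2l(torch.cat([lidar_bev, radar_enriched], 1))
        return lidar_fused                          # features for 3D detection
```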
arXiv Detail & Related papers (2023-06-02T10:57:41Z)
- Semantic Segmentation of Radar Detections using Convolutions on Point Clouds [59.45414406974091]
We introduce a deep-learning based method that applies convolutions to point clouds of radar detections.
We adapt this algorithm to radar-specific properties through distance-dependent clustering and pre-processing of input point clouds.
Our network outperforms state-of-the-art approaches that are based on PointNet++ on the task of semantic segmentation of radar point clouds.
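The distance-dependent clustering can be illustrated with a grouping radius that grows with range, since radar detections get sparser farther from the sensor; the naive single-link pass and the growth factor below are stand-ins, not the paper's actual pre-processing.

```python
# Range-dependent single-link clustering of radar detections (O(n^2) toy).
import numpy as np

def cluster(points, base_eps=0.5, growth=0.02):
    """points: (N, 2) detections in sensor coordinates. Returns labels (N,)."""
    n = len(points)
    labels = np.arange(n)                  # start: every point its own cluster
    rng = np.linalg.norm(points, axis=1)
    eps = base_eps + growth * rng          # per-point, range-dependent radius
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < min(eps[i], eps[j]):
                labels[labels == labels[j]] = labels[i]   # merge clusters
    return labels
```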
arXiv Detail & Related papers (2023-05-22T07:09:35Z)
- Large-Scale Topological Radar Localization Using Learned Descriptors [15.662820454886202]
We present a simple yet efficient deep network architecture to compute a rotationally invariant discriminative global descriptor from a radar scan image.
The performance and generalization ability of the proposed method is experimentally evaluated on two large scale driving datasets.
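One classic way to obtain the rotational invariance mentioned above is to take FFT magnitudes along the azimuth axis of a polar radar image, since a rotation of the scan becomes a circular shift there; this is a generic signal-processing trick, not necessarily the learned descriptor the paper evaluates.

```python
# Rotation-invariant descriptor via azimuth-axis FFT magnitudes.
import numpy as np

def rotation_invariant_descriptor(polar_scan):
    """polar_scan: (n_range, n_azimuth) radar image in polar coordinates."""
    spectrum = np.abs(np.fft.fft(polar_scan, axis=1))  # shift-invariant magnitude
    desc = spectrum[:, : spectrum.shape[1] // 2].mean(axis=0)  # pool over range
    return desc / (np.linalg.norm(desc) + 1e-8)        # L2-normalize
```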
arXiv Detail & Related papers (2021-10-06T21:57:23Z)
- Multi-View Radar Semantic Segmentation [3.2093811507874768]
Automotive radars are low-cost active sensors that measure properties of surrounding objects.
They are seldom used for scene understanding due to the size and complexity of radar raw data.
We propose several novel architectures, and their associated losses, which analyse multiple "views" of the range-angle-Doppler radar tensor to segment it semantically.
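The multiple "views" are 2D slices of the range-angle-Doppler (RAD) tensor obtained by marginalizing one axis out, for example:

```python
# The three standard 2D views of a range-angle-Doppler tensor.
import numpy as np

def rad_views(rad):
    """rad: (n_range, n_angle, n_doppler) power tensor."""
    return {
        "range_angle":   rad.max(axis=2),   # (R, A): where things are
        "range_doppler": rad.max(axis=1),   # (R, D): how fast they move
        "angle_doppler": rad.max(axis=0),   # (A, D): motion per direction
    }
```

Max-aggregation over the collapsed axis is one common choice; sum or mean pooling are alternatives.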
arXiv Detail & Related papers (2021-03-30T09:56:41Z)
- RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
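An outline of the two fusion stages named above, under the assumption that early fusion concatenates voxelized BEV features and late fusion lets radar targets refine per-object velocity via attention; all dimensions and the attention form are simplified placeholders.

```python
# Early (concat) and late (attention) fusion stages, schematically.
import torch
import torch.nn as nn

class RadarNetSketch(nn.Module):
    def __init__(self, c_lidar=32, c_radar=8, c=64):
        super().__init__()
        self.backbone = nn.Conv2d(c_lidar + c_radar, c, 3, padding=1)  # early fusion
        self.query = nn.Linear(c, c)
        self.key = nn.Linear(4, c)      # radar target: (x, y, radial_vel, rcs)
        self.val = nn.Linear(4, 2)      # contributes a 2D velocity vote

    def forward(self, lidar_bev, radar_bev, obj_feat, radar_targets):
        fused = self.backbone(torch.cat([lidar_bev, radar_bev], dim=1))
        attn = torch.softmax(self.query(obj_feat) @ self.key(radar_targets).t(), -1)
        velocity = attn @ self.val(radar_targets)   # attention-weighted late fusion
        return fused, velocity
```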
arXiv Detail & Related papers (2020-07-28T17:15:02Z)