RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence
- URL: http://arxiv.org/abs/2504.09862v1
- Date: Mon, 14 Apr 2025 04:18:25 GMT
- Title: RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence
- Authors: Zengyuan Lai, Jiarui Yang, Songpengcheng Xia, Lizhou Lin, Lan Sun, Renwen Wang, Jianran Liu, Qi Wu, Ling Pei,
- Abstract summary: We present Radar-LLM, the first framework that leverages large language models (LLMs) for human motion understanding using millimeter-wave radar as the sensing modality. To address data scarcity, we introduce a physics-aware synthesis pipeline that generates realistic radar-text pairs from motion-text datasets. Radar-LLM achieves state-of-the-art performance across both synthetic and real-world benchmarks, enabling accurate translation of millimeter-wave signals to natural language descriptions.
- Score: 10.115852646162843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Millimeter-wave radar provides a privacy-preserving solution for human motion analysis, yet its sparse point clouds pose significant challenges for semantic understanding. We present Radar-LLM, the first framework that leverages large language models (LLMs) for human motion understanding using millimeter-wave radar as the sensing modality. Our approach introduces two key innovations: (1) a motion-guided radar tokenizer based on our Aggregate VQ-VAE architecture that incorporates deformable body templates and masked trajectory modeling to encode spatiotemporal point clouds into compact semantic tokens, and (2) a radar-aware language model that establishes cross-modal alignment between radar and text in a shared embedding space. To address data scarcity, we introduce a physics-aware synthesis pipeline that generates realistic radar-text pairs from motion-text datasets. Extensive experiments demonstrate that Radar-LLM achieves state-of-the-art performance across both synthetic and real-world benchmarks, enabling accurate translation of millimeter-wave signals to natural language descriptions. This breakthrough facilitates comprehensive motion understanding in privacy-sensitive applications like healthcare and smart homes. We will release the full implementation to support further research on https://inowlzy.github.io/RadarLLM/.
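As a reading aid, the sketch below illustrates how the two components described in the abstract (a VQ-VAE-style tokenizer for radar point-cloud sequences and a projection of the resulting tokens into an LLM's embedding space) could fit together. It is a minimal, hypothetical PyTorch sketch: all class names, tensor shapes, and hyperparameters are assumptions for illustration and do not reflect the authors' released implementation.

```python
# Hypothetical sketch of the Radar-LLM pipeline described in the abstract.
# Assumptions (not from the paper): module names, feature sizes, the GRU-based
# temporal aggregator, and the 4-D (x, y, z, Doppler) point representation.
import torch
import torch.nn as nn


class RadarTokenizer(nn.Module):
    """Encode a radar point-cloud sequence into discrete codebook tokens (VQ-VAE style)."""

    def __init__(self, point_dim=4, latent_dim=256, codebook_size=512):
        super().__init__()
        # Per-point encoder; features are max-pooled within each frame.
        self.point_encoder = nn.Sequential(
            nn.Linear(point_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        # Simple temporal aggregation over frames (a stand-in for the paper's
        # motion-guided "Aggregate VQ-VAE" with masked trajectory modeling).
        self.temporal = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def forward(self, points):
        # points: (batch, frames, points_per_frame, point_dim)
        frame_feat = self.point_encoder(points).max(dim=2).values   # (B, T, D)
        seq_feat, _ = self.temporal(frame_feat)                     # (B, T, D)
        # Vector quantization: index of the nearest codebook entry per frame.
        codes = self.codebook.weight.unsqueeze(0).expand(seq_feat.size(0), -1, -1)
        return torch.cdist(seq_feat, codes).argmin(dim=-1)          # (B, T) token ids


class RadarToLLMAdapter(nn.Module):
    """Map discrete radar tokens into a language model's embedding space."""

    def __init__(self, codebook_size=512, llm_dim=4096):
        super().__init__()
        self.radar_embed = nn.Embedding(codebook_size, llm_dim)

    def forward(self, radar_token_ids):
        # The returned embeddings would be prepended to the text-prompt embeddings
        # of a (frozen or lightly tuned) LLM to achieve cross-modal alignment.
        return self.radar_embed(radar_token_ids)


if __name__ == "__main__":
    pts = torch.randn(2, 16, 64, 4)        # 2 clips, 16 frames, 64 points, xyz + Doppler
    tokens = RadarTokenizer()(pts)         # (2, 16) discrete radar tokens
    prefix = RadarToLLMAdapter()(tokens)   # (2, 16, 4096) LLM-space prefix embeddings
    print(tokens.shape, prefix.shape)
```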
Related papers
- TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion [54.46664104437454]
We propose TacoDepth, an efficient and accurate Radar-Camera depth estimation model with one-stage fusion.
Specifically, we design a graph-based Radar structure extractor and a pyramid-based Radar fusion module.
Compared with the previous state-of-the-art approach, TacoDepth improves depth accuracy and processing speed by 12.8% and 91.8%, respectively.
arXiv Detail & Related papers (2025-04-16T05:25:04Z) - Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving [25.28104119280405]
We propose TPCNet, the first outdoor 3D visual grounding model built on the paradigm of prompt-guided point cloud sensor combination.
To balance the features of the two sensors required by the prompt, we design a multi-fusion paradigm called Two-Stage Heterogeneous Modal Adaptive Fusion.
Our experiments demonstrate that TPCNet achieves state-of-the-art performance on both the Talk2Radar and Talk2Car datasets.
arXiv Detail & Related papers (2025-03-11T11:48:27Z) - Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension [21.598751853520834]
4D millimeter-wave radars provide denser point clouds than conventional radars and perceive both semantic and physical characteristics of objects.
To foster the development of natural language-driven context understanding in radar scenes for 3D visual grounding, we construct the first dataset, Talk2Radar.
We propose a novel model, T-RadarNet, for 3D Referring Expression Comprehension on point clouds, achieving state-of-the-art (SOTA) performance on the Talk2Radar dataset.
arXiv Detail & Related papers (2024-05-21T14:26:36Z) - Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z) - Diffusion Models for Interferometric Satellite Aperture Radar [73.01013149014865]
Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models.
Here, we leverage PDMs to generate several radar-based satellite image datasets.
We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue.
arXiv Detail & Related papers (2023-08-31T16:26:17Z) - Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline.
Specifically, we first generate Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors (a minimal illustrative sketch of this query-based fusion idea appears after this list).
arXiv Detail & Related papers (2023-07-31T09:53:50Z) - Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects.
We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
arXiv Detail & Related papers (2023-06-02T10:57:41Z) - Deep Radar Inverse Sensor Models for Dynamic Occupancy Grid Maps [0.0]
We propose a deep learning-based Inverse Sensor Model (ISM) to learn the mapping from sparse radar detections to polar measurement grids.
Our approach is the first one to learn a single-frame measurement grid in the polar scheme from radars with a limited Field Of View.
This enables us to flexibly use one or more radar sensors without network retraining and without requiring 360° sensor coverage.
arXiv Detail & Related papers (2023-05-21T09:09:23Z) - RadarFormer: Lightweight and Accurate Real-Time Radar Object Detection Model [13.214257841152033]
Radar-centric datasets have received comparatively little attention in the development of deep learning techniques for radar perception.
We propose a transformer-based model, named RadarFormer, that utilizes state-of-the-art developments in vision deep learning.
Our model also introduces a channel-chirp-time merging module that reduces model size and complexity by more than 10 times without compromising accuracy.
arXiv Detail & Related papers (2023-04-17T17:07:35Z) - RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors for perception.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z)
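Several of the related papers above describe query-based fusion of radar features (see the EchoFusion summary). The snippet below is a minimal, hypothetical illustration of that general idea, with learnable BEV queries attending to radar spectrum features via cross-attention; the class name, shapes, and single-layer design are assumptions for illustration, not any paper's actual architecture.

```python
# Hypothetical illustration of query-based radar fusion (in the spirit of the
# EchoFusion summary above). All names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class BEVQueryRadarFusion(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, num_queries=200):
        super().__init__()
        # Learnable BEV queries that gather evidence from radar spectrum features.
        self.bev_queries = nn.Parameter(torch.randn(num_queries, embed_dim))
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, radar_spectrum_feat):
        # radar_spectrum_feat: (batch, num_radar_cells, embed_dim),
        # e.g. flattened range-azimuth spectrum features.
        batch = radar_spectrum_feat.size(0)
        queries = self.bev_queries.unsqueeze(0).expand(batch, -1, -1)
        fused, _ = self.cross_attn(queries, radar_spectrum_feat, radar_spectrum_feat)
        # The fused BEV features could then be combined with camera or LiDAR branches.
        return fused  # (batch, num_queries, embed_dim)


if __name__ == "__main__":
    feats = torch.randn(2, 1024, 256)
    print(BEVQueryRadarFusion()(feats).shape)  # torch.Size([2, 200, 256])
```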
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.