PhysFusion: A Transformer-based Dual-Stream Radar and Vision Fusion Framework for Open Water Surface Object Detection
- URL: http://arxiv.org/abs/2603.01947v1
- Date: Mon, 02 Mar 2026 15:00:22 GMT
- Title: PhysFusion: A Transformer-based Dual-Stream Radar and Vision Fusion Framework for Open Water Surface Object Detection
- Authors: Yuting Wan, Liguo Sun, Jiuwu Hao, Zao Zhang, Pin LV
- Abstract summary: PhysFusion is a physics-informed radar-image detection framework for water-surface perception. It integrates a Physics-Informed Radar Encoder (PIR Encoder) with an RCS Mapper and Quality Gate, transforming per-point radar attributes into compact scattering priors. In experiments on WaterScenes and FLOW, PhysFusion achieves 59.7% mAP50:95 and 90.3% mAP50 on WaterScenes.
- Score: 3.3795150675738808
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Detecting water-surface targets for Unmanned Surface Vehicles (USVs) is challenging due to wave clutter, specular reflections, and weak appearance cues in long-range observations. Although 4D millimeter-wave radar complements cameras under degraded illumination, maritime radar point clouds are sparse and intermittent, with reflectivity attributes exhibiting heavy-tailed variations under scattering and multipath, making conventional fusion designs struggle to exploit radar cues effectively. We propose PhysFusion, a physics-informed radar-image detection framework for water-surface perception. The framework integrates: (1) a Physics-Informed Radar Encoder (PIR Encoder) with an RCS Mapper and Quality Gate, transforming per-point radar attributes into compact scattering priors and predicting point-wise reliability for robust feature learning under clutter; (2) a Radar-guided Interactive Fusion Module (RIFM) performing query-level radar-image fusion between semantically enriched radar features and multi-scale visual features, with the radar branch modeled by a dual-stream backbone including a point-based local stream and a transformer-based global stream using Scattering-Aware Self-Attention (SASA); and (3) a Temporal Query Aggregation module (TQA) aggregating frame-wise fused queries over a short temporal window for temporally consistent representations. Experiments on WaterScenes and FLOW demonstrate that PhysFusion achieves 59.7% mAP50:95 and 90.3% mAP50 on WaterScenes (T=5 radar history) using 5.6M parameters and 12.5G FLOPs, and reaches 94.8% mAP50 and 46.2% mAP50:95 on FLOW under radar+camera setting. Ablation studies quantify the contributions of PIR Encoder, SASA-based global reasoning, and RIFM.
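The abstract describes mapping heavy-tailed reflectivity attributes into compact priors and predicting point-wise reliability so that clutter contributes less to the learned features. The sketch below illustrates that general idea only and is not the authors' implementation: `compress_rcs` and `gated_pool` are hypothetical names, the signed log compression is an assumed choice, and the gate logits are passed in by hand, whereas in a model of this kind they would come from a learned network.

```python
import math


def compress_rcs(rcs_values):
    """Signed log compression for heavy-tailed reflectivity values.

    Assumption: log1p is used purely for illustration; the paper's
    RCS Mapper may use a different (learned) transform.
    """
    return [math.copysign(math.log1p(abs(v)), v) for v in rcs_values]


def sigmoid(x):
    """Logistic function mapping a logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_pool(features, gate_logits):
    """Reliability-weighted mean of per-point feature vectors.

    features:    list of equal-length feature vectors, one per radar point
    gate_logits: one hypothetical reliability logit per point
    """
    weights = [sigmoid(g) for g in gate_logits]
    total = sum(weights)
    dim = len(features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, features)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Three radar points; the third has an extreme reflectivity value.
    rcs = [2.0, 3.0, 5000.0]
    feats = [[v] for v in compress_rcs(rcs)]
    # A low logit marks the third point as unreliable.
    print(gated_pool(feats, [2.0, 2.0, -4.0]))
```

With equal logits the pooling reduces to a plain mean; lowering one point's logit shrinks its influence without discarding it outright, which is the usual motivation for a soft gate over hard filtering.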
Related papers
- HyperDet: 3D Object Detection with Hyper 4D Radar Point Clouds [7.899148878601621]
We present HyperDet, a detector-agnostic radar-only 3D detection framework. It constructs a task-aware hyper 4D radar point cloud for standard LiDAR-oriented detectors. On MAN TruckScenes, HyperDet consistently improves over raw radar inputs with VoxelNeXt and CenterPoint.
arXiv Detail & Related papers (2026-02-12T04:21:58Z)
- Wavelet-based Multi-View Fusion of 4D Radar Tensor and Camera for Robust 3D Object Detection [44.78575994732947]
WRCFormer is a novel 3D object detection framework that fuses raw radar cubes with camera inputs via multi-view representations of the decoupled radar cube. WRCFormer achieves state-of-the-art performance on the K-Radar benchmarks, surpassing the best model by approximately 2.4% in all scenarios.
arXiv Detail & Related papers (2025-12-28T15:32:17Z)
- RadarGen: Automotive Radar Point Cloud Generation from Cameras [64.69976771710057]
We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird's-eye-view form. We show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data.
arXiv Detail & Related papers (2025-12-19T18:57:33Z)
- RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance. Radar suffers from noise and positional ambiguity. We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
- V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection [64.93675471780209]
We present V2X-R, the first simulated V2X dataset incorporating LiDAR, camera, and 4D radar. V2X-R contains 12,079 scenarios with 37,727 frames of LiDAR and 4D radar point clouds, 150,908 images, and 170,859 annotated 3D vehicle bounding boxes. We propose a novel cooperative LiDAR-4D radar fusion pipeline for 3D object detection and implement it with various fusion strategies.
arXiv Detail & Related papers (2024-11-13T07:41:47Z)
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields, a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- Diffusion-Based Point Cloud Super-Resolution for mmWave Radar Data [8.552647576661174]
The millimeter-wave radar sensor maintains stable performance under adverse environmental conditions.
Radar point clouds are relatively sparse and contain massive ghost points.
We propose a novel point cloud super-resolution approach for 3D mmWave radar data, named Radar-diffusion.
arXiv Detail & Related papers (2024-04-09T04:41:05Z)
- Radarize: Enhancing Radar SLAM with Generalizable Doppler-Based Odometry [9.420543997290126]
Radarize is a self-contained SLAM pipeline that uses only a commodity single-chip mmWave radar.
Our method outperforms state-of-the-art radar and radar-inertial approaches by approximately 5x in terms of odometry and 8x in terms of end-to-end SLAM.
arXiv Detail & Related papers (2023-11-19T07:47:11Z)
- MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion [6.639648061168067]
Multi-view radar-camera fused 3D object detection provides a farther detection range and more helpful features for autonomous driving.
Current radar-camera fusion methods offer various designs for fusing radar information with camera data.
We present MVFusion, a novel Multi-View radar-camera Fusion method to achieve semantic-aligned radar features.
arXiv Detail & Related papers (2023-02-21T08:25:50Z)
- LiRaNet: End-to-End Trajectory Prediction using Spatio-Temporal Radar Fusion [52.59664614744447]
We present LiRaNet, a novel end-to-end trajectory prediction method which utilizes radar sensor information along with widely used lidar and high definition (HD) maps.
Automotive radar provides rich, complementary information, allowing for longer range vehicle detection as well as instantaneous velocity measurements.
arXiv Detail & Related papers (2020-10-02T00:13:00Z)
- RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors for perception.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.