Related papers: Efficient UAV trajectory prediction: A multi-modal deep diffusion framework

Efficient UAV trajectory prediction: A multi-modal deep diffusion framework

URL: http://arxiv.org/abs/2602.00107v1
Date: Mon, 26 Jan 2026 13:14:52 GMT
Title: Efficient UAV trajectory prediction: A multi-modal deep diffusion framework
Authors: Yuan Gao, Xinyu Guo, Wenjing Xie, Zifan Wang, Hongwen Yu, Gongyang Li, Shugong Xu,
Abstract summary: A multi-modal UAV trajectory prediction method based on the fusion of LiDAR and millimeter-wave radar information is proposed.<n>The proposed model can effectively utilize multi-modal data and provides an efficient solution for unauthorized UAV trajectory prediction in the low-altitude economy.
Score: 26.678930486634602
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To meet the requirements for managing unauthorized UAVs in the low-altitude economy, a multi-modal UAV trajectory prediction method based on the fusion of LiDAR and millimeter-wave radar information is proposed. A deep fusion network for multi-modal UAV trajectory prediction, termed the Multi-Modal Deep Fusion Framework, is designed. The overall architecture consists of two modality-specific feature extraction networks and a bidirectional cross-attention fusion module, aiming to fully exploit the complementary information of LiDAR and radar point clouds in spatial geometric structure and dynamic reflection characteristics. In the feature extraction stage, the model employs independent but structurally identical feature encoders for LiDAR and radar. After feature extraction, the model enters the Bidirectional Cross-Attention Mechanism stage to achieve information complementarity and semantic alignment between the two modalities. To verify the effectiveness of the proposed model, the MMAUD dataset used in the CVPR 2024 UG2+ UAV Tracking and Pose-Estimation Challenge is adopted as the training and testing dataset. Experimental results show that the proposed multi-modal fusion model significantly improves trajectory prediction accuracy, achieving a 40% improvement compared to the baseline model. In addition, ablation experiments are conducted to demonstrate the effectiveness of different loss functions and post-processing strategies in improving model performance. The proposed model can effectively utilize multi-modal data and provides an efficient solution for unauthorized UAV trajectory prediction in the low-altitude economy.

Related papers

DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving [65.7087560656003]
Generative diffusion models for end-to-end autonomous driving often suffer from mode collapse.<n>We propose DiffusionDriveV2, which leverages reinforcement learning to constrain low-quality modes and explore for superior trajectories.<n>This significantly enhances the overall output quality while preserving the inherent multimodality of its core Gaussian Mixture Model.
arXiv Detail & Related papers (2025-12-08T17:29:52Z)
A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles [74.8162337823142]
MM-UAV is the first large-scale benchmark for Multi-Modal UAV Tracking.<n>The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences, and more than 2.8 million annotated frames.<n>Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework.
arXiv Detail & Related papers (2025-11-23T08:42:17Z)
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model [62.889356203346985]
We propose DUal-STream diffusion (DUST), a world-model augmented VLA framework that handles the modality conflict.<n>DUST achieves up to 6% gains over a standard VLA baseline and implicit world-modeling methods.<n>On real-world tasks with the Franka Research 3, DUST outperforms baselines in success rate by 13%.
arXiv Detail & Related papers (2025-10-31T16:32:12Z)
Collaborative-Distilled Diffusion Models (CDDM) for Accelerated and Lightweight Trajectory Prediction [14.108460337857645]
Trajectory prediction is a fundamental task in Autonomous Vehicles (AVs) and Intelligent Transportation Systems (ITS)<n> Diffusion models have recently demonstrated strong performance in probabilistic trajectory prediction.<n>This paper proposes Collaborative-Distilled Diffusion Models (CDDM), a novel method for real-time and lightweight trajectory prediction.
arXiv Detail & Related papers (2025-10-01T08:00:31Z)
CoPAD : Multi-source Trajectory Fusion and Cooperative Trajectory Prediction with Anchor-oriented Decoder in V2X Scenarios [13.568599065039459]
CoPAD is a lightweight framework for cooperative trajectory prediction.<n>It effectively performs early fusion on multi-source trajectory data from vehicles and road infrastructure.<n>Experiments show that CoPAD achieves the state-of-the-art performance on the DAIR-V2X-Seq dataset.
arXiv Detail & Related papers (2025-09-19T13:50:49Z)
DGE-YOLO: Dual-Branch Gathering and Attention for Accurate UAV Object Detection [0.46040036610482665]
We present DGE-YOLO, an enhanced YOLO-based detection framework designed to effectively fuse multi-modal information.<n> Specifically, we introduce a dual-branch architecture for modality-specific feature extraction, enabling the model to process both infrared and visible images.<n>To further enrich semantic representation, we propose an Efficient Multi-scale Attention (EMA) mechanism that enhances feature learning across spatial scales.
arXiv Detail & Related papers (2025-06-29T14:19:18Z)
Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework [57.994965436344195]
Beamforming is a key technology in millimeter-wave (mmWave) communications that improves signal transmission by optimizing directionality and intensity.<n> multimodal sensing-aided beam prediction has gained significant attention, using various sensing data to predict user locations or network conditions.<n>Despite its promising potential, the adoption of multimodal sensing-aided beam prediction is hindered by high computational complexity, high costs, and limited datasets.
arXiv Detail & Related papers (2025-04-07T15:38:25Z)
DA-STGCN: 4D Trajectory Prediction Based on Spatiotemporal Feature Extraction [6.781509470656284]
We propose DA-STGCN, an innovative graph convolutional network that integrates a dual attention mechanism.<n>Our model reconstructs adjacency matrix through a self-attention approach, enhancing the capture of node correlations.<n>Results demonstrate a notable improvement over current 4D trajectory prediction methods, achieving a 20% and 30% reduction in the Average Displacement Error (Attention) and Final Displacement Error (FDE)
arXiv Detail & Related papers (2025-03-05T03:42:49Z)
FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection [33.225938984092274]
We propose a Foreground Self-Distillation (FSD) scheme that effectively avoids the issue of distribution discrepancies. We also design two Point Cloud Intensification ( PCI) strategies to compensate for the sparsity of point clouds. We develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scale foreground features.
arXiv Detail & Related papers (2024-07-14T09:39:44Z)
VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables. The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
Distributed Conditional Generative Adversarial Networks (GANs) for Data-Driven Millimeter Wave Communications in UAV Networks [116.94802388688653]
A novel framework is proposed to perform data-driven air-to-ground (A2G) channel estimation for millimeter wave (mmWave) communications in an unmanned aerial vehicle (UAV) wireless network. An effective channel estimation approach is developed, allowing each UAV to train a stand-alone channel model via a conditional generative adversarial network (CGAN) along each beamforming direction. A cooperative framework, based on a distributed CGAN architecture, is developed, allowing each UAV to collaboratively learn the mmWave channel distribution.
arXiv Detail & Related papers (2021-02-02T20:56:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.