Related papers: Multi-Modal Sensing and Fusion in mmWave Beamforming for Connected Vehicles: A Transformer Based Framework

URL: http://arxiv.org/abs/2602.13606v1
Date: Sat, 14 Feb 2026 05:12:06 GMT
Title: Multi-Modal Sensing and Fusion in mmWave Beamforming for Connected Vehicles: A Transformer Based Framework
Authors: Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang
Abstract summary: We present a multi-modal sensing and fusion learning framework as a potential alternative solution to reduce beam training overheads. In this framework, we first extract the representative features from the sensing modalities with modality-specific encoders, then use multi-head cross-modal attention to learn dependencies and correlations between the different modalities. The proposed framework (i) achieves up to 96.72% accuracy in predicting the top-15 beams, (ii) incurs roughly 0.77 dB average power loss, and (iii) reduces the overall latency and beam search space overheads by 86.81% and 76.56%, respectively.
Score: 1.7834756213254652
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Millimeter wave (mmWave) communication, which uses beamforming to offset its inherent path loss, is considered one of the key technologies for supporting the ever-increasing high-throughput and low-latency demands of connected vehicles. However, adopting the standard-defined beamforming approach in highly dynamic vehicular environments often incurs high beam training overheads and reduces the airtime available for communication, mainly because of pilot signal exchanges and exhaustive beam measurements. To this end, we present a multi-modal sensing and fusion learning framework as a potential alternative solution to reduce such overheads. In this framework, we first extract representative features from the sensing modalities with modality-specific encoders, then use multi-head cross-modal attention to learn dependencies and correlations between the modalities, and finally fuse the multimodal features to predict the top-k beams so that the best line-of-sight links can be established proactively. To show the generalizability of the proposed framework, we perform comprehensive experiments in four different vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) scenarios using real-world multimodal and 60 GHz mmWave wireless sensing data. The experiments reveal that the proposed framework (i) achieves up to 96.72% accuracy in predicting the top-15 beams, (ii) incurs roughly 0.77 dB average power loss, and (iii) reduces the overall latency and beam search space overheads by 86.81% and 76.56%, respectively, for top-15 beams compared to the standard-defined approach.
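
The top-k accuracy and search space figures quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the toy scores, and the 64-beam codebook size are assumptions made for illustration. The abstract does not state the codebook size, but 64 beams is consistent with the reported figure, since 1 - 15/64 = 0.765625.

```python
from typing import List, Sequence


def top_k_beams(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring beams, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_accuracy(all_scores: Sequence[Sequence[float]],
                   best_beams: Sequence[int], k: int) -> float:
    """Fraction of samples whose ground-truth best beam is among the top-k predictions."""
    hits = sum(
        1 for scores, best in zip(all_scores, best_beams)
        if best in top_k_beams(scores, k)
    )
    return hits / len(best_beams)


def search_space_reduction(k: int, codebook_size: int) -> float:
    """Fraction of beam measurements saved by sweeping only k beams
    instead of the whole codebook (exhaustive search)."""
    return 1.0 - k / codebook_size


if __name__ == "__main__":
    # Toy model outputs for a hypothetical 3-beam codebook (illustrative only).
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    best = [1, 1, 0]
    print(top_k_accuracy(scores, best, k=2))
    # Assumed 64-beam codebook with a top-15 sweep.
    print(search_space_reduction(15, 64))
```

Under this reading, the receiver sweeps only the k predicted beams, so the saving depends solely on k and the codebook size, while top-k accuracy measures how often the true best beam survives that pruning.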

Related papers

A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles [74.8162337823142]
MM-UAV is the first large-scale benchmark for Multi-Modal UAV Tracking. The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences, and more than 2.8 million annotated frames. Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework.
arXiv Detail & Related papers (2025-11-23T08:42:17Z)
Multi-Modal Sensing Aided mmWave Beamforming for V2V Communications with Transformers [1.9483189922830135]
We present a multi-modal sensing and fusion learning framework as a potential alternative solution to reduce such overheads. In this framework, we first extract the features individually from the visual and GPS coordinates sensing modalities by modality specific encoders. We then fuse the multimodal features to obtain predicted top-k beams so that the best line-of-sight links can be proactively established.
arXiv Detail & Related papers (2025-09-14T06:03:42Z)
Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning [2.2879063461015425]
This paper presents a deep learning-based solution that uses multi-modality sensing data to predict optimal beams with sufficient mmWave received power. The results show that it can achieve up to 98.19% accuracy when predicting the top-13 beams.
arXiv Detail & Related papers (2025-04-08T16:18:00Z)
Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework [57.994965436344195]
Beamforming is a key technology in millimeter-wave (mmWave) communications that improves signal transmission by optimizing directionality and intensity. Multimodal sensing-aided beam prediction has gained significant attention, using various sensing data to predict user locations or network conditions. Despite its promising potential, the adoption of multimodal sensing-aided beam prediction is hindered by high computational complexity, high costs, and limited datasets.
arXiv Detail & Related papers (2025-04-07T15:38:25Z)
Position Aware 60 GHz mmWave Beamforming for V2V Communications Utilizing Deep Learning [2.4993733210446893]
This paper presents a deep learning-based solution on utilizing the vehicular position information for predicting the optimal beams having sufficient mmWave received powers. The results show that the solution can achieve up to 84.58% of received power of link status on average.
arXiv Detail & Related papers (2024-02-02T09:30:27Z)
Reliable Beamforming at Terahertz Bands: Are Causal Representations the Way Forward? [85.06664206117088]
Multi-user wireless systems can meet metaverse requirements by utilizing terahertz bandwidth with a massive number of antennas. Existing solutions lack proper modeling of channel dynamics, resulting in inaccurate beamforming solutions in high-mobility scenarios. Herein, a dynamic, semantically aware beamforming solution is proposed for the first time, utilizing novel artificial intelligence algorithms in variational causal inference.
arXiv Detail & Related papers (2023-03-14T16:02:46Z)
Fast Beam Alignment via Pure Exploration in Multi-armed Bandits [91.11360914335384]
We develop a bandit-based fast beam alignment (BA) algorithm to reduce BA latency for millimeter-wave (mmWave) communications. Our algorithm is named Two-Phase Heteroscedastic Track-and-Stop (2PHT&S).
arXiv Detail & Related papers (2022-10-23T05:57:39Z)
Inertial Hallucinations -- When Wearable Inertial Devices Start Seeing Things [82.15959827765325]
We propose a novel approach to multimodal sensor fusion for Ambient Assisted Living (AAL). We address two major shortcomings of standard multimodal approaches: limited area coverage and reduced reliability. Our new framework fuses the concept of modality hallucination with triplet learning to train a model with different modalities to handle missing sensors at inference time.
arXiv Detail & Related papers (2022-07-14T10:04:18Z)
Deep Learning on Multimodal Sensor Data at the Wireless Edge for Vehicular Network [8.458980329342799]
We propose expediting beam selection by leveraging multimodal data collected from sensors such as LiDAR, camera images, and GPS. We propose individual and distributed fusion-based deep learning (F-DL) architectures that can execute locally as well as at a mobile edge computing center. Results from extensive evaluations conducted on publicly available synthetic and home-grown real-world datasets reveal 95% and 96% improvement in beam selection speed over classical RF-only beam sweeping.
arXiv Detail & Related papers (2022-01-12T21:55:34Z)
A Novel Look at LIDAR-aided Data-driven mmWave Beam Selection [24.711393214172148]
We propose a lightweight neural network (NN) architecture along with the corresponding LIDAR preprocessing. Our NN-based beam selection scheme can achieve 79.9% throughput without any beam search overhead and 95% by searching among as few as 6 beams.
arXiv Detail & Related papers (2021-04-29T18:07:31Z)
Terahertz-Band Joint Ultra-Massive MIMO Radar-Communications: Model-Based and Model-Free Hybrid Beamforming [45.257328085051974]
Wireless communications and sensing at the terahertz (THz) band are investigated as promising short-range technologies. Ultra-massive multiple-input multiple-output (UM-MIMO) antenna systems have been proposed for THz communications to compensate for propagation losses. We develop THz hybrid beamformers based on both model-based and model-free techniques for a new group-of-subarrays (GoSA) UM-MIMO structure.
arXiv Detail & Related papers (2021-02-27T21:28:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.