BeamLLM: Vision-Empowered mmWave Beam Prediction with Large Language Models
- URL: http://arxiv.org/abs/2503.10432v1
- Date: Thu, 13 Mar 2025 14:55:59 GMT
- Title: BeamLLM: Vision-Empowered mmWave Beam Prediction with Large Language Models
- Authors: Can Zheng, Jiguang He, Guofa Cai, Zitong Yu, Chung G. Kang
- Abstract summary: BeamLLM is a vision-aided millimeter-wave (mmWave) beam prediction framework leveraging large language models (LLMs). Evaluated on a realistic vehicle-to-infrastructure (V2I) scenario, the proposed method achieves 61.01% top-1 accuracy and 97.39% top-3 accuracy in standard prediction tasks. In few-shot prediction scenarios, the performance degradation is limited to 12.56% (top-1) and 5.55% (top-3) from time sample 1 to 10, demonstrating superior prediction capability.
- Score: 22.11810939970069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose BeamLLM, a vision-aided millimeter-wave (mmWave) beam prediction framework leveraging large language models (LLMs) to address the challenges of high training overhead and latency in mmWave communication systems. By combining computer vision (CV) with LLMs' cross-modal reasoning capabilities, the framework extracts user equipment (UE) positional features from RGB images and aligns visual-temporal features with LLMs' semantic space through reprogramming techniques. Evaluated on a realistic vehicle-to-infrastructure (V2I) scenario, the proposed method achieves 61.01% top-1 accuracy and 97.39% top-3 accuracy in standard prediction tasks, significantly outperforming traditional deep learning models. In few-shot prediction scenarios, the performance degradation is limited to 12.56% (top-1) and 5.55% (top-3) from time sample 1 to 10, demonstrating superior prediction capability.
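The top-1 and top-3 accuracy figures quoted in the abstract are the usual top-k metric: a prediction counts as correct when the true beam index is among the k highest-scoring candidates. The snippet below is a minimal sketch of that metric, not the authors' evaluation code; the function name `top_k_accuracy`, the four-beam codebook, and the toy scores are all illustrative assumptions.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose true beam index is among the k top-scoring beams.

    Hypothetical helper for illustration; not taken from the BeamLLM paper.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank beam indices from highest to lowest predicted score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (made-up numbers): 4 samples, 4 candidate beams each.
scores = [
    [0.10, 0.70, 0.20, 0.00],  # ranking: 1, 2, 0, 3
    [0.40, 0.30, 0.20, 0.10],  # ranking: 0, 1, 2, 3
    [0.04, 0.06, 0.10, 0.80],  # ranking: 3, 2, 1, 0
    [0.30, 0.10, 0.50, 0.15],  # ranking: 2, 0, 3, 1
]
labels = [1, 2, 0, 0]  # assumed ground-truth optimal beam indices

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

On this toy data only the first sample is a top-1 hit, while three of the four samples are top-3 hits. That mirrors, in miniature, the gap between the top-1 and top-3 figures reported above.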
Related papers
- Efficient Model Selection for Time Series Forecasting via LLMs [52.31535714387368]
We propose to leverage Large Language Models (LLMs) as a lightweight alternative for model selection.
Our method eliminates the need for explicit performance matrices by utilizing the inherent knowledge and reasoning capabilities of LLMs.
arXiv Detail & Related papers (2025-04-02T20:33:27Z)
- Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs [11.1036247482657]
We present SMETimes, the first systematic investigation of sub-3B parameter SLMs for efficient and accurate time series forecasting. Our approach centers on three key innovations: a statistically-enhanced prompting mechanism that bridges numerical time series with textual semantics through statistical features; an adaptive fusion embedding architecture that aligns temporal patterns with language model token spaces through learnable parameters.
arXiv Detail & Related papers (2025-03-05T15:27:36Z)
- Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop [63.34626300024294]
TimeXL is a multi-modal prediction framework that integrates a prototype-based time series encoder.
It produces more accurate predictions and interpretable explanations.
Empirical evaluations on four real-world datasets demonstrate that TimeXL achieves up to 8.9% improvement in AUC.
arXiv Detail & Related papers (2025-03-02T20:40:53Z)
- VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models [63.27511432647797]
We propose VLsI: Verbalized Layers-to-Interactions, a new VLM family in 2B and 7B model sizes.
We validate VLsI across ten challenging vision-language benchmarks, achieving notable performance gains (11.0% for 2B and 17.4% for 7B) over GPT-4V.
arXiv Detail & Related papers (2024-12-02T18:58:25Z)
- Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more-efficient metric for performance estimation.
We extend the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources.
We employ a two-layer neural network to model the non-linear relationship between multiple domain-specific loss and downstream performance.
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
- Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs).
MIR is indicative about training data selection, training strategy schedule, and model architecture design to get better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z)
- Beam Prediction based on Large Language Models [51.45077318268427]
We formulate the millimeter wave (mmWave) beam prediction problem as a time series forecasting task. We transform historical observations into text-based representations using a trainable tokenizer. Our method harnesses the power of LLMs to predict future optimal beams.
arXiv Detail & Related papers (2024-08-16T12:40:01Z)
- An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training [51.622652121580394]
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question if the extremely simple lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical design (5.7M/6.5M) can achieve 79.4%/78.9% top-1 accuracy on ImageNet-1
arXiv Detail & Related papers (2024-04-18T14:14:44Z)
- Camera Based mmWave Beam Prediction: Towards Multi-Candidate Real-World Scenarios [15.287380309115399]
This paper extensively investigates the sensing-aided beam prediction problem in a real-world vehicle-to-infrastructure (V2I) scenario.
In particular, this paper proposes to utilize visual and positional data to predict the optimal beam indices.
The proposed solutions are evaluated on the large-scale real-world DeepSense 6G dataset.
arXiv Detail & Related papers (2023-08-14T00:15:01Z)
- Interpretable AI-based Large-scale 3D Pathloss Prediction Model for enabling Emerging Self-Driving Networks [3.710841042000923]
We propose a Machine Learning-based model that leverages novel key predictors for estimating pathloss.
By quantitatively evaluating various ML algorithms in terms of predictive, generalization, and computational performance, our results show that the Light Gradient Boosting Machine (LightGBM) algorithm overall outperforms the others.
arXiv Detail & Related papers (2022-01-30T19:50:16Z)
- Neural forecasting at scale [8.245069318446415]
We study the problem of efficiently scaling ensemble-based deep neural networks for time series (TS) forecasting on a large set of time series.
Our model addresses the practical limitations of related models, reducing the training time by half and memory requirement by a factor of 5.
arXiv Detail & Related papers (2021-09-20T17:22:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.