Vision-Language-Model-Guided Differentiable Ray Tracing for Fast and Accurate Multi-Material RF Parameter Estimation
- URL: http://arxiv.org/abs/2601.18242v1
- Date: Mon, 26 Jan 2026 07:54:53 GMT
- Title: Vision-Language-Model-Guided Differentiable Ray Tracing for Fast and Accurate Multi-Material RF Parameter Estimation
- Authors: Zerui Kang, Yishen Lim, Zhouyou Gu, Seung-Woo Ko, Tony Q. S. Quek, Jihong Park
- Abstract summary: This paper proposes a vision-language-model (VLM) guided framework that accelerates and stabilizes multi-material parameter estimation. Experiments in NVIDIA Sionna on indoor scenes show 2-4$\times$ faster convergence and 10-100$\times$ lower final parameter error.
- Score: 45.40179208702883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate radio-frequency (RF) material parameters are essential for electromagnetic digital twins in 6G systems, yet gradient-based inverse ray tracing (RT) remains sensitive to initialization and costly under limited measurements. This paper proposes a vision-language-model (VLM) guided framework that accelerates and stabilizes multi-material parameter estimation in a differentiable RT (DRT) engine. A VLM parses scene images to infer material categories and maps them to quantitative priors via an ITU-R material table, yielding informed conductivity initializations. The VLM further selects informative transmitter/receiver placements that promote diverse, material-discriminative paths. Starting from these priors, the DRT performs gradient-based refinement using measured received signal strengths. Experiments in NVIDIA Sionna on indoor scenes show 2-4$\times$ faster convergence and 10-100$\times$ lower final parameter error compared with uniform or random initialization and random placement baselines, achieving sub-0.1\% mean relative error with only a few receivers. Complexity analyses indicate per-iteration time scales near-linearly with the number of materials and measurement setups, while VLM-guided placement reduces the measurements required for accurate recovery. Ablations over RT depth and ray counts confirm further accuracy gains without significant per-iteration overhead. Results demonstrate that semantic priors from VLMs effectively guide physics-based optimization for fast and reliable RF material estimation.
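As a rough illustration (not the paper's actual pipeline), the refinement stage can be sketched as gradient descent on a toy differentiable surrogate of the ray tracer, where an informed initialization stands in for the VLM-derived priors. The linear RSS model, the coupling matrix, and all numbers below are hypothetical:

```python
import numpy as np

# Toy, self-contained sketch (not the paper's code): received signal
# strength is modeled as a linear function of per-material
# log-conductivities, standing in for a differentiable ray tracer.
rng = np.random.default_rng(0)
n_materials, n_links = 3, 8
A = rng.uniform(0.5, 2.0, size=(n_links, n_materials))  # hypothetical path/material coupling
sigma_true = np.array([0.01, 0.1, 1.0])                 # ground-truth conductivities (S/m)
rss = A @ np.log(sigma_true)                            # "measured" RSS under the toy model

def refine(sigma0, iters=5000):
    """Gradient descent on the squared RSS error w.r.t. log-conductivities."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2   # safe step size for least squares
    x = np.log(sigma0)
    for _ in range(iters):
        x -= lr * A.T @ (A @ x - rss)      # gradient of 0.5 * ||A x - rss||^2
    return np.exp(x)

# Informed (VLM-style) prior vs. an uninformed uniform initialization.
informed = refine(np.array([0.02, 0.08, 0.8]))
uniform = refine(np.full(n_materials, 0.1))
print(np.abs(informed - sigma_true) / sigma_true)  # per-material relative error
```

Because the toy objective is convex, both runs eventually recover the true conductivities; the practical benefit of an informed prior is reaching a given tolerance in fewer iterations, which is what the paper measures against its Sionna-based DRT engine.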
Related papers
- Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework [57.994965436344195]
Beamforming is a key technology in millimeter-wave (mmWave) communications that improves signal transmission by optimizing directionality and intensity. Multimodal sensing-aided beam prediction has gained significant attention, using various sensing data to predict user locations or network conditions. Despite its promising potential, the adoption of multimodal sensing-aided beam prediction is hindered by high computational complexity, high costs, and limited datasets.
arXiv Detail & Related papers (2025-04-07T15:38:25Z)
- How Critical is Site-Specific RAN Optimization? 5G Open-RAN Uplink Air Interface Performance Test and Optimization from Macro-Cell CIR Data [0.6753334733130354]
We consider the importance of channel measurement data from specific sites and its impact on air interface optimization and testing.
We leverage our OmniPHY-5G neural receiver for NR PUSCH uplink simulation, with a training procedure that uses statistical TDL channel models for pre-training.
The proposed fine-tuning method achieves a 10% block error rate (BLER) at a 1.85 dB lower signal-to-noise ratio (SNR) compared to pre-training.
arXiv Detail & Related papers (2024-10-25T13:57:48Z)
- Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations [11.874972134063638]
This paper proposes a novel SSL event-to-video reconstruction approach, dubbed EvINR, which eliminates the need for labeled data or optical flow estimation.
We use an implicit neural representation (INR), which takes in coordinate $(x, y, t)$ and predicts intensity values, to represent the event generation equation.
To make EvINR feasible for online use, we propose several acceleration techniques that substantially expedite the training process.
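A minimal sketch of the INR idea described above, with a hypothetical sine-feature MLP and random (untrained) weights; in EvINR the weights would instead be fit so that log-intensity changes reproduce the recorded event stream:

```python
import numpy as np

# Hypothetical, untrained stand-in for an implicit neural
# representation I(x, y, t): a tiny MLP with sinusoidal features.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)

def intensity(coords):
    """coords: (N, 3) array of (x, y, t); returns (N,) positive intensities."""
    h = np.sin(coords @ W1 + b1)   # sinusoidal features are common in INRs
    z = (h @ W2 + b2).ravel()
    return np.log1p(np.exp(z))     # softplus keeps intensity strictly positive

# Event generation model: an event fires at a pixel when log-intensity
# changes by at least a contrast threshold C between two timestamps.
C = 0.2
xy = np.array([[0.5, 0.5]])
I0 = intensity(np.hstack([xy, [[0.0]]]))
I1 = intensity(np.hstack([xy, [[0.1]]]))
fired = np.abs(np.log(I1) - np.log(I0)) >= C
print(fired)
```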
arXiv Detail & Related papers (2024-07-26T04:18:10Z)
- Resolution Limit of Single-Photon LiDAR [9.288380569562678]
Given a fixed amount of flux produced by the laser transmitter across the scene, the per-pixel Signal-to-Noise Ratio (SNR) will decrease when more pixels are packed in a unit space.
This presents a fundamental trade-off between the spatial resolution of the sensor array and the SNR received at each pixel.
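The flux-splitting trade-off can be made concrete with a back-of-the-envelope calculation (the photon budget and pixel counts below are made up for illustration):

```python
# With a fixed total photon budget over the scene, packing more pixels
# into the same area divides the signal among them; shot-noise-limited
# SNR then scales as the square root of the per-pixel photon count.
total_photons = 1_000_000
for pixels in (256, 1024, 4096):
    per_pixel = total_photons / pixels
    snr = per_pixel ** 0.5
    print(f"{pixels:5d} pixels -> {per_pixel:8.0f} photons/pixel, SNR ~ {snr:6.1f}")
```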
arXiv Detail & Related papers (2024-03-25T05:21:26Z)
- Learning Radio Environments by Differentiable Ray Tracing [56.40113938833999]
We introduce a novel gradient-based calibration method, complemented by differentiable parametrizations of material properties, scattering and antenna patterns.
We have validated our method using both synthetic data and real-world indoor channel measurements, employing a distributed multiple-input multiple-output (MIMO) channel sounder.
arXiv Detail & Related papers (2023-11-30T13:50:21Z)
- Analyzing the Internals of Neural Radiance Fields [4.681790910494339]
We analyze large, trained ReLU-MLPs used in coarse-to-fine sampling.
We show how these large minima activations can be accelerated by transforming intermediate activations to a weight estimate.
arXiv Detail & Related papers (2023-06-01T14:06:48Z)
- Sionna RT: Differentiable Ray Tracing for Radio Propagation Modeling [65.17711407805756]
Sionna is a GPU-accelerated open-source library for link-level simulations based on TensorFlow.
Since release v0.14 it integrates a differentiable ray tracer (RT) for the simulation of radio wave propagation.
arXiv Detail & Related papers (2023-03-20T13:40:11Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- Automatic Velocity Picking Using a Multi-Information Fusion Deep Semantic Segmentation Network [0.0]
Velocity picking, a critical step in seismic data processing, has been studied for decades.
Deep learning (DL) methods have produced good results on seismic data with medium and high signal-to-noise ratios (SNR).
We propose a multi-information fusion network (MIFN) to estimate stacking velocity from the fused information of velocity spectra and stack gather segments (SGS).
arXiv Detail & Related papers (2022-05-07T12:55:13Z)
- Transfer Learning for Motor Imagery Based Brain-Computer Interfaces: A Complete Pipeline [54.73337667795997]
Transfer learning (TL) has been widely used in motor imagery (MI) based brain-computer interfaces (BCIs) to reduce the calibration effort for a new subject.
This paper proposes that TL could be considered in all three components (spatial filtering, feature engineering, and classification) of MI-based BCIs.
arXiv Detail & Related papers (2020-07-03T23:44:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.