Over-the-Air Semantic Alignment with Stacked Intelligent Metasurfaces
- URL: http://arxiv.org/abs/2512.05657v1
- Date: Fri, 05 Dec 2025 12:05:31 GMT
- Title: Over-the-Air Semantic Alignment with Stacked Intelligent Metasurfaces
- Authors: Mario Edoardo Pandolfo, Kyriakos Stylianopoulos, George C. Alexandropoulos, Paolo Di Lorenzo
- Abstract summary: We introduce the first over-the-air semantic alignment framework based on stacked intelligent metasurfaces (SIM). Experiments with heterogeneous vision transformer (ViT) encoders show that SIMs can accurately reproduce both supervised and zero-shot semantic equalizers, achieving up to 90% task accuracy in regimes with high signal-to-noise ratio (SNR).
- Score: 34.75476728721597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic communication systems aim to transmit task-relevant information between devices capable of artificial intelligence, but their performance can degrade when heterogeneous transmitter-receiver models produce misaligned latent representations. Existing semantic alignment methods typically rely on additional digital processing at the transmitter or receiver, increasing overall device complexity. In this work, we introduce the first over-the-air semantic alignment framework based on stacked intelligent metasurfaces (SIM), which enables latent-space alignment directly in the wave domain, substantially reducing the computational burden at the device level. We model SIMs as trainable linear operators capable of emulating both supervised linear aligners and zero-shot Parseval-frame-based equalizers. To realize these operators physically, we develop a gradient-based optimization procedure that tailors the metasurface transfer function to a desired semantic mapping. Experiments with heterogeneous vision transformer (ViT) encoders show that SIMs can accurately reproduce both supervised and zero-shot semantic equalizers, achieving up to 90% task accuracy in regimes with high signal-to-noise ratio (SNR), while maintaining strong robustness even at low SNR values.
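The idea of fitting a metasurface transfer function to a desired semantic mapping can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the SIM is modelled as a cascade of phase-only diagonal layers separated by fixed propagation matrices, those matrices are random stand-ins, the latents are synthetic, the sizes `N`, `L` and `n_samples` are hypothetical, and the optimizer is plain gradient descent with step halving. The target is a supervised linear aligner obtained by least squares. With only `L * N` phases the cascade is not expected to match the target exactly; the sketch only shows the fitting error going down.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, n_samples = 8, 4, 200  # hypothetical latent size, layer count, sample count

# Supervised linear aligner (target): least-squares map T with T @ X ≈ Y,
# where X and Y are synthetic stand-ins for transmitter/receiver latents.
X = rng.standard_normal((N, n_samples))
M_true = rng.standard_normal((N, N)) / np.sqrt(N)
Y = M_true @ X + 0.01 * rng.standard_normal((N, n_samples))
T = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T

# Assumed fixed inter-layer propagation matrices (random, not a physical model).
Ws = (rng.standard_normal((L, N, N))
      + 1j * rng.standard_normal((L, N, N))) / np.sqrt(2 * N)

def sim_response(thetas, Ws):
    """G = W_L D_L ... W_1 D_1 with D_l = diag(exp(j * theta_l))."""
    G = np.eye(N, dtype=complex)
    for theta, W in zip(thetas, Ws):
        G = W @ (np.exp(1j * theta)[:, None] * G)
    return G

def loss(thetas, Ws, T):
    """Squared Frobenius distance between the cascade and the target."""
    return np.linalg.norm(sim_response(thetas, Ws) - T) ** 2

def grad(thetas, Ws, T):
    """Analytic gradient of the loss with respect to every phase."""
    n_layers = len(Ws)
    # right[l]: product of all layers applied before D_l.
    right = [np.eye(N, dtype=complex)]
    for theta, W in zip(thetas[:-1], Ws[:-1]):
        right.append(W @ (np.exp(1j * theta)[:, None] * right[-1]))
    # left[l]: everything applied after D_l, starting with W_l.
    left = [None] * n_layers
    left[-1] = Ws[-1]
    for l in range(n_layers - 2, -1, -1):
        left[l] = (left[l + 1] * np.exp(1j * thetas[l + 1])[None, :]) @ Ws[l]
    E = sim_response(thetas, Ws) - T
    g = np.empty_like(thetas)
    for l in range(n_layers):
        d = np.diag(right[l] @ E.conj().T @ left[l])
        g[l] = -2.0 * np.imag(np.exp(1j * thetas[l]) * d)
    return g

# Gradient descent on the phases, halving the step whenever the loss would rise.
thetas = rng.uniform(0, 2 * np.pi, (L, N))
history = [loss(thetas, Ws, T)]
step = 0.1
for _ in range(300):
    g = grad(thetas, Ws, T)
    while step > 1e-12 and loss(thetas - step * g, Ws, T) > history[-1]:
        step *= 0.5
    thetas = thetas - step * g
    history.append(loss(thetas, Ws, T))
    step *= 2.0  # allow the step to grow again after a successful update

print(f"fitting error: {history[0]:.3f} -> {history[-1]:.3f}")
```

Running the script prints the fitting error before and after optimization. A more faithful experiment would replace the random `Ws` with a physical propagation model and the synthetic latents with real encoder outputs.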
Related papers
- DeepONet-accelerated Bayesian inversion for moving boundary problems [0.0]
This work demonstrates that neural operator learning provides a powerful and flexible framework for building fast, accurate emulators of moving boundary systems.
A Deep Operator Network (DeepONet) architecture is employed to construct an efficient surrogate model for moving boundary problems in single-phase Darcy flow through porous media.
The proposed inversion framework is demonstrated by estimating the permeability and porosity of fibre reinforcements for composite materials manufactured via the Resin Transfer Moulding (RTM) process.
arXiv Detail & Related papers (2025-12-23T11:22:26Z)
- GITO: Graph-Informed Transformer Operator for Learning Complex Partial Differential Equations [0.0]
We present a novel graph-informed transformer operator (GITO) architecture for learning complex partial differential equation systems.
GITO consists of two main modules: a hybrid graph transformer (HGT) and a transformer neural operator (TNO).
Empirical results on benchmark PDE tasks demonstrate that GITO outperforms existing transformer-based neural operators.
arXiv Detail & Related papers (2025-06-16T18:35:45Z)
- SatelliteFormula: Multi-Modal Symbolic Regression from Remote Sensing Imagery for Physics Discovery [8.965479246496878]
We propose a novel symbolic regression framework that derives physically interpretable expressions directly from remote sensing imagery.
SatelliteFormula combines a Vision Transformer-based encoder for spatial-spectral feature extraction with physics-guided constraints to ensure consistency and interpretability.
arXiv Detail & Related papers (2025-06-06T15:39:54Z)
- Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors [25.67875816218477]
Full-body pose estimation from sparse tracking signals is not limited by environmental conditions or recording range.
Previous works either face the challenge of wearing additional sensors on the pelvis and lower-body or rely on external visual sensors to obtain global positions of key joints.
To improve the practicality of the technology for virtual reality applications, we estimate full-body poses using only inertial data obtained from three Inertial Measurement Unit (IMU) sensors worn on the head and wrists.
arXiv Detail & Related papers (2025-05-08T15:28:09Z)
- Over-the-Air Edge Inference via End-to-End Metasurfaces-Integrated Artificial Neural Networks [29.28415364984592]
We propose a framework of Metasurfaces-Integrated Neural Networks (MINNs) for Edge Inference (EI).
MINNs can significantly simplify EI requirements, achieving near-optimal performance with 50 dB lower testing signal-to-noise ratio compared to training.
arXiv Detail & Related papers (2025-03-31T21:14:09Z)
- Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z)
- SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving [27.776472262857045]
This paper presents a Simple and effIcient Motion Prediction baseLine (SIMPL) for autonomous vehicles.
We propose a compact and efficient global feature fusion module that performs directed message passing in a symmetric manner.
As a strong baseline, SIMPL exhibits highly competitive performance on Argoverse 1 & 2 motion forecasting benchmarks.
arXiv Detail & Related papers (2024-02-04T15:07:49Z)
- Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
- Hierarchical Spherical CNNs with Lifting-based Adaptive Wavelets for Pooling and Unpooling [101.72318949104627]
We propose a novel framework of hierarchical spherical convolutional neural networks (HS-CNNs) with a lifting structure to learn adaptive spherical wavelets for pooling and unpooling.
LiftHS-CNN ensures a more efficient hierarchical feature learning for both image- and pixel-level tasks.
arXiv Detail & Related papers (2022-05-31T07:23:42Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervised manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- Model-based Deep Learning Receiver Design for Rate-Splitting Multiple Access [65.21117658030235]
This work proposes a novel design for a practical RSMA receiver based on model-based deep learning (MBDL) methods.
The MBDL receiver is evaluated in terms of uncoded Symbol Error Rate (SER), throughput performance through Link-Level Simulations (LLS) and average training overhead.
Results reveal that the MBDL receiver outperforms the SIC receiver with imperfect CSIR by a significant margin.
arXiv Detail & Related papers (2022-05-02T12:23:55Z)
- SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors.
SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions, and document-centric manifestation for system-wide orchestration.
We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy to significantly reduce operational complexity and lower the effort to deploy, upgrade, reconfigure and serve embedded models on edge devices.
arXiv Detail & Related papers (2021-09-08T22:06:16Z)