Demo: Real-Time Semantic Communications with a Vision Transformer
- URL: http://arxiv.org/abs/2205.03886v1
- Date: Sun, 8 May 2022 14:49:54 GMT
- Title: Demo: Real-Time Semantic Communications with a Vision Transformer
- Authors: Hanju Yoo, Taehun Jung, Linglong Dai, Songkuk Kim and Chan-Byoung Chae
- Abstract summary: We propose an end-to-end deep neural network-based architecture for image transmission and demonstrate its feasibility in a real-time wireless channel.
To the best of our knowledge, this is the first work that implements and investigates real-time semantic communications with a vision transformer.
- Score: 14.85519988496995
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Semantic communications are expected to enable the more effective delivery of
meaning rather than a precise transfer of symbols. In this paper, we propose an
end-to-end deep neural network-based architecture for image transmission and
demonstrate its feasibility in a real-time wireless channel by implementing a
prototype based on a field-programmable gate array (FPGA). We demonstrate that
this system outperforms the traditional 256-quadrature amplitude modulation
system in the low signal-to-noise ratio regime with the popular CIFAR-10
dataset. To the best of our knowledge, this is the first work that implements
and investigates real-time semantic communications with a vision transformer.
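As a rough illustration of the end-to-end idea (a toy stand-in, not the authors' vision-transformer/FPGA design), the Python sketch below trains an encoder-decoder pair through an AWGN channel on CIFAR-10-sized images; all layer sizes and the power normalization are assumptions.

```python
# Minimal sketch of an end-to-end semantic image link with an AWGN channel,
# assuming a toy encoder/decoder in place of the paper's vision transformer
# and FPGA prototype. All layer sizes are illustrative, not the authors' design.
import torch
import torch.nn as nn

class SemanticLink(nn.Module):
    def __init__(self, channel_dim=256):
        super().__init__()
        # Encoder: maps a 32x32 RGB image (CIFAR-10 sized) to channel symbols.
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, channel_dim), nn.Tanh())
        # Decoder: reconstructs the image from noisy channel symbols.
        self.decoder = nn.Sequential(
            nn.Linear(channel_dim, 3 * 32 * 32), nn.Sigmoid())

    def forward(self, x, snr_db):
        z = self.encoder(x)
        # Normalize to unit average power, then add AWGN at the given SNR.
        z = z / z.pow(2).mean().sqrt()
        noise_std = 10 ** (-snr_db / 20)
        z_noisy = z + noise_std * torch.randn_like(z)
        return self.decoder(z_noisy).view(-1, 3, 32, 32)

model = SemanticLink()
x = torch.rand(8, 3, 32, 32)              # stand-in for a CIFAR-10 batch
x_hat = model(x, snr_db=5.0)              # low-SNR regime from the abstract
loss = nn.functional.mse_loss(x_hat, x)   # train end to end on this loss
```

Training on this reconstruction loss and sweeping snr_db is how the low-SNR comparison against a 256-QAM baseline would be reproduced in spirit.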
Related papers
- Modeling and Performance Analysis for Semantic Communications Based on Empirical Results [53.805458017074294]
We propose an Alpha-Beta-Gamma (ABG) formula to model the relationship between end-to-end performance metrics and SNR.
For image reconstruction tasks, the proposed ABG formula closely fits commonly used deep learning networks such as SCUNet and the Vision Transformer.
To the best of our knowledge, this is the first theoretical expression between end-to-end performance metrics and SNR for semantic communications.
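As a minimal sketch, a three-parameter curve can be fit to measured metric-versus-SNR points with SciPy; the functional form below is an assumed stand-in, not the paper's ABG expression.

```python
# Hedged sketch of fitting a three-parameter performance-vs-SNR curve with
# SciPy. The true ABG functional form is defined in the paper; the expression
# below (alpha * snr**(-beta) + gamma) is only an assumed stand-in.
import numpy as np
from scipy.optimize import curve_fit

def abg(snr, alpha, beta, gamma):
    # Assumed shape: distortion decays polynomially in SNR down to a floor.
    return alpha * snr ** (-beta) + gamma

snr = np.array([1.0, 2.0, 4.0, 8.0, 16.0])      # linear SNR samples
mse = np.array([0.30, 0.18, 0.11, 0.07, 0.05])  # measured end-to-end metric
params, _ = curve_fit(abg, snr, mse, p0=(0.3, 1.0, 0.01))
print(dict(zip(("alpha", "beta", "gamma"), params)))
```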
arXiv Detail & Related papers (2025-04-29T06:07:50Z)
- Vision Transformer Based Semantic Communications for Next Generation Wireless Networks [3.8095664680229935]
This paper presents a Vision Transformer (ViT)-based semantic communication framework.
By adopting a ViT as the encoder-decoder framework, the proposed architecture can proficiently encode images into high-level semantic content.
The architecture based on the proposed ViT network achieves a Peak Signal-to-Noise Ratio (PSNR) of 38 dB.
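For reference, PSNR is defined as 10·log10(peak²/MSE); a minimal NumPy sketch:

```python
# Quick reference for the PSNR metric quoted above (38 dB), computed between
# an original and a reconstructed image; synthetic data for illustration.
import numpy as np

def psnr(original, reconstructed, peak=1.0):
    # PSNR = 10 * log10(peak^2 / MSE), in dB.
    mse = np.mean((original - reconstructed) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

img = np.random.rand(32, 32, 3)
noisy = np.clip(img + 0.01 * np.random.randn(*img.shape), 0.0, 1.0)
print(f"PSNR: {psnr(img, noisy):.1f} dB")
```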
arXiv Detail & Related papers (2025-03-21T16:23:02Z)
- Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model [55.71885688565501]
We propose a scalable generative video semantic communication framework that extracts and transmits semantic information to achieve high-quality video reconstruction.
Specifically, at the transmitter, description and other condition signals are extracted from the source video, functioning as text and structural semantics, respectively.
At the receiver, the diffusion-based GenAI large models are utilized to fuse the semantics of the multiple modalities for reconstructing the video.
arXiv Detail & Related papers (2025-02-19T15:59:07Z)
- Vision Transformer-based Semantic Communications With Importance-Aware Quantization [13.328970689723096]
This paper presents a vision transformer (ViT)-based semantic communication system with importance-aware quantization (IAQ) for wireless image transmission.
We show that our IAQ framework outperforms conventional image compression methods in both error-free and realistic communication scenarios.
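A minimal sketch of the importance-aware idea, assuming feature magnitude as the importance score and a simple two-level bit allocation (the paper's IAQ design may differ):

```python
# Hedged sketch of importance-aware quantization: features judged more
# important get more bits. The importance score and the bit-allocation rule
# here are illustrative, not the paper's IAQ design.
import numpy as np

def quantize(x, bits):
    # Uniform quantizer on [-1, 1] with 2**bits levels.
    levels = 2 ** bits
    step = 2.0 / levels
    return np.clip(np.round(x / step) * step, -1.0, 1.0)

features = np.tanh(np.random.randn(8))          # toy ViT patch features
importance = np.abs(features)                   # assumed importance score
order = np.argsort(-importance)
bits = np.empty(8, dtype=int)
bits[order[:4]] = 6                             # important: fine quantization
bits[order[4:]] = 2                             # unimportant: coarse
q = np.array([quantize(f, b) for f, b in zip(features, bits)])
print("distortion:", np.mean((features - q) ** 2))
```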
arXiv Detail & Related papers (2024-12-08T19:24:47Z)
- Large Generative Model-assisted Talking-face Semantic Communication System [55.42631520122753]
This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system.
A Generative Semantic Extractor (GSE) at the transmitter converts semantically sparse talking-face videos into texts with high information density.
A Private Knowledge Base (KB) based on a Large Language Model (LLM) performs semantic disambiguation and correction.
A Generative Semantic Reconstructor (GSR) utilizes the BERT-VITS2 and SadTalker models to transform the text back into a high-QoE talking-face video.
arXiv Detail & Related papers (2024-11-06T12:45:46Z)
- Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models [5.867765921443141]
A Texture-Color based Semantic Communication system of Images (TCSCI) is proposed.
At the transmitter, it decomposes the images into their natural language description (text) and texture and color semantic features.
It can achieve extremely compressed, highly noise-resistant, and visually similar image semantic communication, while ensuring the interpretability and editability of the transmission process.
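A minimal sketch of decomposing an image into coarse color and residual texture components, as a toy proxy for TCSCI's features; the text description is assumed to come from a separate captioning model:

```python
# Hedged sketch of splitting an image into separate color and texture
# components; a toy proxy for TCSCI's semantic features.
import numpy as np

img = np.random.rand(64, 64, 3)                 # toy RGB image in [0, 1]

# "Color" semantics: per-block mean color, a heavily compressed palette.
blocks = img.reshape(8, 8, 8, 8, 3)             # 8x8 grid of 8x8 blocks
color = blocks.mean(axis=(1, 3))                # shape (8, 8, 3)

# "Texture" semantics: grayscale detail left after removing block color.
gray = img.mean(axis=2)
color_up = np.repeat(np.repeat(color.mean(axis=2), 8, 0), 8, 1)
texture = gray - color_up                       # residual fine structure
print(color.shape, texture.shape)
```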
arXiv Detail & Related papers (2024-10-26T08:53:05Z)
- Semantic Communication for Cooperative Perception using HARQ [51.148203799109304]
We leverage an importance map to distill critical semantic information, introducing a cooperative perception semantic communication framework.
To counter the challenges posed by time-varying multipath fading, our approach incorporates orthogonal frequency-division multiplexing (OFDM) along with channel estimation and equalization strategies.
We introduce a novel semantic error detection method that is integrated with our semantic communication framework in the spirit of hybrid automatic repeat request (HARQ).
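A minimal sketch of a HARQ-style loop driven by a semantic error check; the channel model and the error test are assumed stand-ins:

```python
# Hedged sketch of a HARQ-style loop driven by a semantic error check:
# retransmit only while the decoded semantics fail validation. The channel,
# decoder, and error test below are stand-ins, not the paper's components.
import random

def transmit(features, snr_db):
    # Toy channel: packets fail more often at low SNR.
    fail_prob = max(0.0, 0.5 - 0.05 * snr_db)
    return None if random.random() < fail_prob else features

def semantic_error(decoded):
    # Assumed semantic check, e.g. consistency of the perception output.
    return decoded is None

def harq_send(features, snr_db, max_rounds=4):
    for attempt in range(1, max_rounds + 1):
        decoded = transmit(features, snr_db)
        if not semantic_error(decoded):
            return decoded, attempt
    return None, max_rounds                     # give up after max_rounds

decoded, rounds = harq_send(features=[0.2, 0.7], snr_db=3.0)
print(f"delivered={decoded is not None} after {rounds} round(s)")
```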
arXiv Detail & Related papers (2024-08-29T08:53:26Z)
- Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework [27.524671767937512]
We introduce a novel Generative AI Semantic Communication (GSC) system for single-user scenarios.
At the transmitter end, it employs a joint source-channel coding mechanism based on the Swin Transformer for efficient semantic feature extraction.
At the receiver end, an advanced Diffusion Model (DM) reconstructs high-quality images from degraded signals, enhancing perceptual details.
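A minimal sketch of the receiver-side idea, with an untrained toy denoiser standing in for the diffusion model; only the successive-refinement loop structure is shown:

```python
# Hedged sketch: iteratively refine a degraded reconstruction with a learned
# denoiser, in the spirit of a diffusion model. The toy network below is
# untrained and only illustrates the loop shape, not the paper's DM.
import torch
import torch.nn as nn

denoiser = nn.Sequential(                       # stands in for a trained DM
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1))

x = torch.rand(1, 3, 32, 32)                    # degraded decoder output
for step in range(10):                          # successive refinement
    residual = denoiser(x)                      # predict a correction
    x = (x + 0.1 * residual).clamp(0.0, 1.0)    # small refinement step
print(x.shape)
```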
arXiv Detail & Related papers (2024-07-31T06:08:51Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
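A minimal sketch of the "TC" stream's input transform, using PyWavelets to map a synthetic 1-D signal into a 2-D time-frequency tensor:

```python
# Hedged sketch of the "TC" stream's input: a Continuous Wavelet Transform
# turns a 1-D behavioral signal into a 2-D time-frequency tensor that a
# convolutional stream can consume. The signal here is synthetic.
import numpy as np
import pywt

t = np.linspace(0, 1, 256)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

scales = np.arange(1, 65)                       # 64 scales -> 64 rows
coeffs, freqs = pywt.cwt(signal, scales, "morl")
print(coeffs.shape)                             # (64, 256): 2-D tensor input
```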
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Latency-Aware Generative Semantic Communications with Pre-Trained Diffusion Models [43.27015039765803]
We develop a latency-aware semantic communications framework with pre-trained generative models.
We demonstrate ultra-low-rate, low-latency, and channel-adaptive semantic communications.
arXiv Detail & Related papers (2024-03-25T23:04:09Z)
- Digital versus Analog Transmissions for Federated Learning over Wireless Networks [91.20926827568053]
We compare two effective communication schemes for wireless federated learning (FL) over resource-constrained networks.
We first examine both digital and analog transmission methods, together with a unified and fair comparison scheme under practical constraints.
A universal convergence analysis under various imperfections is established for FL performance evaluation in wireless networks.
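A minimal sketch contrasting the two aggregation styles, assuming a toy uniform quantizer for the digital case and additive channel noise for the analog one:

```python
# Hedged sketch of the two aggregation styles compared in the paper:
# digital FL quantizes each client update before summing, while analog FL
# superposes updates over the air so noise enters the sum directly.
import numpy as np

rng = np.random.default_rng(0)
updates = rng.normal(size=(10, 128))            # 10 clients, 128 params

# Digital: quantize each update, then average (noise-free after decoding).
step = 0.05
digital = np.round(updates / step) * step
digital_avg = digital.mean(axis=0)

# Analog: updates superpose in the channel; receiver sees sum plus noise.
noise = rng.normal(scale=0.1, size=128)
analog_avg = (updates.sum(axis=0) + noise) / 10

truth = updates.mean(axis=0)
print("digital err:", np.mean((digital_avg - truth) ** 2))
print("analog err: ", np.mean((analog_avg - truth) ** 2))
```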
arXiv Detail & Related papers (2024-02-15T01:50:46Z)
- Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
A federated learning-based semantic communication (FLSC) framework is proposed for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
A channel state information-based multiple-input multiple-output (MIMO) transmission module is designed to combat channel fading and noise.
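A minimal sketch of the federated piece, assuming a tiny linear extractor in place of the HVT and one toy local SGD step per device:

```python
# Hedged sketch of the federated side of FLSC: each device trains its own
# semantic extractor locally and a server averages the weights. The tiny
# linear extractor stands in for the hierarchical vision transformer (HVT).
import torch
import torch.nn as nn

def make_extractor():
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))

devices = [make_extractor() for _ in range(3)]  # IoT devices' local models
for model in devices:                           # placeholder local training
    x = torch.rand(8, 3, 32, 32)
    model(x).sum().backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= 0.01 * p.grad                  # one toy SGD step

global_model = make_extractor()                 # federated averaging step
with torch.no_grad():
    for name, p in global_model.named_parameters():
        p.copy_(torch.stack(
            [dict(m.named_parameters())[name] for m in devices]).mean(0))
```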
arXiv Detail & Related papers (2023-08-07T16:32:14Z)
- Model-based Deep Learning Receiver Design for Rate-Splitting Multiple Access [65.21117658030235]
This work proposes a novel design for a practical RSMA receiver based on model-based deep learning (MBDL) methods.
The MBDL receiver is evaluated in terms of uncoded Symbol Error Rate (SER), throughput performance through Link-Level Simulations (LLS) and average training overhead.
Results reveal that the MBDL receiver outperforms the SIC receiver with imperfect CSIR by a significant margin.
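A minimal sketch of the SIC baseline's decoding order under an assumed two-stream BPSK model; the power split and noise level are illustrative:

```python
# Hedged sketch of the SIC baseline the MBDL receiver is compared against:
# in RSMA, the common stream is decoded first (private stream treated as
# noise), cancelled, and then the private stream is decoded. Toy BPSK model.
import numpy as np

rng = np.random.default_rng(1)
common = rng.choice([-1.0, 1.0], size=1000)
private = rng.choice([-1.0, 1.0], size=1000)
p_c, p_p = 0.8, 0.2                             # power split (illustrative)
y = np.sqrt(p_c) * common + np.sqrt(p_p) * private \
    + 0.1 * rng.normal(size=1000)               # received signal

common_hat = np.sign(y)                         # decode common first
y_residual = y - np.sqrt(p_c) * common_hat      # cancel it (SIC step)
private_hat = np.sign(y_residual)               # then decode private

print("common SER: ", np.mean(common_hat != common))
print("private SER:", np.mean(private_hat != private))
```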
arXiv Detail & Related papers (2022-05-02T12:23:55Z)
- Multi-task Learning Approach for Modulation and Wireless Signal Classification for 5G and Beyond: Edge Deployment via Model Compression [1.218340575383456]
Future communication networks must address spectrum scarcity to accommodate the growth of heterogeneous wireless devices.
We exploit the potential of a deep neural network-based multi-task learning framework to simultaneously learn modulation and signal classification tasks.
We provide a comprehensive heterogeneous wireless signals dataset for public use.
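A minimal sketch of the shared-backbone, two-head multi-task setup, with assumed layer sizes and class counts:

```python
# Hedged sketch of a multi-task network with one shared backbone and two
# heads, one for modulation and one for signal (protocol) classification.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_mod=11, n_sig=15):
        super().__init__()
        self.backbone = nn.Sequential(          # shared feature extractor
            nn.Conv1d(2, 32, 7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.mod_head = nn.Linear(32, n_mod)    # modulation task
        self.sig_head = nn.Linear(32, n_sig)    # signal-class task

    def forward(self, iq):
        h = self.backbone(iq)
        return self.mod_head(h), self.sig_head(h)

net = MultiTaskNet()
iq = torch.randn(4, 2, 1024)                    # batch of I/Q samples
mod_logits, sig_logits = net(iq)
print(mod_logits.shape, sig_logits.shape)       # (4, 11) and (4, 15)
```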
arXiv Detail & Related papers (2022-02-26T14:51:02Z)
- End-to-End Learning for Uplink MU-SIMO Joint Transmitter and Non-Coherent Receiver Design in Fading Channels [11.182920270301304]
A novel end-to-end learning approach, namely JTRD-Net, is proposed for uplink multiuser single-input multiple-output (MU-SIMO) joint transmitter and non-coherent receiver design (JTRD) in fading channels.
The transmitter side is modeled as a group of parallel linear layers, which are responsible for multiuser waveform design.
The non-coherent receiver is formed by a deep feed-forward neural network (DFNN) so as to provide multiuser detection (MUD) capabilities.
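A minimal sketch of the JTRD-Net structure described above, with assumed dimensions and a toy fading channel:

```python
# Hedged sketch of the JTRD-Net structure: parallel linear layers per user
# as the transmitter, and a deep feed-forward network (DFNN) as the
# non-coherent multiuser receiver. All dimensions are illustrative.
import torch
import torch.nn as nn

n_users, n_bits, n_antennas, block_len = 2, 4, 4, 8

tx = nn.ModuleList(                             # one linear waveform mapper
    [nn.Linear(n_bits, block_len) for _ in range(n_users)])  # per user
rx = nn.Sequential(                             # non-coherent DFNN receiver
    nn.Linear(n_antennas * block_len, 128), nn.ReLU(),
    nn.Linear(128, n_users * n_bits))

bits = torch.randint(0, 2, (n_users, n_bits)).float()
waveforms = torch.stack([tx[u](bits[u]) for u in range(n_users)])
# Toy fading channel: random per-user, per-antenna gains, plus noise.
h = torch.randn(n_antennas, n_users)
y = h @ waveforms + 0.1 * torch.randn(n_antennas, block_len)
detected = rx(y.flatten()).view(n_users, n_bits)
print(detected.shape)                           # per-user soft bit estimates
```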
arXiv Detail & Related papers (2021-05-04T02:47:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.