Transformer-Aided Semantic Communications
- URL: http://arxiv.org/abs/2405.01521v1
- Date: Thu, 2 May 2024 17:50:53 GMT
- Title: Transformer-Aided Semantic Communications
- Authors: Matin Mortaheb, Erciyes Karakaya, Mohammad A. Amir Khojastepour, Sennur Ulukus
- Abstract summary: We employ vision transformers specifically for the purpose of compression and compact representation of the input image.
Through the use of the attention mechanism inherent in transformers, we create an attention mask.
We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset.
- Score: 28.63893944806149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The transformer structure employed in large language models (LLMs), as a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for its ability to identify and highlight the most relevant aspects of input data. Such a capability is particularly beneficial in addressing a variety of communication challenges, notably in the realm of semantic communication, where proper encoding of the relevant data is critical, especially in systems with limited bandwidth. In this work, we employ vision transformers specifically for the purpose of compression and compact representation of the input image, with the goal of preserving semantic information throughout the transmission process. Through the use of the attention mechanism inherent in transformers, we create an attention mask. This mask effectively prioritizes critical segments of images for transmission, ensuring that the reconstruction phase focuses on key objects highlighted by the mask. Our methodology significantly improves the quality of semantic communication and optimizes bandwidth usage by encoding different parts of the data in accordance with their semantic information content, thus enhancing overall efficiency. We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset, focusing on both reconstruction quality and accuracy. Our evaluation results demonstrate that our framework successfully preserves semantic information, even when only a fraction of the encoded data is transmitted, according to the intended compression rates.
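As a concrete illustration of the masking idea described above, the sketch below builds a binary patch mask from per-patch attention scores and transmits only the top-scoring fraction of patch encodings. This is a minimal toy, not the paper's implementation: all names are hypothetical, and the scores stand in for attention weights (e.g., a CLS-token attention row) that a real vision transformer would produce.

```python
import numpy as np

def attention_mask(attn_scores, keep_ratio):
    """Binary mask keeping the top keep_ratio fraction of patches by score."""
    n = len(attn_scores)
    k = max(1, int(round(n * keep_ratio)))
    top = np.argsort(attn_scores)[::-1][:k]  # indices of highest-scoring patches
    mask = np.zeros(n, dtype=bool)
    mask[top] = True
    return mask

def transmit(patch_encodings, attn_scores, keep_ratio):
    """Send only masked patch encodings; the mask tells the receiver which
    patch positions were kept, so the rest can be zero-filled on reconstruction."""
    mask = attention_mask(attn_scores, keep_ratio)
    return patch_encodings[mask], mask

# Toy example: 8 patches with 4-dimensional encodings, keep half of them.
rng = np.random.default_rng(0)
enc = rng.normal(size=(8, 4))
scores = np.array([0.05, 0.30, 0.02, 0.25, 0.10, 0.08, 0.15, 0.05])
sent, mask = transmit(enc, scores, keep_ratio=0.5)
print(sent.shape)  # (4, 4): only half the patch encodings are transmitted
```

In this toy, halving the patch count halves the payload; the real framework instead varies how finely each region is encoded according to its semantic content.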
Related papers
- Task-Driven Semantic Quantization and Imitation Learning for Goal-Oriented Communications [44.330805289033]
We propose a novel goal-oriented communication (GO-COM) framework, namely Goal-Oriented Semantic Variational Autoencoder (GOS-VAE).
To capture the relevant semantic features in the data reconstruction, imitation learning is adopted to measure the data regeneration quality.
Our experimental results demonstrate the power of imitation learning in characterizing goal-oriented semantics and bandwidth efficiency of our proposed GOS-VAE.
arXiv Detail & Related papers (2025-02-25T04:43:16Z) - Semantic Communication based on Generative AI: A New Approach to Image Compression and Edge Optimization [1.450405446885067]
This thesis integrates semantic communication and generative models for optimized image compression and edge network resource allocation.
The communication infrastructure can benefit from significant improvements in bandwidth efficiency and latency reduction.
Results demonstrate the potential of combining generative AI and semantic communication to create more efficient semantic-goal-oriented communication networks.
arXiv Detail & Related papers (2025-02-01T21:48:31Z) - Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that model how neurons in the brain communicate through discrete spikes.
In this paper, we introduce an approximate method for relative positional encoding (RPE) in Spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z) - Vision Transformer-based Semantic Communications With Importance-Aware Quantization [13.328970689723096]
This paper presents a vision transformer (ViT)-based semantic communication system with importance-aware quantization (IAQ) for wireless image transmission.
We show that our IAQ framework outperforms conventional image compression methods in both error-free and realistic communication scenarios.
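The blurb above describes allocating transmission resources by patch importance. As a rough, hypothetical sketch of that idea (not the paper's actual IAQ scheme), one can map a normalized importance score to a per-patch bit depth and uniformly quantize each patch encoding; every function name and parameter here is illustrative.

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantization of values in [-1, 1] at the given bit depth."""
    levels = 2 ** bits
    x = np.clip(x, -1.0, 1.0)
    return np.round((x + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1

def importance_aware_quantize(patches, importance, bit_budget=(2, 8)):
    """Assign more bits to more important patches, fewer to the rest."""
    lo, hi = bit_budget
    span = importance.max() - importance.min()
    w = (importance - importance.min()) / (span + 1e-8)  # normalize to [0, 1]
    bits = np.round(lo + w * (hi - lo)).astype(int)
    q = np.stack([quantize(p, b) for p, b in zip(patches, bits)])
    return q, bits

# Toy example: 6 patch encodings, importance drives the bit allocation.
rng = np.random.default_rng(2)
patches = rng.uniform(-1, 1, size=(6, 4))
importance = np.array([0.9, 0.1, 0.5, 0.7, 0.2, 0.95])
q, bits = importance_aware_quantize(patches, importance)
```

Here the least important patch gets 2 bits per value and the most important gets 8, so quantization error concentrates in regions the mask deems unimportant.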
arXiv Detail & Related papers (2024-12-08T19:24:47Z) - Efficient Semantic Communication Through Transformer-Aided Compression [31.285983939625098]
We introduce a channel-aware adaptive framework for semantic communication.
By employing vision transformers, we interpret the attention mask as a measure of the semantic contents of the patches.
Our method enhances communication efficiency by adapting the encoding resolution to the content's relevance.
arXiv Detail & Related papers (2024-12-02T18:57:28Z) - Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders [6.498925999634298]
This paper presents a novel approach for communication-efficient distributed multiview detection and tracking using masked autoencoders (MAEs).
We introduce a semantic-guided masking strategy that leverages pre-trained segmentation models and a tunable power function to prioritize informative image regions.
We evaluate our method on both virtual and real-world multiview datasets, demonstrating comparable detection and tracking performance.
arXiv Detail & Related papers (2024-10-07T08:06:41Z) - Sharing Key Semantics in Transformer Makes Efficient Image Restoration [148.22790334216117]
The self-attention mechanism, a cornerstone of Vision Transformers (ViTs), tends to encompass all global cues, even those from semantically unrelated objects or regions.
We propose boosting image restoration performance by sharing key semantics via a Transformer for IR (i.e., SemanIR) in this paper.
arXiv Detail & Related papers (2024-05-30T12:45:34Z) - Importance-Aware Image Segmentation-based Semantic Communication for Autonomous Driving [9.956303020078488]
This article studies the problem of image segmentation-based semantic communication in autonomous driving.
We propose a vehicular image segmentation-oriented semantic communication system, termed VIS-SemCom.
The proposed VIS-SemCom can achieve a coding gain of nearly 6 dB at a 60% mean intersection over union (mIoU), reduce the transmitted data amount by up to 70% at a 60% mIoU, and improve the segmentation intersection over union (IoU) of important objects by 4%, compared to traditional transmission schemes.
arXiv Detail & Related papers (2024-01-16T18:14:44Z) - Scalable AI Generative Content for Vehicular Network Semantic Communication [46.589349524682966]
This paper unveils a scalable Artificial Intelligence Generated Content (AIGC) system that leverages an encoder-decoder architecture.
The system converts images into textual representations and reconstructs them into quality-acceptable images, optimizing transmission for vehicular network semantic communication.
Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data.
arXiv Detail & Related papers (2023-11-23T02:57:04Z) - Towards Semantic Communications: Deep Learning-Based Image Semantic Coding [42.453963827153856]
We conceive semantic communications for image data, which is much richer in semantics and more bandwidth sensitive.
We propose a reinforcement learning-based adaptive semantic coding (RL-ASC) approach that encodes images beyond the pixel level.
Experimental results demonstrate that the proposed RL-ASC is noise-robust and can reconstruct visually pleasing and semantically consistent images.
arXiv Detail & Related papers (2022-08-08T12:29:55Z) - Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, Transformers tend to neglect the contextual information needed to extract features dynamically.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z) - Wireless Transmission of Images With The Assistance of Multi-level Semantic Information [16.640928669609934]
MLSC-image is a multi-level semantic-aware communication system for wireless image transmission.
We employ a pretrained image captioning model to capture the text semantics and a pretrained image segmentation model to obtain the segmentation semantics.
The numerical results validate the effectiveness and efficiency of the proposed semantic communication system.
arXiv Detail & Related papers (2022-02-08T16:25:26Z) - XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
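As a simplified single-head sketch of this transposed attention: the attention map is computed between feature channels (a d-by-d matrix) instead of between tokens (N-by-N), so cost grows linearly with the token count. Multi-head splitting, the learned temperature, and the exact normalization axes of XCA are omitted or assumed here, so treat this as an illustration rather than the paper's formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def xca(q, k, v, tau=1.0):
    """Cross-covariance attention sketch: q, k, v are (N tokens, d channels)."""
    # L2-normalize each channel along the token dimension.
    qn = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    kn = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)
    attn = softmax((kn.T @ qn) / tau, axis=0)  # (d, d): channel-to-channel map
    return v @ attn                             # (N, d) output, linear in N

# Toy example: 16 tokens, 8 channels.
N, d = 16, 8
rng = np.random.default_rng(1)
out = xca(rng.normal(size=(N, d)), rng.normal(size=(N, d)), rng.normal(size=(N, d)))
print(out.shape)  # (16, 8)
```

Because the d-by-d map does not depend on N, doubling the number of tokens only doubles the work, which is what makes high-resolution inputs tractable.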
arXiv Detail & Related papers (2021-06-17T17:33:35Z) - Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z) - Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - Fine-grained Image-to-Image Transformation towards Visual Recognition [102.51124181873101]
We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image.
We adopt a model based on generative adversarial networks to disentangle the identity related and unrelated factors of an image.
Experiments on the CompCars and Multi-PIE datasets demonstrate that our model preserves the identity of the generated images much better than the state-of-the-art image-to-image transformation models.
arXiv Detail & Related papers (2020-01-12T05:26:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.