Attention-based UNet enabled Lightweight Image Semantic Communication
System over Internet of Things
- URL: http://arxiv.org/abs/2401.07329v1
- Date: Sun, 14 Jan 2024 16:46:50 GMT
- Title: Attention-based UNet enabled Lightweight Image Semantic Communication
System over Internet of Things
- Authors: Guoxin Ma, Haonan Tong, Nuocheng Yang, and Changchuan Yin
- Abstract summary: We study the problem of the lightweight image semantic communication system that is deployed on Internet of Things (IoT) devices.
We propose an attention-based UNet enabled lightweight image semantic communication (LSSC) system, which achieves low computational complexity and small model size.
- Score: 4.62215026195301
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper studies the problem of the lightweight image semantic
communication system that is deployed on Internet of Things (IoT) devices. In
the considered system model, devices must use semantic communication techniques
to support user behavior recognition in ultimate video service with high data
transmission efficiency. However, it is computationally expensive for IoT
devices to deploy semantic codecs due to the complex calculation processes of
deep learning (DL) based codec training and inference. To make it affordable
for IoT devices to deploy semantic communication systems, we propose an
attention-based UNet enabled lightweight image semantic communication (LSSC)
system, which achieves low computational complexity and small model size. In
particular, we first let the LSSC system train the codec at the edge server to
reduce the training computation load on IoT devices. Then, we introduce the
convolutional block attention module (CBAM) to extract the image semantic
features and decrease the number of downsampling layers, thus reducing the
floating-point operations (FLOPs). Finally, we experimentally adjust the
structure of the codec and find out the optimal number of downsampling layers.
Simulation results show that the proposed LSSC system can reduce the semantic
codec FLOPs by 14%, and reduce the model size by 55%, with a sacrifice of 3%
accuracy, compared to the baseline. Moreover, the proposed scheme can achieve a
higher transmission accuracy than the traditional communication scheme in the
low channel signal-to-noise ratio (SNR) region.
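
The link the abstract draws between the number of downsampling layers, FLOPs, and model size can be illustrated with a rough cost count. The sketch below is not the authors' codec: it assumes a generic UNet-style encoder with two 3x3 convolutions per level, 2x2 pooling between levels, and channel width doubling at each level. The function names, input size, and base width are hypothetical choices made only for illustration, and the output is not expected to reproduce the 14% and 55% figures reported above.

```python
def conv_cost(h, w, c_in, c_out, k=3):
    """Return (params, flops) for one k x k convolution with 'same' padding and bias.

    FLOPs are counted as 2 * multiply-accumulate operations (bias adds ignored).
    """
    params = c_out * (c_in * k * k + 1)
    macs = h * w * c_out * c_in * k * k
    return params, 2 * macs


def encoder_cost(h, w, c_in, base_width, num_down):
    """Total (params, flops) of a hypothetical UNet-style encoder.

    Assumptions (not taken from the paper): two 3x3 convs per level,
    spatial size halved and channel width doubled after each level.
    """
    total_params = 0
    total_flops = 0
    c_prev = c_in
    for level in range(num_down + 1):  # num_down poolings give num_down + 1 levels
        c_out = base_width * 2 ** level
        for a, b in ((c_prev, c_out), (c_out, c_out)):
            p, f = conv_cost(h, w, a, b)
            total_params += p
            total_flops += f
        c_prev = c_out
        h, w = h // 2, w // 2  # 2x2 pooling has no parameters
    return total_params, total_flops


if __name__ == "__main__":
    # Compare encoders with different numbers of downsampling layers.
    for n in (2, 3, 4):
        params, flops = encoder_cost(128, 128, 3, 16, n)
        print(f"downsampling layers={n}: params={params}, FLOPs={flops}")
```

In this toy model, each additional level adds far more parameters than FLOPs, because the channel width doubles while the feature map shrinks. Under these assumptions, removing a downsampling layer cuts model size more sharply than computation, which is at least qualitatively in line with the trade-off reported in the abstract.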
Related papers
- Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework [27.524671767937512]
We introduce a novel Generative AI Semantic Communication (GSC) system for single-user scenarios.
At the transmitter end, it employs a joint source-channel coding mechanism based on the Swin Transformer for efficient semantic feature extraction.
At the receiver end, an advanced Diffusion Model (DM) reconstructs high-quality images from degraded signals, enhancing perceptual details.
arXiv Detail & Related papers (2024-07-31T06:08:51Z)
- VideoQA-SC: Adaptive Semantic Communication for Video Question Answering [21.0279034601774]
We propose an end-to-end SC system for video question answering tasks called VideoQA-SC.
Our goal is to accomplish VideoQA tasks directly based on video semantics over noisy or fading wireless channels.
Our results show the great potential of task-oriented SC system design for video applications.
arXiv Detail & Related papers (2024-05-17T06:11:10Z)
- Visual Language Model based Cross-modal Semantic Communication Systems [42.321208020228894]
We propose a novel Vision-Language Model-based Cross-modal Semantic Communication system.
The VLM-CSC comprises three novel components.
The experimental simulations validate the effectiveness, adaptability, and robustness of the CSC system.
arXiv Detail & Related papers (2024-05-06T08:59:16Z)
- Knowledge Distillation Based Semantic Communications For Multiple Users [10.770552656390038]
We consider the semantic communication (SemCom) system with multiple users, where there is a limited number of training samples and unexpected interference.
We propose a knowledge distillation (KD) based system in which a Transformer-based encoder-decoder serves as the semantic encoder-decoder and fully connected neural networks serve as the channel encoder-decoder.
Numerical results demonstrate that KD significantly improves the robustness and the generalization ability when applied to unexpected interference, and it reduces the performance loss when compressing the model size.
arXiv Detail & Related papers (2023-11-23T03:28:14Z)
- Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
Federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
Channel state information-based multiple-input multiple-output transmission module designed to combat channel fading and noise.
arXiv Detail & Related papers (2023-08-07T16:32:14Z)
- Vector Quantized Semantic Communication System [22.579525825992416]
We develop a deep learning-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC.
Specifically, we propose a CNN-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces.
We employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator.
arXiv Detail & Related papers (2022-09-23T10:58:23Z)
- Deep Learning-Based Rate-Splitting Multiple Access for Reconfigurable Intelligent Surface-Aided Tera-Hertz Massive MIMO [56.022764337221325]
Reconfigurable intelligent surface (RIS) can significantly enhance the service coverage of Tera-Hertz massive multiple-input multiple-output (MIMO) communication systems.
However, obtaining accurate high-dimensional channel state information (CSI) with limited pilot and feedback signaling overhead is challenging.
This paper proposes a deep learning (DL)-based rate-splitting multiple access scheme for RIS-aided Tera-Hertz multi-user multiple access systems.
arXiv Detail & Related papers (2022-09-18T03:07:37Z)
- ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization [71.91942002659795]
We investigate and adapt network quantization techniques to accelerate inference and enable its use on compute limited platforms.
ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size.
These improvements come at a minor performance degradation as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization.
arXiv Detail & Related papers (2022-03-07T18:59:03Z)
- A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate the design of a compact audio-visual wake word spotting (WWS) system that utilizes visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF).
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed using expensive operations and the lower-frequency part is assigned cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)