Importance-Aware Image Segmentation-based Semantic Communication for
Autonomous Driving
- URL: http://arxiv.org/abs/2401.10153v1
- Date: Tue, 16 Jan 2024 18:14:44 GMT
- Title: Importance-Aware Image Segmentation-based Semantic Communication for
Autonomous Driving
- Authors: Jie Lv, Haonan Tong, Qiang Pan, Zhilong Zhang, Xinxin He, Tao Luo,
Changchuan Yin
- Abstract summary: This article studies the problem of image segmentation-based semantic communication in autonomous driving.
We propose a vehicular image segmentation-oriented semantic communication system, termed VIS-SemCom.
The proposed VIS-SemCom can achieve a coding gain of nearly 6 dB at a 60% mean intersection over union (mIoU), reduce the transmitted data amount by up to 70% at a 60% mIoU, and improve the segmentation intersection over union (IoU) of important objects by 4%, compared to the traditional transmission scheme.
- Score: 9.956303020078488
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This article studies the problem of image segmentation-based semantic
communication in autonomous driving. In real traffic scenes, detecting key
objects (e.g., vehicles, pedestrians, and obstacles) is more crucial than
detecting other objects to guarantee driving safety. Therefore, we propose a
vehicular image segmentation-oriented semantic communication system, termed
VIS-SemCom, in which image segmentation features of important objects are
transmitted to reduce transmission redundancy. First, to accurately extract
image semantics, we develop a semantic codec based on the Swin Transformer
architecture, which enlarges the receptive field and thus improves segmentation
accuracy. Next, we propose a multi-scale semantic extraction scheme that
assigns different numbers of Swin Transformer blocks to features of different
resolutions, thus improving the segmentation accuracy of important objects.
Furthermore, an importance-aware loss is invoked to emphasize the important
objects, and an online hard example mining (OHEM) strategy is adopted to handle
the small-sample issue in the dataset. Experimental results demonstrate that,
compared to the traditional transmission scheme, the proposed VIS-SemCom can
achieve a coding gain of nearly 6 dB at a 60% mean intersection over union
(mIoU), reduce the transmitted data amount by up to 70% at a 60% mIoU, and
improve the segmentation intersection over union (IoU) of important objects by
4%.
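The paper gives no reference implementation here, but the combination of an importance-aware loss with OHEM can be illustrated with a short sketch: a per-pixel cross-entropy weighted by class importance, where only the hardest pixels contribute to the gradient. The class count, weight values, and kept-pixel ratio below are illustrative assumptions, not values from the paper.

    import torch
    import torch.nn.functional as F

    def importance_aware_ohem_loss(logits, labels, class_weights, keep_ratio=0.25):
        """Weighted per-pixel cross-entropy, keeping only the hardest pixels (OHEM).

        logits: (B, C, H, W) raw segmentation scores
        labels: (B, H, W) ground-truth class indices
        class_weights: (C,) larger weights for safety-critical classes
        """
        # Per-pixel weighted cross-entropy, no reduction yet.
        pixel_loss = F.cross_entropy(logits, labels,
                                     weight=class_weights, reduction="none")  # (B, H, W)
        flat = pixel_loss.flatten()
        # OHEM: back-propagate only through the hardest keep_ratio of pixels.
        k = max(1, int(keep_ratio * flat.numel()))
        hard, _ = torch.topk(flat, k)
        return hard.mean()

    # Illustrative class weighting: important objects (e.g., pedestrians,
    # vehicles) get larger weights than background classes.
    weights = torch.tensor([0.5, 2.0, 2.0, 1.0])        # assumed 4-class example
    logits = torch.randn(2, 4, 64, 128, requires_grad=True)
    labels = torch.randint(0, 4, (2, 64, 128))
    loss = importance_aware_ohem_loss(logits, labels, weights)
    loss.backward()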
Related papers
- ViT LoS V2X: Vision Transformers for Environment-aware LoS Blockage Prediction for 6G Vehicular Networks [20.953587995374168]
We propose a deep learning-based approach that combines Convolutional Neural Networks (CNNs) and customized Vision Transformers (ViTs).
Our method capitalizes on the synergistic strengths of CNNs and ViTs to extract features from time-series multimodal data.
Our results show that the proposed approach achieves high accuracy and outperforms state-of-the-art solutions, reaching more than 95% prediction accuracy.
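As a rough illustration of such a CNN + ViT hybrid, the sketch below runs a small CNN over the input and feeds the resulting feature tokens to a Transformer encoder. All layer sizes, the two-layer depth, and the binary output head are assumptions for illustration, not the paper's architecture.

    import torch
    import torch.nn as nn

    class CNNViTHybrid(nn.Module):
        """Minimal CNN + Transformer hybrid: convolutions extract local
        features, a Transformer encoder models global context over the
        resulting tokens."""
        def __init__(self, in_ch=3, dim=64, n_classes=2):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
            )
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(dim, n_classes)

        def forward(self, x):                       # x: (B, C, H, W)
            f = self.cnn(x)                         # (B, dim, H/4, W/4)
            tokens = f.flatten(2).transpose(1, 2)   # (B, N, dim)
            z = self.encoder(tokens).mean(dim=1)    # global average over tokens
            return self.head(z)                     # blockage yes/no logits

    model = CNNViTHybrid()
    out = model(torch.randn(4, 3, 64, 64))          # (4, 2)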
arXiv Detail & Related papers (2024-06-27T01:38:09Z)
- Sharing Key Semantics in Transformer Makes Efficient Image Restoration [148.22790334216117]
The self-attention mechanism, a cornerstone of Vision Transformers (ViTs), tends to encompass all global cues, even those from semantically unrelated objects or regions.
In this paper, we propose boosting image restoration performance by sharing key semantics via a Transformer for IR (i.e., SemanIR).
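The core idea of restricting attention to semantically related positions can be sketched as top-k sparse attention, where each query attends only to its most similar keys. This is a generic sparse-attention sketch, not SemanIR's exact algorithm; shapes and the top-k value are assumptions.

    import torch
    import torch.nn.functional as F

    def topk_key_attention(q, k, v, topk=8):
        """Attention that, for each query, attends only to its top-k most
        similar keys, ignoring semantically unrelated positions."""
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, N, N)
        # Mask out everything except each query's top-k keys.
        kth = scores.topk(topk, dim=-1).values[..., -1:]        # k-th largest score
        masked = scores.masked_fill(scores < kth, float("-inf"))
        return F.softmax(masked, dim=-1) @ v

    q = k = v = torch.randn(2, 16, 32)   # (batch, tokens, dim)
    out = topk_key_attention(q, k, v)    # (2, 16, 32)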
arXiv Detail & Related papers (2024-05-30T12:45:34Z)
- Transformer-Aided Semantic Communications [28.63893944806149]
We employ vision transformers specifically to compress and compactly represent the input image.
Through the use of the attention mechanism inherent in transformers, we create an attention mask.
We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset.
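One common way to turn ViT attention into a transmission mask is to keep only the patches the [CLS] token attends to most and drop the rest before sending. The sketch below shows that generic selection step; the shapes and keep ratio are illustrative assumptions, not the paper's exact design.

    import torch

    def select_patches_by_attention(attn, patch_emb, keep=0.3):
        """Keep only the patches the [CLS] token attends to most.

        attn:      (B, heads, N+1, N+1) attention weights from a ViT layer
        patch_emb: (B, N, D) patch embeddings (excluding CLS)
        """
        cls_to_patches = attn.mean(dim=1)[:, 0, 1:]          # (B, N)
        k = max(1, int(keep * cls_to_patches.shape[1]))
        idx = cls_to_patches.topk(k, dim=1).indices          # (B, k)
        batch = torch.arange(patch_emb.shape[0]).unsqueeze(1)
        return patch_emb[batch, idx], idx                    # transmit these only

    attn = torch.rand(2, 4, 17, 17)      # 16 patches + CLS
    emb = torch.randn(2, 16, 32)
    kept, idx = select_patches_by_attention(attn, emb)       # (2, 4, 32)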
arXiv Detail & Related papers (2024-05-02T17:50:53Z)
- A Multi-Task Oriented Semantic Communication Framework for Autonomous Vehicles [5.779316179788962]
This work presents a multi-task-oriented semantic communication framework for connected and autonomous vehicles.
We propose a convolutional autoencoder (CAE) that performs the semantic encoding of the road traffic signs.
These encoded images are then transmitted from one CAV to another via satellite under challenging weather conditions.
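A convolutional autoencoder of this kind can be sketched in a few lines: the encoder output is the compact semantic representation sent over the link, and the decoder reconstructs the sign image at the receiver. Layer sizes and the input resolution are assumptions for illustration.

    import torch
    import torch.nn as nn

    class TrafficSignCAE(nn.Module):
        """Minimal convolutional autoencoder sketch for semantic encoding."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
                nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),   # 16 -> 8
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(8, 16, 2, stride=2), nn.ReLU(),     # 8 -> 16
                nn.ConvTranspose2d(16, 3, 2, stride=2), nn.Sigmoid(),  # 16 -> 32
            )

        def forward(self, x):
            code = self.encoder(x)        # compact features to transmit
            return self.decoder(code)

    cae = TrafficSignCAE()
    recon = cae(torch.rand(4, 3, 32, 32))   # (4, 3, 32, 32)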
arXiv Detail & Related papers (2024-03-06T12:04:24Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
We propose a federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
A channel state information-based multiple-input multiple-output (MIMO) transmission module is designed to combat channel fading and noise.
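FLSC combines federated learning with semantic codecs; the sketch below shows only the generic FedAvg parameter-averaging step that any such framework relies on, not the paper's full pipeline.

    import torch

    def fedavg(state_dicts, weights=None):
        """Plain FedAvg aggregation of client model parameters."""
        n = len(state_dicts)
        weights = weights or [1.0 / n] * n
        avg = {}
        for key in state_dicts[0]:
            # Weighted average of each parameter tensor across clients.
            avg[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        return avg

    # Usage: global_model.load_state_dict(fedavg([c.state_dict() for c in clients]))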
arXiv Detail & Related papers (2023-08-07T16:32:14Z)
- Domain Adaptive Semantic Segmentation by Optimal Transport [13.133890240271308]
Semantic scene segmentation has received a great deal of attention due to the richness of the semantic information it contains.
Current approaches are mainly based on convolutional neural networks (CNN), but they rely on a large number of labels.
We propose a domain adaptation (DA) framework based on optimal transport (OT) and attention mechanism to address this issue.
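The OT alignment at the heart of such a framework is typically computed with entropy-regularized Sinkhorn iterations. Below is a textbook Sinkhorn routine for the transport plan between source and target feature distributions; the paper's framework adds attention and a segmentation network on top, which this sketch does not cover.

    import numpy as np

    def sinkhorn_plan(C, a, b, eps=0.05, n_iters=200):
        """Entropy-regularized optimal transport via Sinkhorn iterations.

        C: (n, m) cost matrix between source and target features
        a, b: marginal weights, each summing to 1
        """
        K = np.exp(-C / eps)                  # Gibbs kernel
        u = np.ones_like(a)
        for _ in range(n_iters):
            v = b / (K.T @ u)                 # alternate scaling updates
            u = a / (K @ v)
        return u[:, None] * K * v[None, :]    # transport plan, rows sum to a

    n, m = 5, 7
    C = np.random.rand(n, m)
    P = sinkhorn_plan(C, np.full(n, 1 / n), np.full(m, 1 / m))
    assert np.allclose(P.sum(axis=1), 1 / n, atol=1e-4)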
arXiv Detail & Related papers (2023-03-29T03:33:54Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as the mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- Auto-Transfer: Learning to Route Transferrable Representations [77.30427535329571]
We propose a novel adversarial multi-armed bandit approach which automatically learns to route source representations to appropriate target representations.
We see upwards of 5% accuracy improvements compared with the state-of-the-art knowledge transfer methods.
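A standard adversarial-bandit algorithm for this kind of routing choice is EXP3, sketched below with each arm standing for one candidate source-to-target routing and rewards drawn from validation accuracy. This is a generic EXP3 sketch, not the paper's exact routing algorithm.

    import numpy as np

    class Exp3Router:
        """EXP3 adversarial bandit over candidate representation routings."""
        def __init__(self, n_arms, gamma=0.1):
            self.w = np.ones(n_arms)
            self.gamma = gamma

        def probs(self):
            p = (1 - self.gamma) * self.w / self.w.sum()
            return p + self.gamma / len(self.w)   # mix in uniform exploration

        def pull(self):
            return np.random.choice(len(self.w), p=self.probs())

        def update(self, arm, reward):            # reward in [0, 1]
            est = reward / self.probs()[arm]      # importance-weighted estimate
            self.w[arm] *= np.exp(self.gamma * est / len(self.w))

    router = Exp3Router(n_arms=4)
    arm = router.pull()
    router.update(arm, reward=0.8)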
arXiv Detail & Related papers (2022-02-02T13:09:27Z)
- Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images [6.460167724233707]
We propose a bilateral awareness network (BANet) which contains a dependency path and a texture path.
BANet captures the long-range relationships and fine-grained details in VFR images.
Experiments conducted on three large-scale urban scene image segmentation datasets, i.e., the ISPRS Vaihingen dataset, the ISPRS Potsdam dataset, and the UAVid dataset, demonstrate the effectiveness of BANet.
arXiv Detail & Related papers (2021-06-23T13:57:36Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
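Center-and-scale prediction is usually decoded CenterNet-style: pedestrian centers are local peaks of a predicted heatmap, and a second head regresses width/height at each peak. The sketch below shows that generic decoding step, not the paper's exact pipeline; shapes and the top-k count are assumptions.

    import torch
    import torch.nn.functional as F

    def decode_centers(heatmap, wh, k=10):
        """Anchor-free decoding of object centers and sizes.

        heatmap: (B, 1, H, W) center confidences in [0, 1]
        wh:      (B, 2, H, W) predicted width/height per location
        """
        # Keep only local maxima (3x3 max-pool NMS).
        peaks = heatmap * (F.max_pool2d(heatmap, 3, stride=1, padding=1) == heatmap)
        B, _, H, W = peaks.shape
        scores, idx = peaks.view(B, -1).topk(k)        # top-k peak locations
        ys, xs = idx // W, idx % W
        sizes = wh.view(B, 2, -1).gather(2, idx.unsqueeze(1).expand(B, 2, k))
        return xs, ys, sizes, scores

    hm = torch.rand(1, 1, 32, 32)
    wh = torch.rand(1, 2, 32, 32) * 20
    xs, ys, sizes, scores = decode_centers(hm, wh)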
arXiv Detail & Related papers (2020-08-19T13:13:01Z)