Exploring Textual Semantics Diversity for Image Transmission in Semantic Communication Systems using Visual Language Model
- URL: http://arxiv.org/abs/2503.19386v1
- Date: Tue, 25 Mar 2025 06:42:30 GMT
- Title: Exploring Textual Semantics Diversity for Image Transmission in Semantic Communication Systems using Visual Language Model
- Authors: Peishan Huang, Dong Li,
- Abstract summary: This letter proposes a multi-text transmission semantic communication system, which uses the visual language model (VLM) to assist in the transmission of image semantic signals.<n>Unlike previous image transmission semantic communication systems, the proposed system divides the image into multiple blocks and extracts multiple text information from the image using a modified large language and visual assistant (LLaVA)<n> Simulation results show that the proposed text semantics diversity scheme can significantly improve the reconstruction accuracy compared with related works.
- Score: 4.03161352925235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the rapid development of machine learning has brought reforms and challenges to traditional communication systems. Semantic communication has appeared as an effective strategy to effectively extract relevant semantic signals semantic segmentation labels and image features for image transmission. However, the insufficient number of extracted semantic features of images will potentially result in a low reconstruction accuracy, which hinders the practical applications and still remains challenging for solving. In order to fill this gap, this letter proposes a multi-text transmission semantic communication (Multi-SC) system, which uses the visual language model (VLM) to assist in the transmission of image semantic signals. Unlike previous image transmission semantic communication systems, the proposed system divides the image into multiple blocks and extracts multiple text information from the image using a modified large language and visual assistant (LLaVA), and combines semantic segmentation tags with semantic text for image recovery. Simulation results show that the proposed text semantics diversity scheme can significantly improve the reconstruction accuracy compared with related works.
Related papers
- Data-Efficient Generalization for Zero-shot Composed Image Retrieval [67.46975191141928]
ZS-CIR aims to retrieve the target image based on a reference image and a text description without requiring in-distribution triplets for training.
One prevalent approach follows the vision-language pretraining paradigm that employs a mapping network to transfer the image embedding to a pseudo-word token in the text embedding space.
We propose a Data-efficient Generalization (DeG) framework, including two novel designs, namely, Textual Supplement (TS) module and Semantic-Set (S-Set)
arXiv Detail & Related papers (2025-03-07T07:49:31Z) - ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification [52.405499816861635]
Multiple instance learning (MIL)-based framework has become the mainstream for processing the whole slide image (WSI)<n>We propose a dual-scale vision-language multiple instance learning (ViLa-MIL) framework for whole slide image classification.
arXiv Detail & Related papers (2025-02-12T13:28:46Z) - Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models [5.867765921443141]
A Texture-Color based Semantic Communication system of Images TCSCI is proposed.
It decomposing the images into their natural language description (text), texture and color semantic features at the transmitter.
It can achieve extremely compressed, highly noise-resistant, and visually similar image semantic communication, while ensuring the interpretability and editability of the transmission process.
arXiv Detail & Related papers (2024-10-26T08:53:05Z) - Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission.
Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility.
We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z) - Semantic Similarity Score for Measuring Visual Similarity at Semantic Level [5.867765921443141]
We propose a semantic evaluation metric -- SeSS (Semantic Similarity Score) based on Scene Graph Generation and graph matching.
The metric can measure the semantic-level differences in semantic-level information of images and can be used for evaluation in visual semantic communication systems.
arXiv Detail & Related papers (2024-06-06T08:51:26Z) - Sequential Semantic Generative Communication for Progressive
Text-to-Image Generation [32.82954905044597]
This paper proposes new framework of communication system leveraging promising generation capabilities of multi-modal generative models.
The transmitter converts objective image to text through multi-model generation process and the receiver reconstructs the image using reverse process.
Our work is expected to pave a new road of utilizing state-of-the-art generative models to real communication systems.
arXiv Detail & Related papers (2023-09-08T12:17:49Z) - Communication-Efficient Framework for Distributed Image Semantic
Wireless Transmission [68.69108124451263]
Federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
Channel state information-based multiple-input multiple-output transmission module designed to combat channel fading and noise.
arXiv Detail & Related papers (2023-08-07T16:32:14Z) - Vector Quantized Semantic Communication System [22.579525825992416]
We develop a deep learning-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC.
Specifically, we propose a CNN-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces.
We employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator.
arXiv Detail & Related papers (2022-09-23T10:58:23Z) - Wireless Transmission of Images With The Assistance of Multi-level
Semantic Information [16.640928669609934]
MLSC-image is a multi-level semantic aware communication system for wireless image transmission.
We employ a pretrained image caption to capture the text semantics and a pretrained image segmentation model to obtain the segmentation semantics.
The numerical results validate the effectiveness and efficiency of the proposed semantic communication system.
arXiv Detail & Related papers (2022-02-08T16:25:26Z) - CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image framework (CRIS)
CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment.
Our proposed framework significantly outperforms the state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z) - Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments have been conducted on the MS-COCO dataset demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.