Knowledge-Base based Semantic Image Transmission Using CLIP
- URL: http://arxiv.org/abs/2504.01053v1
- Date: Tue, 01 Apr 2025 12:53:54 GMT
- Title: Knowledge-Base based Semantic Image Transmission Using CLIP
- Authors: Chongyang Li, Yanmei He, Tianqian Zhang, Mingjian He, Shouyin Liu
- Abstract summary: This paper proposes a novel knowledge-Base (KB) assisted semantic communication framework for image transmission. The proposed system prioritizes semantic accuracy, offering a new evaluation paradigm for semantic-aware communication systems.
- Score: 0.7323373755126116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel knowledge-Base (KB) assisted semantic communication framework for image transmission. At the receiver, a Facebook AI Similarity Search (FAISS) based vector database is constructed by extracting semantic embeddings from images using the Contrastive Language-Image Pre-Training (CLIP) model. During transmission, the transmitter first extracts a 512-dimensional semantic feature using the CLIP model, then compresses it with a lightweight neural network for transmission. After receiving the signal, the receiver reconstructs the feature back to 512 dimensions and performs similarity matching from the KB to retrieve the most semantically similar image. Semantic transmission success is determined by category consistency between the transmitted and retrieved images, rather than traditional metrics like Peak Signal-to-Noise Ratio (PSNR). The proposed system prioritizes semantic accuracy, offering a new evaluation paradigm for semantic-aware communication systems. Experimental validation on CIFAR100 demonstrates the effectiveness of the framework in achieving semantic image transmission.
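The pipeline described in the abstract (extract a 512-dimensional feature, compress, transmit, reconstruct, retrieve from the KB, check category consistency) can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: random class-clustered unit vectors stand in for CLIP embeddings, a random orthonormal projection stands in for the lightweight neural network, the channel is assumed to be AWGN, and a brute-force inner-product search stands in for the FAISS index. All function names and parameter values (`sample_features`, `semantic_accuracy`, `code_dim=64`, `spread=0.3`) are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so inner products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def sample_features(centroids, per_class, rng, spread=0.3):
    """Assumption: synthetic stand-in for CLIP features (class centroid plus noise)."""
    num_classes, dim = centroids.shape
    labels = np.repeat(np.arange(num_classes), per_class)
    noise = spread * rng.standard_normal((labels.size, dim)) / np.sqrt(dim)
    return l2_normalize(centroids[labels] + noise), labels


def awgn(z, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (assumed channel)."""
    signal_power = np.mean(z ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return z + np.sqrt(noise_power) * rng.standard_normal(z.shape)


def semantic_accuracy(snr_db, seed=0, num_classes=10, per_class=20,
                      dim=512, code_dim=64):
    """Fraction of transmissions whose retrieved KB image shares the sender's category."""
    rng = np.random.default_rng(seed)
    centroids = l2_normalize(rng.standard_normal((num_classes, dim)))

    # Receiver-side knowledge base and sender-side query images (disjoint samples).
    kb_feats, kb_labels = sample_features(centroids, per_class, rng)
    query_feats, query_labels = sample_features(centroids, per_class, rng)

    # Assumption: a linear projection with orthonormal columns replaces the
    # lightweight compression network (512 -> code_dim) and its decoder.
    proj, _ = np.linalg.qr(rng.standard_normal((dim, code_dim)))

    compressed = query_feats @ proj            # transmitter: compress
    received = awgn(compressed, snr_db, rng)   # channel
    reconstructed = received @ proj.T          # receiver: back to 512 dimensions

    # Brute-force inner-product search over unit-norm KB vectors. The ranking
    # matches cosine similarity because the query norm does not affect argmax.
    nearest = np.argmax(reconstructed @ kb_feats.T, axis=1)

    # Semantic success: retrieved category equals transmitted category.
    return float(np.mean(kb_labels[nearest] == query_labels))


if __name__ == "__main__":
    for snr_db in (-20.0, 0.0, 20.0):
        print(f"SNR {snr_db:+.0f} dB: semantic accuracy = {semantic_accuracy(snr_db):.3f}")
```

Note that the success measure compares category labels only; no pixel-level reconstruction is scored, which is the contrast with PSNR that the abstract draws. In a real system the brute-force search would presumably be replaced by a FAISS index built over actual CLIP embeddings.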
Related papers
- Data-Efficient Generalization for Zero-shot Composed Image Retrieval [67.46975191141928]
ZS-CIR aims to retrieve the target image based on a reference image and a text description without requiring in-distribution triplets for training. One prevalent approach follows the vision-language pretraining paradigm that employs a mapping network to transfer the image embedding to a pseudo-word token in the text embedding space. We propose a Data-efficient Generalization (DeG) framework, including two novel designs, namely, Textual Supplement (TS) module and Semantic-Set (S-Set)
arXiv Detail & Related papers (2025-03-07T07:49:31Z) - Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation [48.642826318384294]
Contrastive vision-language models, such as CLIP, have demonstrated excellent zero-shot capability across semantic recognition tasks. This paper presents a new multimodal disentangled representation learning framework, which leverages disentangled text to guide image disentanglement.
arXiv Detail & Related papers (2025-03-04T02:36:48Z) - Vision Transformer-based Semantic Communications With Importance-Aware Quantization [13.328970689723096]
This paper presents a vision transformer (ViT)-based semantic communication system with importance-aware quantization (IAQ) for wireless image transmission.
We show that our IAQ framework outperforms conventional image compression methods in both error-free and realistic communication scenarios.
arXiv Detail & Related papers (2024-12-08T19:24:47Z) - Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission.
Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility.
We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z) - Semantic Similarity Score for Measuring Visual Similarity at Semantic Level [5.867765921443141]
We propose a semantic evaluation metric -- SeSS (Semantic Similarity Score) based on Scene Graph Generation and graph matching.
The metric can measure semantic-level differences between images and can be used for evaluation in visual semantic communication systems.
arXiv Detail & Related papers (2024-06-06T08:51:26Z) - Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks [2.2997117992292764]
This study proposes a multi-modal image transmission method that leverages various types of semantic information for efficient semantic communication.
The proposed method extracts multi-modal semantic information from an original image and transmits only that to a receiver.
The receiver generates multiple images using an image-generation model and selects an output image based on semantic similarity.
arXiv Detail & Related papers (2024-04-17T11:42:39Z) - Reasoning with the Theory of Mind for Pragmatic Semantic Communication [62.87895431431273]
A pragmatic semantic communication framework is proposed in this paper.
It enables effective goal-oriented information sharing between two intelligent agents.
Numerical evaluations demonstrate the framework's ability to achieve efficient communication with a reduced number of bits.
arXiv Detail & Related papers (2023-11-30T03:36:19Z) - Progressive Tree-Structured Prototype Network for End-to-End Image Captioning [74.8547752611337]
We propose a novel Progressive Tree-Structured prototype Network (dubbed PTSN).
PTSN is the first attempt to narrow down the scope of prediction words with appropriate semantics by modeling the hierarchical textual semantics.
Our method achieves a new state-of-the-art performance with 144.2% (single model) and 146.5% (ensemble of 4 models) CIDEr scores on the Karpathy split and 141.4% (c5) and 143.9% (c40) CIDEr scores on the official online test server.
arXiv Detail & Related papers (2022-11-17T11:04:00Z) - Vector Quantized Semantic Communication System [22.579525825992416]
We develop a deep learning-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC.
Specifically, we propose a CNN-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces.
We employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator.
arXiv Detail & Related papers (2022-09-23T10:58:23Z) - Towards Semantic Communications: Deep Learning-Based Image Semantic Coding [42.453963827153856]
We conceive semantic communications for image data, which is much richer in semantics and more bandwidth sensitive.
We propose a reinforcement learning based adaptive semantic coding (RL-ASC) approach that encodes images beyond pixel level.
Experimental results demonstrate that the proposed RL-ASC is noise robust and can reconstruct visually pleasant and semantically consistent images.
arXiv Detail & Related papers (2022-08-08T12:29:55Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs)
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Wireless Transmission of Images With The Assistance of Multi-level Semantic Information [16.640928669609934]
MLSC-image is a multi-level semantic aware communication system for wireless image transmission.
We employ a pretrained image captioning model to capture the text semantics and a pretrained image segmentation model to obtain the segmentation semantics.
The numerical results validate the effectiveness and efficiency of the proposed semantic communication system.
arXiv Detail & Related papers (2022-02-08T16:25:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.