Related papers: Semantic Similarity Score for Measuring Visual Similarity at Semantic Level

URL: http://arxiv.org/abs/2406.03865v2
Date: Wed, 10 Jul 2024 04:34:13 GMT
Title: Semantic Similarity Score for Measuring Visual Similarity at Semantic Level
Authors: Senran Fan, Zhicheng Bao, Chen Dong, Haotai Liang, Xiaodong Xu, Ping Zhang
Abstract summary: We propose a semantic evaluation metric, SeSS (Semantic Similarity Score), based on Scene Graph Generation and graph matching. The metric measures semantic-level differences between images and can be used to evaluate visual semantic communication systems.
Score: 5.867765921443141
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Semantic communication, a revolutionary communication architecture, is considered a promising new communication paradigm. Unlike traditional symbol-based, error-free communication systems, semantic-based visual communication systems extract, compress, transmit, and reconstruct images at the semantic level. However, widely used image similarity metrics, whether pixel-based (MSE, PSNR) or structure-based (MS-SSIM), struggle to accurately measure how much semantic-level information of the source is lost during transmission. This makes it difficult to evaluate the performance of visual semantic communication systems, especially when comparing them with traditional communication systems. To address this, we propose a semantic evaluation metric, SeSS (Semantic Similarity Score), based on Scene Graph Generation and graph matching, which converts the similarity between images into a semantic-level graph matching score. In addition, semantic similarity scores for tens of thousands of image pairs are manually annotated to fine-tune the hyperparameters of the graph matching algorithm, aligning the metric more closely with human semantic perception. SeSS is tested on several datasets, including (1) images transmitted by traditional and semantic communication systems at different compression rates, (2) images transmitted by traditional and semantic communication systems at different signal-to-noise ratios, (3) images generated by a large-scale model with different levels of injected noise, and (4) images subjected to certain special transformations. The experiments demonstrate the effectiveness of SeSS, indicating that the metric can measure semantic-level differences between images and can be used to evaluate visual semantic communication systems.
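The abstract contrasts pixel-level metrics such as MSE and PSNR with a score obtained by matching scene graphs. The sketch below illustrates that contrast on toy data; it is not the SeSS algorithm. The scene graphs are hand-written sets of hypothetical (subject, predicate, object) triples standing in for the output of a Scene Graph Generation model, and `toy_graph_similarity` is a simple weighted overlap of object nodes and relation triples rather than the paper's tuned graph matching. The weights `w_obj` and `w_rel` are illustrative assumptions, not the paper's hyperparameters.

```python
import math
from typing import Sequence

# A hypothetical scene-graph edge: (subject, predicate, object).
Triple = tuple[str, str, str]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal size."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[float], b: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical pixels
    return 10 * math.log10(max_val ** 2 / err)


def jaccard(x: set, y: set) -> float:
    """Set overlap len(x & y) / len(x | y); 1.0 when both sets are empty."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def objects(graph: set[Triple]) -> set[str]:
    """All nodes (subjects and objects) mentioned in a scene graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def toy_graph_similarity(g1: set[Triple], g2: set[Triple],
                         w_obj: float = 0.5, w_rel: float = 0.5) -> float:
    """Toy semantic score in [0, 1] when w_obj + w_rel == 1.

    Weighted overlap of object nodes and relation triples. This is an
    illustrative stand-in for graph matching, NOT the SeSS algorithm.
    """
    return (w_obj * jaccard(objects(g1), objects(g2))
            + w_rel * jaccard(g1, g2))


if __name__ == "__main__":
    # Pixel view: a uniform offset of 10 grey levels on a 4-pixel "image".
    print(psnr([0, 0, 0, 0], [10, 10, 10, 10]))  # ≈ 28.13 dB

    # Semantic view: hand-written graphs standing in for SGG output.
    original = {("man", "riding", "horse"), ("horse", "on", "grass")}
    received = {("man", "riding", "horse"), ("horse", "on", "sand")}
    print(toy_graph_similarity(original, received))  # ≈ 0.417
```

The pixel score reacts to any intensity change, while the toy graph score changes only when an object or relation in the (hypothetical) scene graph changes. That separation is the motivation the abstract gives for a graph-based metric.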

Related papers

Knowledge-Base based Semantic Image Transmission Using CLIP [0.7323373755126116]
This paper proposes a novel knowledge-base (KB) assisted semantic communication framework for image transmission. The proposed system prioritizes semantic accuracy, offering a new evaluation paradigm for semantic-aware communication systems.
arXiv (2025-04-01T12:53:54Z)
Exploring Textual Semantics Diversity for Image Transmission in Semantic Communication Systems using Visual Language Model [4.03161352925235]
This letter proposes a multi-text transmission semantic communication system that uses a visual language model (VLM) to assist in transmitting image semantic signals. Unlike previous image-transmission semantic communication systems, the proposed system divides the image into multiple blocks and extracts multiple pieces of text information from the image using a modified Large Language and Vision Assistant (LLaVA). Simulation results show that the proposed text semantics diversity scheme can significantly improve reconstruction accuracy compared with related works.
arXiv (2025-03-25T06:42:30Z)
Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation [48.642826318384294]
Contrastive vision-language models, such as CLIP, have demonstrated excellent zero-shot capability across semantic recognition tasks. This paper presents a new multimodal disentangled representation learning framework, which leverages disentangled text to guide image disentanglement.
arXiv (2025-03-04T02:36:48Z)
Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models [5.867765921443141]
A Texture-Color based Semantic Communication system of Images (TCSCI) is proposed. It decomposes images into their natural-language description (text), texture, and color semantic features at the transmitter. It can achieve extremely compressed, highly noise-resistant, and visually similar image semantic communication, while ensuring the interpretability and editability of the transmission process.
arXiv (2024-10-26T08:53:05Z)
Offline Evaluation of Set-Based Text-to-Image Generation [55.1766769455424]
Ideation is an important subclass of Text-to-Image (TTI) tasks. Existing evaluation metrics for TTI remain focused on distributional similarity metrics. We develop TTI evaluation metrics with explicit models of how users browse and interact with sets of spatially arranged generated images.
arXiv (2024-10-22T18:04:00Z)
Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv (2024-08-07T14:32:36Z)
Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks [2.2997117992292764]
This study proposes a multi-modal image transmission method that leverages various types of semantic information for efficient semantic communication. The proposed method extracts multi-modal semantic information from an original image and transmits only that to a receiver. The receiver generates multiple images using an image-generation model and selects an output image based on semantic similarity.
arXiv (2024-04-17T11:42:39Z)
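The receiver-side step summarized in the entry above (generate several candidate images, then keep the one most semantically similar to the transmitted information) reduces to an argmax over a similarity function. A minimal sketch, assuming each image has already been mapped to a feature vector by some embedding model and that cosine similarity is the similarity measure; the feature vectors are made-up placeholders, and the paper's multi-modal similarity estimation may work differently.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (nu * nv)


def select_candidate(reference: Sequence[float],
                     candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate most similar to the reference features."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = [cosine(reference, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    ref = [1.0, 0.0, 1.0]              # hypothetical features of the original image
    generated = [[0.0, 1.0, 0.0],      # hypothetical features of generated candidates
                 [0.9, 0.1, 1.1],
                 [1.0, 1.0, 0.0]]
    print(select_candidate(ref, generated))  # 1
```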
How to Evaluate Semantic Communications for Images with ViTScore Metric? [18.657768058678375]
We propose a novel metric for evaluating image semantic similarity, named Vision Transformer Score (ViTScore). ViTScore has three important properties, symmetry, boundedness, and normalization, which make it convenient and intuitive for image measurement. We show that ViTScore is robust and efficient in evaluating the semantic similarity of images.
arXiv (2023-09-09T23:03:50Z)
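The three properties named for ViTScore (symmetry, boundedness, normalization) can be illustrated with a BERTScore-style greedy matching over patch embeddings. This is a hedged sketch under stated assumptions, not ViTScore's exact definition: the patch embeddings are toy vectors rather than Vision Transformer features, they are assumed non-negative so every cosine lies in [0, 1], and the precision/recall/F1 construction is borrowed from BERTScore for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def greedy_match_f1(x: Sequence[Vector], y: Sequence[Vector]) -> float:
    """BERTScore-style F1 between two images given as lists of patch embeddings.

    Recall matches every patch of x to its most similar patch of y; precision
    does the reverse. Assuming non-negative embeddings, each cosine lies in
    [0, 1], so the F1 is bounded in [0, 1], symmetric in (x, y), and equals 1
    (up to rounding) when x == y.
    """
    if not x or not y:
        raise ValueError("each image needs at least one patch embedding")
    recall = sum(max(cosine(a, b) for b in y) for a in x) / len(x)
    precision = sum(max(cosine(a, b) for a in x) for b in y) / len(y)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    img_a = [[1.0, 0.0], [0.0, 1.0]]  # toy non-negative "patch embeddings"
    img_b = [[1.0, 0.0], [1.0, 1.0]]
    print(greedy_match_f1(img_a, img_b))  # ≈ 0.854
    print(greedy_match_f1(img_b, img_a))  # same value (symmetry)
    print(greedy_match_f1(img_a, img_a))  # 1.0 (normalization)
```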
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation [47.40949434032489]
We propose a new contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S). PAC-S unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data. Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos.
arXiv (2023-03-21T18:03:14Z)
Cognitive Semantic Communication Systems Driven by Knowledge Graph: Principle, Implementation, and Performance Evaluation [74.38561925376996]
Two cognitive semantic communication frameworks are proposed for the single-user and multiple-user communication scenarios. An effective semantic correction algorithm is proposed by mining the inference rule from the knowledge graph. For the multi-user cognitive semantic communication system, a message recovery algorithm is proposed to distinguish messages of different users.
arXiv (2023-03-15T12:01:43Z)
Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story. Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities. We explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model.
arXiv (2022-11-14T11:41:44Z)
Vector Quantized Semantic Communication System [22.579525825992416]
We develop a deep learning-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC. Specifically, we propose a CNN-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces. We employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator.
arXiv (2022-09-23T10:58:23Z)
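The vector-quantization step at the core of a VQ-based system maps each continuous feature vector to the index of its nearest codebook entry, so only integer indices need to cross the channel and the receiver looks them up in a shared codebook. A minimal sketch of that lookup, assuming squared Euclidean distance and a tiny hand-written codebook; VQ-DeepSC's learned multi-scale codebooks, CNN transceiver, and PatchGAN training are not modeled here.

```python
from typing import Sequence

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_code(z: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook vector closest to z."""
    if not codebook:
        raise ValueError("codebook must be non-empty")
    return min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> list[int]:
    """Transmitter side: replace each feature vector with a codebook index."""
    return [nearest_code(z, codebook) for z in features]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> list[list[float]]:
    """Receiver side: look the indices back up in the shared codebook."""
    return [list(codebook[i]) for i in indices]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]  # hypothetical 3-entry codebook
    feats = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]]    # hypothetical encoder outputs
    idx = quantize(feats, codebook)
    print(idx)                        # [1, 2, 0]
    print(dequantize(idx, codebook))  # [[1.0, 1.0], [-1.0, 1.0], [0.0, 0.0]]
```

Transmitting indices instead of floats is what makes the scheme compressive: with a codebook of K entries, each feature vector costs about log2(K) bits.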
Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto approach of Generative Adversarial Nets (GANs).
arXiv (2022-06-30T18:31:51Z)
Wireless Transmission of Images With The Assistance of Multi-level Semantic Information [16.640928669609934]
MLSC-image is a multi-level semantic-aware communication system for wireless image transmission. We employ a pretrained image captioning model to capture the text semantics and a pretrained image segmentation model to obtain the segmentation semantics. The numerical results validate the effectiveness and efficiency of the proposed semantic communication system.
arXiv (2022-02-08T16:25:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.