Semantic Similarity Score for Measuring Visual Similarity at Semantic Level
- URL: http://arxiv.org/abs/2406.03865v2
- Date: Wed, 10 Jul 2024 04:34:13 GMT
- Title: Semantic Similarity Score for Measuring Visual Similarity at Semantic Level
- Authors: Senran Fan, Zhicheng Bao, Chen Dong, Haotai Liang, Xiaodong Xu, Ping Zhang
- Abstract summary: We propose a semantic evaluation metric -- SeSS (Semantic Similarity Score) based on Scene Graph Generation and graph matching.
The metric can measure differences in the semantic-level information of images and can be used to evaluate visual semantic communication systems.
- Score: 5.867765921443141
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Semantic communication, as a revolutionary communication architecture, is considered a promising novel communication paradigm. Unlike traditional symbol-based error-free communication systems, semantic-based visual communication systems extract, compress, transmit, and reconstruct images at the semantic level. However, widely used image similarity evaluation metrics, whether pixel-based (MSE, PSNR) or structure-based (MS-SSIM), struggle to accurately measure the loss of the source's semantic-level information during transmission. This makes it difficult to evaluate the performance of visual semantic communication systems, especially when comparing them with traditional communication systems. To address this, we propose a semantic evaluation metric -- SeSS (Semantic Similarity Score), based on Scene Graph Generation and graph matching, which converts the similarity score between images into a semantic-level graph matching score. In addition, semantic similarity scores for tens of thousands of image pairs are manually annotated to fine-tune the hyperparameters of the graph matching algorithm, aligning the metric more closely with human semantic perception. The performance of SeSS is tested on different datasets, including (1) images transmitted by traditional and semantic communication systems at different compression rates, (2) images transmitted by traditional and semantic communication systems at different signal-to-noise ratios, (3) images generated by a large-scale model with different noise levels introduced, and (4) images subjected to certain special transformations. The experiments demonstrate the effectiveness of SeSS, indicating that the metric can measure differences in the semantic-level information of images and can be used to evaluate visual semantic communication systems.
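The idea of scoring two images by comparing their scene graphs, rather than their pixels, can be illustrated with a small toy sketch. This is not the SeSS algorithm: the scene graphs below are written by hand instead of being produced by a Scene Graph Generation model, the matching is a plain Jaccard overlap of object labels and (subject, predicate, object) triples, and the weight `w_obj` is a hypothetical stand-in for the hyperparameters that the authors tune against human annotations.

```python
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class SceneGraph(NamedTuple):
    """Hand-written stand-in for the output of a scene graph generator."""
    objects: FrozenSet[str]
    relations: FrozenSet[Triple]


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two sets in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def toy_graph_similarity(g1: SceneGraph, g2: SceneGraph, w_obj: float = 0.5) -> float:
    """Weighted mix of object overlap and relation overlap.

    w_obj is a hypothetical hyperparameter, not a value from the paper.
    """
    obj_score = jaccard(g1.objects, g2.objects)
    rel_score = jaccard(g1.relations, g2.relations)
    return w_obj * obj_score + (1.0 - w_obj) * rel_score


# Source image: a man wearing a hat rides a horse.
source = SceneGraph(
    objects=frozenset({"man", "horse", "hat"}),
    relations=frozenset({("man", "riding", "horse"), ("man", "wearing", "hat")}),
)
# Received image: the hat was lost during transmission.
received = SceneGraph(
    objects=frozenset({"man", "horse"}),
    relations=frozenset({("man", "riding", "horse")}),
)

score = toy_graph_similarity(source, received)
same = toy_graph_similarity(source, source)
print(f"{score:.3f}")  # 0.583
```

In this toy, losing the hat lowers the score because an object and a relation disappear, regardless of how many pixels changed, which is the kind of behaviour a semantic-level metric aims for.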
Related papers
- Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks [2.2997117992292764]
This study proposes a multi-modal image transmission method that leverages diverse semantic information for efficient semantic communication.
The proposed method extracts multi-modal semantic information from an image and transmits only that information.
The receiver generates multiple images using an image-generation model and selects an output based on semantic similarity.
arXiv Detail & Related papers (2024-04-17T11:42:39Z)
- Reasoning with the Theory of Mind for Pragmatic Semantic Communication [62.87895431431273]
A pragmatic semantic communication framework is proposed in this paper.
It enables effective goal-oriented information sharing between two intelligent agents.
Numerical evaluations demonstrate the framework's ability to achieve efficient communication with a reduced number of bits.
arXiv Detail & Related papers (2023-11-30T03:36:19Z)
- How to Evaluate Semantic Communications for Images with ViTScore Metric? [18.657768058678375]
We propose a novel metric for evaluating image semantic similarity, named Vision Transformer Score (ViTScore).
ViTScore has 3 important properties, including symmetry, boundedness, and normalization, which make it convenient and intuitive for image measurement.
We show that ViTScore is robust and efficient in evaluating the semantic similarity of images.
arXiv Detail & Related papers (2023-09-09T23:03:50Z)
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation [47.40949434032489]
We propose a new contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S).
PAC-S unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data.
Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos.
arXiv Detail & Related papers (2023-03-21T18:03:14Z)
- Cognitive Semantic Communication Systems Driven by Knowledge Graph: Principle, Implementation, and Performance Evaluation [74.38561925376996]
Two cognitive semantic communication frameworks are proposed for the single-user and multiple-user communication scenarios.
An effective semantic correction algorithm is proposed by mining the inference rule from the knowledge graph.
For the multi-user cognitive semantic communication system, a message recovery algorithm is proposed to distinguish messages of different users.
arXiv Detail & Related papers (2023-03-15T12:01:43Z)
- Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story.
Current works face the problem of semantic misalignment because of their fixed architecture and diversity of input modalities.
We explore the semantic alignment between text and image representations by learning to match their semantic levels in the GAN-based generative model.
arXiv Detail & Related papers (2022-11-14T11:41:44Z)
- Vector Quantized Semantic Communication System [22.579525825992416]
We develop a deep learning-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC.
Specifically, we propose a CNN-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces.
We employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator.
arXiv Detail & Related papers (2022-09-23T10:58:23Z)
- Towards Semantic Communications: Deep Learning-Based Image Semantic Coding [42.453963827153856]
We conceive semantic communications for image data, which is much richer in semantics and more bandwidth-sensitive.
We propose a reinforcement learning-based adaptive semantic coding (RL-ASC) approach that encodes images beyond the pixel level.
Experimental results demonstrate that the proposed RL-ASC is robust to noise and can reconstruct visually pleasing and semantically consistent images.
arXiv Detail & Related papers (2022-08-08T12:29:55Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Wireless Transmission of Images With The Assistance of Multi-level Semantic Information [16.640928669609934]
MLSC-image is a multi-level semantic aware communication system for wireless image transmission.
We employ a pretrained image captioning model to capture the text semantics and a pretrained image segmentation model to obtain the segmentation semantics.
The numerical results validate the effectiveness and efficiency of the proposed semantic communication system.
arXiv Detail & Related papers (2022-02-08T16:25:26Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)