Related papers: How to Evaluate Semantic Communications for Images with ViTScore Metric?

How to Evaluate Semantic Communications for Images with ViTScore Metric?

URL: http://arxiv.org/abs/2309.04891v2
Date: Sun, 21 Apr 2024 03:42:47 GMT
Title: How to Evaluate Semantic Communications for Images with ViTScore Metric?
Authors: Tingting Zhu, Bo Peng, Jifan Liang, Tingchen Han, Hai Wan, Jingqiao Fu, Junjie Chen,
Abstract summary: We propose a novel metric for evaluating image semantic similarity, named Vision Transformer Score (ViTScore) ViTScore has 3 important properties, including symmetry, boundedness, and normalization, which make it convenient and intuitive for image measurement. We show that ViTScore is robust and efficient in evaluating the semantic similarity of images.
Score: 18.657768058678375
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic communications (SC) have been expected to be a new paradigm shifting to catalyze the next generation communication, whose main concerns shift from accurate bit transmission to effective semantic information exchange in communications. However, the previous and widely-used metrics for images are not applicable to evaluate the image semantic similarity in SC. Classical metrics to measure the similarity between two images usually rely on the pixel level or the structural level, such as the PSNR and the MS-SSIM. Straightforwardly using some tailored metrics based on deep-learning methods in CV community, such as the LPIPS, is infeasible for SC. To tackle this, inspired by BERTScore in NLP community, we propose a novel metric for evaluating image semantic similarity, named Vision Transformer Score (ViTScore). We prove theoretically that ViTScore has 3 important properties, including symmetry, boundedness, and normalization, which make ViTScore convenient and intuitive for image measurement. To evaluate the performance of ViTScore, we compare ViTScore with 3 typical metrics (PSNR, MS-SSIM, and LPIPS) through 4 classes of experiments: (i) correlation with BERTScore through evaluation of image caption downstream CV task, (ii) evaluation in classical image communications, (iii) evaluation in image semantic communication systems, and (iv) evaluation in image semantic communication systems with semantic attack. Experimental results demonstrate that ViTScore is robust and efficient in evaluating the semantic similarity of images. Particularly, ViTScore outperforms the other 3 typical metrics in evaluating the image semantic changes by semantic attack, such as image inverse with Generative Adversarial Networks (GANs). This indicates that ViTScore is an effective performance metric when deployed in SC scenarios.

Related papers

Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation [120.23172120151821]
We propose a novel approach for disentangling visual and semantic features from the backbones of pre-trained diffusion models.<n>We introduce an automated pipeline that constructs image pairs with annotated semantic and visual correspondences.<n>We propose a new metric, Visual Semantic Matching, that quantifies visual inconsistencies in subject-driven image generation.
arXiv Detail & Related papers (2025-09-26T07:11:55Z)
Evaluating Image Caption via Cycle-consistent Text-to-Image Generation [24.455344211552692]
We propose CAMScore, a reference-free automatic evaluation metric for image captioning models. To circumvent the aforementioned modality gap, CAMScore utilizes a text-to-image model to generate images from captions and subsequently evaluates these generated images against the original images. Experiment results show that CAMScore achieves a superior correlation with human judgments compared to existing reference-based and reference-free metrics.
arXiv Detail & Related papers (2025-01-07T06:35:34Z)
Towards Automatic Evaluation for Image Transcreation [52.71090829502756]
We propose a suite of automatic evaluation metrics inspired by machine translation (MT) metrics. We identify three critical dimensions of image transcreation: cultural relevance, semantic equivalence and visual similarity. Our results show that proprietary VLMs best identify cultural relevance and semantic equivalence, while vision-encoder representations are adept at measuring visual similarity.
arXiv Detail & Related papers (2024-12-18T10:55:58Z)
Heuristical Comparison of Vision Transformers Against Convolutional Neural Networks for Semantic Segmentation on Remote Sensing Imagery [0.0]
Vision Transformers (ViT) have recently brought a new wave of research in the field of computer vision. This paper focuses on the comparison of three key factors of using (or not using) ViT for semantic segmentation of remote sensing aerial images on the iSAID.
arXiv Detail & Related papers (2024-11-14T00:18:04Z)
CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment [2.3874115898130865]
Image similarity metrics play an important role in computer vision applications, as they are used in image processing, computer vision and machine learning. Existing metrics, such as PSNR, MSE, SSIM, ISSM and FSIM, often face limitations in terms of either speed, complexity or sensitivity to small changes in images. A novel image similarity metric, namely CSIM, that combines real-time while being sensitive to subtle image variations is investigated in this paper.
arXiv Detail & Related papers (2024-10-02T10:46:05Z)
Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z)
Semantic Similarity Score for Measuring Visual Similarity at Semantic Level [5.867765921443141]
We propose a semantic evaluation metric -- SeSS (Semantic Similarity Score) based on Scene Graph Generation and graph matching. The metric can measure the semantic-level differences in semantic-level information of images and can be used for evaluation in visual semantic communication systems.
arXiv Detail & Related papers (2024-06-06T08:51:26Z)
CrossScore: Towards Multi-View Image Evaluation and Scoring [24.853612457257697]
Cross-reference image quality assessment method fills the gap in the image assessment landscape. Our method enables accurate image quality assessment without requiring ground truth references.
arXiv Detail & Related papers (2024-04-22T17:59:36Z)
LipSim: A Provably Robust Perceptual Similarity Metric [56.03417732498859]
We show the vulnerability of state-of-the-art perceptual similarity metrics based on an ensemble of ViT-based feature extractors to adversarial attacks. We then propose a framework to train a robust perceptual similarity metric called LipSim with provable guarantees. LipSim provides guarded areas around each data point and certificates for all perturbations within an $ell$ ball.
arXiv Detail & Related papers (2023-10-27T16:59:51Z)
DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions. We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images. Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation [47.40949434032489]
We propose a new contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S) PAC-S unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data. Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos.
arXiv Detail & Related papers (2023-03-21T18:03:14Z)
Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder [52.42057181754076]
Motivated by the auto-encoder mechanism and contrastive representation learning advances, we propose a learning-based metric for image captioning. We develop three progressive model structures to learn the sentence level representations. Experiment results show that our proposed method can align well with the scores generated from other contemporary metrics.
arXiv Detail & Related papers (2021-06-29T12:27:05Z)
Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning based metrics for image captioning, which we call Intrinsic Image Captioning Evaluation(I2CE) Experiment results show that our proposed method can keep robust performance and give more flexible scores to candidate captions when encountered with semantic similar expression or less aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.