Related papers: Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding

URL: http://arxiv.org/abs/2505.10405v1
Date: Thu, 15 May 2025 15:28:32 GMT
Title: Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding
Authors: Jianhao Huang, Qunsong Zeng, Kaibin Huang
Abstract summary: We develop a hybrid Gen-SemCom system, where both text prompts and semantically critical features are extracted for transmissions. By integrating the text prompt and critical features, the receiver reconstructs high-fidelity images using a diffusion-based generative model. Experimental results validate the GVIF metric's sensitivity to visual fidelity, correlating with both the PSNR and critical information volume.
Score: 29.28886512743758
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative semantic communication (Gen-SemCom) with large artificial intelligence (AI) models promises a transformative paradigm for 6G networks, which reduces communication costs by transmitting low-dimensional prompts rather than raw data. However, purely prompt-driven generation loses fine-grained visual details. Additionally, there is a lack of systematic metrics to evaluate the performance of Gen-SemCom systems. To address these issues, we develop a hybrid Gen-SemCom system with a critical information embedding (CIE) framework, where both text prompts and semantically critical features are extracted for transmissions. First, a novel approach of semantic filtering is proposed to select and transmit the semantically critical features of images relevant to the semantic label. By integrating the text prompt and critical features, the receiver reconstructs high-fidelity images using a diffusion-based generative model. Next, we propose the generative visual information fidelity (GVIF) metric to evaluate the visual quality of the generated image. By characterizing the statistical models of image features, the GVIF metric quantifies the mutual information between the distorted features and their original counterparts. By maximizing the GVIF metric, we design a channel-adaptive Gen-SemCom system that adaptively controls the volume of features and compression rate according to the channel state. Experimental results validate the GVIF metric's sensitivity to visual fidelity, correlating with both the PSNR and critical information volume. In addition, the optimized system achieves superior performance over benchmarking schemes in terms of higher PSNR and lower FID scores.
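The abstract does not give the GVIF formula, so the sketch below does not implement it. It only illustrates two generic quantities the abstract mentions: PSNR, and mutual information between a feature and its distorted version. The scalar Gaussian model (distorted = gain * original + noise) is an assumption made here for illustration, as are the function names `psnr` and `gaussian_mutual_information`.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    # Mean squared error between the two sequences.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)


def gaussian_mutual_information(signal_var, noise_var, gain=1.0):
    """Mutual information in bits between X ~ N(0, signal_var) and
    Y = gain * X + N with N ~ N(0, noise_var).

    Assumed toy model, not the statistical model used in the paper.
    """
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    return 0.5 * math.log2(1.0 + (gain ** 2) * signal_var / noise_var)


if __name__ == "__main__":
    print(psnr([255, 0], [0, 0]))
    print(gaussian_mutual_information(3.0, 1.0))
```

Under this toy model, more noise or a smaller gain lowers the mutual information, which is the general behavior one would expect from a fidelity measure of this kind.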

Related papers

Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation [120.23172120151821]
We propose a novel approach for disentangling visual and semantic features from the backbones of pre-trained diffusion models. We introduce an automated pipeline that constructs image pairs with annotated semantic and visual correspondences. We propose a new metric, Visual Semantic Matching, that quantifies visual inconsistencies in subject-driven image generation.
arXiv Detail & Related papers (2025-09-26T07:11:55Z)
Large AI Model-Enabled Generative Semantic Communications for Image Transmission [37.127618237197495]
We introduce an innovative generative semantic communication system that refines semantic granularity by segmenting images into key and non-key regions. Key regions, which contain essential visual information, are processed using an image-oriented semantic encoder. Non-key regions are efficiently compressed through an image-to-text modeling approach.
arXiv Detail & Related papers (2025-09-24T07:46:38Z)
Text-Visual Semantic Constrained AI-Generated Image Quality Assessment [47.575342788480505]
We propose a unified framework to enhance the comprehensive evaluation of both text-image consistency and perceptual distortion in AI-generated images. Our approach integrates key capabilities from multiple models and tackles the aforementioned challenges by introducing two core modules. Tests conducted on multiple benchmark datasets demonstrate that SC-AGIQA outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-07-14T16:21:05Z)
Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack [6.115539523178243]
Task-oriented semantic communication is a promising neural network-based system design for 6G networks. We propose a diffusion-based semantic communication framework, named DiffSem, to optimize semantic information reconstruction. Our results show that DiffSem improves the classification accuracy by 10.03% and maintains stable performance under dynamic channels.
arXiv Detail & Related papers (2025-06-24T05:21:27Z)
Vision Transformer Based Semantic Communications for Next Generation Wireless Networks [3.8095664680229935]
This paper presents a Vision Transformer (ViT)-based semantic communication framework. By equipping ViT as the encoder-decoder framework, the proposed architecture can proficiently encode images into a high semantic content. The architecture based on the proposed ViT network achieves the Peak Signal-to-Noise Ratio (PSNR) of 38 dB.
arXiv Detail & Related papers (2025-03-21T16:23:02Z)
Vision Transformer-based Semantic Communications With Importance-Aware Quantization [13.328970689723096]
This paper presents a vision transformer (ViT)-based semantic communication system with importance-aware quantization (IAQ) for wireless image transmission. We show that our IAQ framework outperforms conventional image compression methods in both error-free and realistic communication scenarios.
arXiv Detail & Related papers (2024-12-08T19:24:47Z)
Large Generative Model-assisted Talking-face Semantic Communication System [55.42631520122753]
This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system. A Generative Semantic Extractor (GSE) at the transmitter converts semantically sparse talking-face videos into texts with high information density. A private Knowledge Base (KB) based on the Large Language Model (LLM) is used for semantic disambiguation and correction. A Generative Semantic Reconstructor (GSR) utilizes BERT-VITS2 and SadTalker models to transform text back into a high-QoE talking-face video.
arXiv Detail & Related papers (2024-11-06T12:45:46Z)
Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models [5.867765921443141]
A Texture-Color based Semantic Communication system of Images (TCSCI) is proposed. It decomposes the images into their natural language description (text), texture, and color semantic features at the transmitter. It can achieve extremely compressed, highly noise-resistant, and visually similar image semantic communication, while ensuring the interpretability and editability of the transmission process.
arXiv Detail & Related papers (2024-10-26T08:53:05Z)
Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z)
GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models [7.291687946822539]
A major drawback of state-of-the-art NR-IQA methods is their limited ability to generalize across diverse IQA settings. Recent text-to-image generative models generate meaningful visual concepts with fine details related to text concepts. In this work, we leverage the denoising process of such diffusion models for generalized IQA by understanding the degree of alignment between learnable quality-aware text prompts and images.
arXiv Detail & Related papers (2024-06-07T05:46:39Z)
Transformer-Aided Semantic Communications [28.63893944806149]
We employ vision transformers specifically for the purpose of compression and compact representation of the input image. Through the use of the attention mechanism inherent in transformers, we create an attention mask. We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset.
arXiv Detail & Related papers (2024-05-02T17:50:53Z)
G-Refine: A General Quality Refiner for Text-to-Image Generation [74.16137826891827]
We introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Extensive experimentation reveals that AIGIs after G-Refine outperform in 10+ quality metrics across 4 databases.
arXiv Detail & Related papers (2024-04-29T00:54:38Z)
Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning. In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track semantic changes and channel conditions to perform adaptive semantic extraction and sampling. The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z)
Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach [74.25870052841226]
A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing, and artificial intelligence (AI) technologies to enable many connected intelligence services. Wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints. A novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems.
arXiv Detail & Related papers (2023-04-25T00:15:00Z)
Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches. We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.