Large AI Model-Enabled Generative Semantic Communications for Image Transmission
- URL: http://arxiv.org/abs/2509.21394v1
- Date: Wed, 24 Sep 2025 07:46:38 GMT
- Title: Large AI Model-Enabled Generative Semantic Communications for Image Transmission
- Authors: Qiyu Ma, Wanli Ni, Zhijin Qin
- Abstract summary: We introduce an innovative generative semantic communication system that refines semantic granularity by segmenting images into key and non-key regions. Key regions, which contain essential visual information, are processed using an image-oriented semantic encoder. Non-key regions are efficiently compressed through an image-to-text modeling approach.
- Score: 37.127618237197495
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The rapid development of generative artificial intelligence (AI) has introduced significant opportunities for enhancing the efficiency and accuracy of image transmission within semantic communication systems. Despite these advancements, existing methodologies often neglect the difference in importance of different regions of the image, potentially compromising the reconstruction quality of visually critical content. To address this issue, we introduce an innovative generative semantic communication system that refines semantic granularity by segmenting images into key and non-key regions. Key regions, which contain essential visual information, are processed using an image-oriented semantic encoder, while non-key regions are efficiently compressed through an image-to-text modeling approach. Additionally, to mitigate the substantial storage and computational demands posed by large AI models, the proposed system employs a lightweight deployment strategy incorporating model quantization and low-rank adaptation fine-tuning techniques, significantly boosting resource utilization without sacrificing performance. Simulation results demonstrate that the proposed system outperforms traditional methods in terms of both semantic fidelity and visual quality, thereby affirming its effectiveness for image transmission tasks.
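The key/non-key region pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: `split_regions`, the downsampling stand-in for the image-oriented semantic encoder, and the statistics-based stand-in for image-to-text modeling are all hypothetical placeholders, not the paper's actual segmentation model, encoder, or captioning model.

```python
import numpy as np

def split_regions(image, mask):
    # Separate an image into key and non-key regions using a binary mask.
    # In the paper, the mask would come from a segmentation model that
    # flags visually critical content; here it is supplied by hand.
    key = image * mask[..., None]
    non_key = image * (1 - mask)[..., None]
    return key, non_key

def encode_key_region(key):
    # Placeholder for an image-oriented semantic encoder: a 2x spatial
    # downsample stands in for learned feature extraction.
    return key[::2, ::2]

def describe_non_key_region(non_key):
    # Placeholder for image-to-text modeling: summarize coarse statistics
    # instead of invoking a captioning model.
    return f"mean_intensity={non_key.mean():.3f}"

# Toy 4x4 RGB image with the top-left 2x2 block marked as key.
img = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3) / 48.0
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0

key, non_key = split_regions(img, mask)
compact_key = encode_key_region(key)     # dense features for key content
text = describe_non_key_region(non_key)  # compact text for the rest
```

The design point is that the two regions take different paths through the system: key pixels keep a dense (image-domain) representation, while non-key pixels collapse to a far cheaper textual description, which is where the bandwidth savings come from.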
Related papers
- Vision-Enhanced Large Language Models for High-Resolution Image Synthesis and Multimodal Data Interpretation [0.0]
This research introduces a transformative framework for integrating Vision-Enhanced Large Language Models (LLMs) with advanced transformer-based architectures. The proposed model incorporates a rectified flow mechanism that connects noise and data with linear paths, enabling efficient and high-quality generation. The framework achieves unparalleled fidelity in synthesized images and coherent multimodal representations.
arXiv Detail & Related papers (2025-12-14T08:28:50Z) - Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding [29.28886512743758]
We develop a hybrid Gen-SemCom system, where both text prompts and semantically critical features are extracted for transmission. By integrating the text prompt and critical features, the receiver reconstructs high-fidelity images using a diffusion-based generative model. Experimental results validate the GVIF metric's sensitivity to visual fidelity, correlating with both the PSNR and critical information volume.
arXiv Detail & Related papers (2025-05-15T15:28:32Z) - Exploring Textual Semantics Diversity for Image Transmission in Semantic Communication Systems using Visual Language Model [4.03161352925235]
This letter proposes a multi-text transmission semantic communication system, which uses a visual language model (VLM) to assist in the transmission of image semantic signals. Unlike previous image transmission semantic communication systems, the proposed system divides the image into multiple blocks and extracts multiple pieces of textual information from the image using a modified large language and visual assistant (LLaVA). Simulation results show that the proposed text semantics diversity scheme can significantly improve reconstruction accuracy compared with related works.
arXiv Detail & Related papers (2025-03-25T06:42:30Z) - MambaIC: State Space Models for High-Performance Learned Image Compression [40.155314987485376]
A high-performance image compression algorithm is crucial for real-time information transmission across numerous fields. Inspired by the effectiveness of state space models (SSMs) in capturing long-range dependencies, we leverage SSMs to address computational inefficiency in existing methods. We propose an enhanced image compression approach through refined context modeling, which we term MambaIC.
arXiv Detail & Related papers (2025-03-16T11:32:34Z) - Semantic Communication based on Generative AI: A New Approach to Image Compression and Edge Optimization [1.450405446885067]
This thesis integrates semantic communication and generative models for optimized image compression and edge network resource allocation. The communication infrastructure benefits from significant improvements in bandwidth efficiency and latency reduction. Results demonstrate the potential of combining generative AI and semantic communication to create more efficient semantic-goal-oriented communication networks.
arXiv Detail & Related papers (2025-02-01T21:48:31Z) - Vision Transformer-based Semantic Communications With Importance-Aware Quantization [13.328970689723096]
This paper presents a vision transformer (ViT)-based semantic communication system with importance-aware quantization (IAQ) for wireless image transmission. We show that our IAQ framework outperforms conventional image compression methods in both error-free and realistic communication scenarios.
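The core idea of importance-aware quantization, allocating more bits to features that matter more for reconstruction, can be illustrated with a minimal bit-allocation sketch. `allocate_bits` and `uniform_quantize` are hypothetical helpers for illustration only, not the IAQ framework from the paper.

```python
import numpy as np

def allocate_bits(importance, total_bits, min_bits=1, max_bits=8):
    # Normalize importance scores to weights, split the bit budget
    # proportionally, and clamp each group to a sane bit-width range.
    w = importance / importance.sum()
    return np.clip(np.round(w * total_bits), min_bits, max_bits).astype(int)

def uniform_quantize(x, bits):
    # Uniform scalar quantizer with 2**bits reconstruction levels
    # spanning the observed range of x.
    levels = 2 ** bits
    x_min, x_max = x.min(), x.max()
    step = (x_max - x_min) / (levels - 1)
    return np.round((x - x_min) / step) * step + x_min

# Toy feature groups: the first is deemed most important, so it keeps
# the most bits; less important groups are quantized more coarsely.
importance = np.array([4.0, 2.0, 1.0, 1.0])
bits = allocate_bits(importance, total_bits=16)
coarse = uniform_quantize(np.linspace(-1.0, 1.0, 9), bits[-1])
```

Under this scheme, distortion concentrates in the low-importance groups, which is the trade-off importance-aware methods exploit to hold perceptual quality at a fixed rate.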
arXiv Detail & Related papers (2024-12-08T19:24:47Z) - Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting via Latent Optimization) is an optimization approach grounded in a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z) - Efficient Visual State Space Model for Image Deblurring [99.54894198086852]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. We propose a simple yet effective visual state space model (EVSSM) for image deblurring. The proposed EVSSM performs favorably against state-of-the-art methods on benchmark datasets and real-world images.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - YaART: Yet Another ART Rendering Technology [119.09155882164573]
This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences.
We analyze how these choices affect both the efficiency of the training process and the quality of the generated images.
We demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets.
arXiv Detail & Related papers (2024-04-08T16:51:19Z) - Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches. We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.