Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks
- URL: http://arxiv.org/abs/2404.11280v1
- Date: Wed, 17 Apr 2024 11:42:39 GMT
- Title: Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks
- Authors: Eri Hosonuma, Taku Yamazaki, Takumi Miyoshi, Akihito Taya, Yuuki Nishiyama, Kaoru Sezaki
- Abstract summary: This study proposes a multi-modal image transmission method that leverages diverse semantic information for efficient semantic communication.
The proposed method extracts multi-modal semantic information from an image and transmits only that information.
The receiver generates multiple images using an image-generation model and selects an output based on semantic similarity.
- Score: 2.2997117992292764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reduce network traffic and support environments with limited resources, a method for transmitting images with a small amount of transmission data is required. Machine learning-based image compression methods, which reduce the data size of images while preserving their features, have been proposed. However, in certain situations, reconstructing only part of the semantic information of an image at the receiver may be sufficient. To realize this concept, semantic-information-based communication, called semantic communication, has been proposed, along with an image transmission method based on it. This method transmits only the semantic information of an image, and the receiver reconstructs the image using an image-generation model. However, it relies on a single type of semantic information, and reconstructing images similar to the original from that alone is challenging. This study proposes a multi-modal image transmission method that leverages diverse semantic information for efficient semantic communication. The proposed method extracts multi-modal semantic information from an image and transmits only that information. The receiver then generates multiple images using an image-generation model and selects an output based on semantic similarity. The receiver must select the output based only on the received features; however, evaluating semantic similarity with conventional metrics is difficult in this setting. Therefore, this study explores new metrics for evaluating the similarity between semantic features of images and proposes two scoring procedures. The results indicate that the proposed procedures can compare semantic similarities, such as position and composition, between the semantic features of the original and generated images. Thus, the proposed method can facilitate the transmission and utilization of photographs over mobile networks for various service applications.
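The receiver-side procedure described in the abstract (generate several candidate images from the received semantic features, score each candidate against those features, and output the best one) can be sketched as below. This is a minimal illustration, not the authors' implementation: `generate_candidate` stands in for an image-generation model, feature vectors replace real images, and plain cosine similarity replaces the paper's two proposed scoring procedures.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (placeholder scorer)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def generate_candidate(received_features, rng):
    """Stand-in for an image-generation model: each call yields a slightly
    different 'image', represented here as a perturbed feature vector."""
    return [f + rng.gauss(0.0, 0.5) for f in received_features]

def select_best_output(received_features, n_candidates=4, seed=0):
    """Generate n candidates and return the one most similar to the
    received semantic features, along with its similarity score."""
    rng = random.Random(seed)
    candidates = [generate_candidate(received_features, rng)
                  for _ in range(n_candidates)]
    scores = [cosine_similarity(received_features, c) for c in candidates]
    best = max(range(n_candidates), key=scores.__getitem__)
    return candidates[best], scores[best]

# Toy "received" semantic feature vector standing in for extracted semantics.
features = [0.2, 0.9, 0.1, 0.5]
best_candidate, best_score = select_best_output(features)
```

In the paper's setting the scorer must operate only on the received features, since the receiver never sees the original image; that constraint is what motivates the proposed metrics in place of conventional image-to-image ones.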
Related papers
- Semantic Similarity Score for Measuring Visual Similarity at Semantic Level [5.867765921443141]
We propose a semantic evaluation metric -- SeSS (Semantic Similarity Score) based on Scene Graph Generation and graph matching.
The metric measures semantic-level differences between images and can be used for evaluation in visual semantic communication systems.
arXiv Detail & Related papers (2024-06-06T08:51:26Z) - Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z) - Deep Image Semantic Communication Model for Artificial Intelligent Internet of Things [16.505798124923224]
A novel deep image semantic communication model is proposed for efficient image communication in AIoT.
At the transmitter side, a high-precision image semantic segmentation algorithm is proposed to extract the semantic information of the image.
At the receiver side, a semantic image restoration algorithm is proposed to convert the semantic image to a real scene image with detailed information.
arXiv Detail & Related papers (2023-11-06T07:43:42Z) - Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis [139.2216271759332]
We propose a novel ECGAN for the challenging semantic image synthesis task.
The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content.
arXiv Detail & Related papers (2023-07-22T14:17:19Z) - Generative Semantic Communication: Diffusion Models Beyond Bit Recovery [19.088596386865106]
We propose a novel generative diffusion-guided framework for semantic communication.
We reduce bandwidth usage by sending highly-compressed semantic information only.
Our results show that objects, locations, and depths are still recognizable even in the presence of extremely noisy conditions.
arXiv Detail & Related papers (2023-06-07T10:36:36Z) - Diffusion-based Image Translation using Disentangled Style and Content
Representation [51.188396199083336]
Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer.
It is often difficult to maintain the original content of the image during the reverse diffusion.
We present a novel diffusion-based unsupervised image translation method using disentangled style and content representation.
Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.
arXiv Detail & Related papers (2022-09-30T06:44:37Z) - Memory-Driven Text-to-Image Generation [126.58244124144827]
We introduce a memory-driven semi-parametric approach to text-to-image generation.
The non-parametric component is a memory bank of image features constructed from a training set of images.
The parametric component is a generative adversarial network.
arXiv Detail & Related papers (2022-08-15T06:32:57Z) - Towards Semantic Communications: Deep Learning-Based Image Semantic Coding [42.453963827153856]
We conceive semantic communications for image data, which is much richer in semantics and more bandwidth sensitive.
We propose a reinforcement learning-based adaptive semantic coding (RL-ASC) approach that encodes images beyond the pixel level.
Experimental results demonstrate that the proposed RL-ASC is noise-robust and can reconstruct visually pleasing and semantically consistent images.
arXiv Detail & Related papers (2022-08-08T12:29:55Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Wireless Transmission of Images With The Assistance of Multi-level Semantic Information [16.640928669609934]
MLSC-image is a multi-level semantic aware communication system for wireless image transmission.
We employ a pretrained image captioning model to capture the text semantics and a pretrained image segmentation model to obtain the segmentation semantics.
The numerical results validate the effectiveness and efficiency of the proposed semantic communication system.
arXiv Detail & Related papers (2022-02-08T16:25:26Z) - Cross-domain Correspondence Learning for Exemplar-based Image Translation [59.35767271091425]
We present a framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain.
The output has the style (e.g., color, texture) in consistency with the semantically corresponding objects in the exemplar.
We show that our method significantly outperforms state-of-the-art methods in terms of image quality.
arXiv Detail & Related papers (2020-04-12T09:10:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.