Sequential Semantic Generative Communication for Progressive Text-to-Image Generation
- URL: http://arxiv.org/abs/2309.04287v1
- Date: Fri, 8 Sep 2023 12:17:49 GMT
- Title: Sequential Semantic Generative Communication for Progressive Text-to-Image Generation
- Authors: Hyelin Nam, Jihong Park, Jinho Choi, Seong-Lyun Kim
- Abstract summary: This paper proposes a new communication-system framework that leverages the generation capabilities of multi-modal generative models.
The transmitter converts the objective image to text through a multi-modal generation process, and the receiver reconstructs the image through the reverse process.
Our work is expected to pave the way for applying state-of-the-art generative models to real communication systems.
- Score: 32.82954905044597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new communication-system framework that leverages the generation capabilities of multi-modal generative models. For today's smart applications, successful communication can be achieved by conveying the perceptual meaning, which we represent as a text prompt. Text is a suitable semantic representation of image data: through multi-modal techniques it has evolved to describe or generate images, and it is interpreted in a manner similar to human cognition. Using text also reduces the communication load compared to transmitting the intact data itself. The transmitter converts the objective image to text through a multi-modal generation process, and the receiver reconstructs the image through the reverse process. Each word in the text sentence has its own syntactic role and is responsible for a particular piece of the information the text carries. For further efficiency in communication load, the transmitter sends words sequentially, prioritizing those that carry the most information, until communication succeeds. Our primary focus is therefore the design of a communication system based on image-to-text transformation, together with the proposed schemes for sequentially transmitting word tokens. Our work is expected to pave the way for applying state-of-the-art generative models to real communication systems.
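To make the sequential scheme concrete, below is a minimal Python sketch of the loop the abstract describes. The image_to_text, text_to_image, and similarity callables are hypothetical stand-ins for a captioning model, a text-to-image generative model, and a perceptual similarity metric, none of which are specified in the abstract; the content-word-first ranking heuristic is likewise an assumption, since the abstract does not say how word informativeness is measured.

```python
from typing import Callable, List

# Hypothetical stand-ins (not specified in the paper): a captioning model,
# a text-to-image generative model, and a perceptual similarity metric.
ImageToText = Callable[[bytes], str]
TextToImage = Callable[[str], bytes]
Similarity = Callable[[bytes, bytes], float]

# Assumed priority heuristic: content words tend to carry more information
# than function words, so send them first. The paper's actual ranking rule
# is not given in the abstract.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "at", "with", "and", "is"}

def rank_tokens(caption: str) -> List[str]:
    """Order caption words by a crude informativeness score:
    content words before function words, longer words first."""
    return sorted(
        caption.split(),
        key=lambda t: (t.lower() in FUNCTION_WORDS, -len(t)),
    )

def progressive_transmit(
    source_image: bytes,
    image_to_text: ImageToText,
    text_to_image: TextToImage,
    similarity: Similarity,
    threshold: float = 0.9,
) -> bytes:
    """Transmitter captions the image, then sends one ranked word per
    round; the receiver regenerates an image from the words received so
    far, stopping once the reconstruction is close enough to the source."""
    caption = image_to_text(source_image)        # transmitter: image -> text
    received: List[str] = []
    reconstruction = b""
    for word in rank_tokens(caption):
        received.append(word)                    # channel: one word per round
        prompt = " ".join(received)
        reconstruction = text_to_image(prompt)   # receiver: text -> image
        if similarity(source_image, reconstruction) >= threshold:
            break                                # successful communication
    return reconstruction
```

One subtlety this sketch glosses over: judging "successful communication" at the transmitter implies either feedback from the receiver or a proxy metric the transmitter can evaluate locally, which is exactly the kind of design decision the proposed schemes must address.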
Related papers
- Exploring Textual Semantics Diversity for Image Transmission in Semantic Communication Systems using Visual Language Model [4.03161352925235]
This letter proposes a multi-text transmission semantic communication system, which uses a visual language model (VLM) to assist in the transmission of image semantic signals.
Unlike previous image transmission semantic communication systems, the proposed system divides the image into multiple blocks and extracts multiple pieces of text information from the image using a modified Large Language and Vision Assistant (LLaVA).
Simulation results show that the proposed text semantics diversity scheme can significantly improve the reconstruction accuracy compared with related works.
arXiv Detail & Related papers (2025-03-25T06:42:30Z)
- Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models [5.867765921443141]
A Texture-Color based Semantic Communication system of Images (TCSCI) is proposed.
It decomposes the images into their natural-language description (text), texture, and color semantic features at the transmitter.
It can achieve extremely compressed, highly noise-resistant, and visually similar image semantic communication, while ensuring the interpretability and editability of the transmission process.
arXiv Detail & Related papers (2024-10-26T08:53:05Z)
- Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering [118.53208190209517]
We propose a framework to learn the proper textual descriptions for diffusion models through prompt learning.
Our method can effectively learn the prompts to improve the match between the input text and the generated images.
arXiv Detail & Related papers (2024-01-12T03:46:29Z)
- De-Diffusion Makes Text a Strong Cross-Modal Interface [33.90004746543745]
We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding.
Experiments validate the precision and comprehensiveness of De-Diffusion text in representing images.
A single De-Diffusion model can generalize to provide transferable prompts for different text-to-image tools.
arXiv Detail & Related papers (2023-11-01T16:12:40Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach makes text-to-image diffusion models easier to use, with a better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics drawn from both the input texts and the input images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation [10.39028769374367]
We present a new framework that takes text-to-image synthesis to the realm of image-to-image translation.
Our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text.
arXiv Detail & Related papers (2022-11-22T20:39:18Z)
- HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z)
- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [58.71128866226768]
Recent text-to-image generation methods have incrementally improved the generated image fidelity and text relevancy.
We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene.
Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high-fidelity images at a resolution of 512x512 pixels.
arXiv Detail & Related papers (2022-03-24T15:44:50Z)
- Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments conducted on the MS-COCO dataset demonstrate the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)