Memory-Driven Text-to-Image Generation
- URL: http://arxiv.org/abs/2208.07022v1
- Date: Mon, 15 Aug 2022 06:32:57 GMT
- Title: Memory-Driven Text-to-Image Generation
- Authors: Bowen Li, Philip H. S. Torr, Thomas Lukasiewicz
- Abstract summary: We introduce a memory-driven semi-parametric approach to text-to-image generation.
The non-parametric component is a memory bank of image features constructed from a training set of images.
The parametric component is a generative adversarial network.
- Score: 126.58244124144827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a memory-driven semi-parametric approach to text-to-image
generation, which is based on both parametric and non-parametric techniques.
The non-parametric component is a memory bank of image features constructed
from a training set of images. The parametric component is a generative
adversarial network. Given a new text description at inference time, the memory
bank is used to selectively retrieve image features that are provided as basic
information about the target images, which enables the generator to produce realistic
synthetic results. We also incorporate the content information into the
discriminator, together with semantic features, allowing the discriminator to
make a more reliable prediction. Experimental results demonstrate that the
proposed memory-driven semi-parametric approach produces more realistic images
than purely parametric approaches, in terms of both visual fidelity and
text-image semantic consistency.
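To make the retrieval step concrete, below is a minimal sketch of how a memory bank of image features could be queried with a text embedding at inference time. All names, dimensions, and the cosine-similarity criterion are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

class MemoryBank:
    """Non-parametric component: image features precomputed from the training set.

    Hypothetical sketch; the paper's actual encoders and selection rule may differ.
    """

    def __init__(self, features: torch.Tensor):
        # features: (num_images, dim), e.g. from a pretrained image encoder
        self.features = F.normalize(features, dim=-1)

    def retrieve(self, text_emb: torch.Tensor, k: int = 5) -> torch.Tensor:
        # Select the k image features most similar to the text query
        # (cosine similarity over L2-normalized embeddings).
        query = F.normalize(text_emb, dim=-1)      # (batch, dim)
        sims = query @ self.features.t()           # (batch, num_images)
        idx = sims.topk(k, dim=-1).indices         # (batch, k)
        return self.features[idx]                  # (batch, k, dim)

# At inference, the retrieved features serve as basic information about the
# target image and are fed to the (parametric) GAN generator as conditioning.
bank = MemoryBank(torch.randn(10_000, 256))        # toy bank
text_emb = torch.randn(4, 256)                     # batch of text embeddings
retrieved = bank.retrieve(text_emb, k=5)           # shape: (4, 5, 256)
```

Any retrieval score that ranks training images by relevance to the description would fill the same role; the shared text-image embedding space used here is an assumption.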
Related papers
- Can Encrypted Images Still Train Neural Networks? Investigating Image Information and Random Vortex Transformation [51.475827684468875]
We establish a novel framework for measuring image information content and use it to evaluate how information content varies under image transformations.
We also propose a novel image encryption algorithm called Random Vortex Transformation.
arXiv Detail & Related papers (2024-11-25T09:14:53Z)
- Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks [2.2997117992292764]
This study proposes a multi-modal image transmission method that leverages various types of semantic information for efficient semantic communication.
The proposed method extracts multi-modal semantic information from an original image and transmits only that information to the receiver.
The receiver generates multiple images using an image-generation model and selects an output image based on semantic similarity.
arXiv Detail & Related papers (2024-04-17T11:42:39Z)
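As a rough illustration of the receiver-side selection step summarized in the entry above, the sketch below scores several candidate images against the transmitted semantics and keeps the closest one. The encoder and generated candidates are toy placeholders, not the paper's actual models.

```python
import torch
import torch.nn.functional as F

def select_output(candidates, semantic_emb, embed_fn):
    """Return the index of the candidate image whose embedding is most
    similar to the transmitted semantic embedding (cosine similarity).

    candidates: list of image tensors produced by a generative model
    semantic_emb: (dim,) embedding recovered from the transmitted semantics
    embed_fn: placeholder image encoder mapping an image to a (dim,) vector
    """
    scores = torch.stack([
        F.cosine_similarity(embed_fn(img), semantic_emb, dim=-1)
        for img in candidates
    ])
    return int(scores.argmax())

# Toy usage with stand-in candidates and a trivial "encoder":
gen = [torch.randn(3, 64, 64) for _ in range(4)]
emb = torch.randn(128)
best = select_output(gen, emb, embed_fn=lambda img: img.flatten()[:128])
```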
- Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation [29.274362919954218]
We propose a new paradigm to automatically generate training data with accurate labels at scale.
The proposed approach decouples training data generation into foreground object generation, and contextually coherent background generation.
We demonstrate the advantages of our approach on five object detection and segmentation datasets.
arXiv Detail & Related papers (2023-09-12T04:41:45Z)
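The decoupled data-generation idea in the preceding entry can be pictured as a simple compositing step: paste a generated foreground (with its alpha mask) onto a generated background and read the labels off the mask. This is a hypothetical sketch of the idea, not the paper's pipeline.

```python
import torch

def compose_training_example(fg_rgba: torch.Tensor, bg_rgb: torch.Tensor):
    """Composite a generated foreground onto a generated background and
    derive detection/segmentation labels from the alpha mask.

    fg_rgba: (4, H, W) foreground with alpha channel; bg_rgb: (3, H, W).
    Assumes the foreground mask is non-empty.
    """
    rgb, alpha = fg_rgba[:3], fg_rgba[3:4]
    image = alpha * rgb + (1.0 - alpha) * bg_rgb      # alpha blending
    mask = alpha[0] > 0.5                             # segmentation label
    ys, xs = torch.nonzero(mask, as_tuple=True)
    bbox = (xs.min().item(), ys.min().item(),         # detection label
            xs.max().item(), ys.max().item())
    return image, bbox, mask
```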
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
However, they are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z)
- Improving Image Recognition by Retrieving from Web-Scale Image-Text Data [68.63453336523318]
We introduce an attention-based memory module, which learns the importance of each retrieved example from the memory.
Compared to existing approaches, our method removes the influence of irrelevant retrieved examples and retains those that are beneficial to the input query.
We show that it achieves state-of-the-art accuracy on the ImageNet-LT, Places-LT, and WebVision datasets.
arXiv Detail & Related papers (2023-04-11T12:12:05Z)
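A minimal sketch of an attention-weighted memory readout in the spirit of the entry above: each retrieved example is scored against the input query, so irrelevant examples receive low weight. Module names and dimensions are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RetrievalAttention(nn.Module):
    """Weight retrieved examples by learned relevance to the input query."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)

    def forward(self, query: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        # query: (batch, dim); retrieved: (batch, k, dim)
        q = self.to_q(query).unsqueeze(1)                # (batch, 1, dim)
        k = self.to_k(retrieved)                         # (batch, k, dim)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5    # scaled dot product
        weights = scores.softmax(dim=-1).unsqueeze(-1)   # (batch, k, 1)
        return (weights * retrieved).sum(dim=1)          # (batch, dim)
```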
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation [18.36261166580862]
Text-to-image generation aims to generate photo-realistic and semantically consistent images according to the given text descriptions.
Existing methods mainly extract text information from a single sentence to represent an image.
We propose an effective text representation method that complements the sentence with attribute information.
arXiv Detail & Related papers (2022-09-28T12:28:54Z)
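To illustrate the attribute-complemented text representation from the Adma-GAN entry above, here is a speculative sketch that pools attribute embeddings and fuses them with the sentence embedding; the paper's actual memory-augmented design is not reproduced here.

```python
import torch
import torch.nn as nn

class AttributeAugmentedText(nn.Module):
    """Fuse a sentence embedding with pooled attribute embeddings
    (e.g. color/shape/texture phrases mined from the captions)."""

    def __init__(self, dim: int):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, sent_emb: torch.Tensor, attr_embs: torch.Tensor) -> torch.Tensor:
        # sent_emb: (batch, dim); attr_embs: (batch, n_attr, dim)
        attr_summary = attr_embs.mean(dim=1)      # pool attribute cues
        return self.fuse(torch.cat([sent_emb, attr_summary], dim=-1))
```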
- Semi-parametric Makeup Transfer via Semantic-aware Correspondence [99.02329132102098]
The large discrepancy between the source non-makeup image and the reference makeup image is one of the key challenges in makeup transfer.
Non-parametric techniques have a high potential for addressing the pose, expression, and occlusion discrepancies.
We propose a Semi-parametric Makeup Transfer (SpMT) method, which combines the reciprocal strengths of non-parametric and parametric mechanisms.
arXiv Detail & Related papers (2022-03-04T12:54:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.