Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern
Hopfield Networks
- URL: http://arxiv.org/abs/2208.04441v2
- Date: Sun, 8 Oct 2023 09:51:43 GMT
- Authors: Yonghao Xu, Weikang Yu, Pedram Ghamisi, Michael Kopp, and Sepp
Hochreiter
- Abstract summary: We propose a novel text-to-image modern Hopfield network (Txt2Img-MHN) to generate realistic remote sensing images.
To better evaluate the realism and semantic consistency of the generated images, we conduct zero-shot classification on real remote sensing data.
Experiments on the benchmark remote sensing text-image dataset demonstrate that the proposed Txt2Img-MHN can generate more realistic remote sensing images than existing methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The synthesis of high-resolution remote sensing images based on text
descriptions has great potential in many practical application scenarios.
Although deep neural networks have achieved great success in many important
remote sensing tasks, generating realistic remote sensing images from text
descriptions is still very difficult. To address this challenge, we propose a
novel text-to-image modern Hopfield network (Txt2Img-MHN). The main idea of
Txt2Img-MHN is to conduct hierarchical prototype learning on both text and
image embeddings with modern Hopfield layers. Instead of directly learning
concrete but highly diverse text-image joint feature representations for
different semantics, Txt2Img-MHN aims to learn the most representative
prototypes from text-image embeddings, achieving a coarse-to-fine learning
strategy. These learned prototypes can then be utilized to represent more
complex semantics in the text-to-image generation task. To better evaluate the
realism and semantic consistency of the generated images, we further conduct
zero-shot classification on real remote sensing data using the classification
model trained on synthesized images. Despite its simplicity, we find that the
overall accuracy in the zero-shot classification may serve as a good metric to
evaluate the ability to generate an image from text. Extensive experiments on
the benchmark remote sensing text-image dataset demonstrate that the proposed
Txt2Img-MHN can generate more realistic remote sensing images than existing
methods. Code and pre-trained models are available online
(https://github.com/YonghaoXu/Txt2Img-MHN).
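The abstract does not spell out the Hopfield retrieval step, so the following PyTorch sketch illustrates one plausible reading: a single-step modern Hopfield update (in the spirit of Ramsauer et al., 2020) that maps each text or image token embedding to a convex combination of learnable prototype vectors. The class name, prototype count, embedding size, and inverse temperature beta are assumptions for illustration, not the paper's settings.

```python
# Minimal sketch of one-step modern Hopfield retrieval over learned
# prototypes. Illustrative only -- not the authors' implementation;
# all names and hyperparameters here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HopfieldPrototypeLayer(nn.Module):
    def __init__(self, embed_dim: int, num_prototypes: int, beta: float = 1.0):
        super().__init__()
        # Learnable stored patterns (prototypes), one row per prototype.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, embed_dim))
        self.beta = beta  # inverse temperature of the Hopfield update

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (batch, seq_len, embed_dim) text or image token embeddings.
        # One-step update: each query is replaced by a convex combination
        # of prototypes, weighted by softmax(beta * similarity).
        attn = F.softmax(self.beta * queries @ self.prototypes.T, dim=-1)
        return attn @ self.prototypes

# Hypothetical usage: map 64 token embeddings per sample into prototype space.
layer = HopfieldPrototypeLayer(embed_dim=256, num_prototypes=512, beta=8.0)
tokens = torch.randn(16, 64, 256)
retrieved = layer(tokens)  # (16, 64, 256)
```

Stacking several such layers with growing prototype counts would be one way to realize the coarse-to-fine prototype learning described above; the authors' exact architecture is in the linked repository.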
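The zero-shot evaluation metric is likewise simple to state in code: train any classifier on synthesized images only, then measure the fraction of real remote sensing images it labels correctly. The sketch below assumes a generic PyTorch model and data loader; it mirrors the protocol in the abstract rather than the authors' evaluation code.

```python
# Hedged sketch of the zero-shot evaluation protocol: report overall
# accuracy (OA) on real images for a classifier trained exclusively on
# synthesized images. Model and loader are placeholders.
import torch
from torch.utils.data import DataLoader

def overall_accuracy(model: torch.nn.Module, real_loader: DataLoader,
                     device: str = "cpu") -> float:
    """Fraction of real images correctly classified by a model that has
    only ever seen synthesized images during training."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in real_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total
```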
Related papers
- TIPS: Text-Image Pretraining with Spatial Awareness [13.38247732379754]
Self-supervised image-only pretraining is still the go-to method for many vision applications.
We propose a novel general-purpose image-text model, which can be effectively used off-the-shelf for dense and global vision tasks.
arXiv Detail & Related papers (2024-10-21T21:05:04Z)
- Visual Text Generation in the Wild [67.37458807253064]
We propose a visual text generator (termed SceneVTG) which can produce high-quality text images in the wild.
The proposed SceneVTG significantly outperforms traditional rendering-based methods and recent diffusion-based methods in terms of fidelity and reasonability.
The generated images provide superior utility for tasks involving text detection and text recognition.
arXiv Detail & Related papers (2024-07-19T09:08:20Z)
- Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis [37.32270579534541]
We propose a novel approach for enhancing text-image correspondence by leveraging available semantic layouts.
Our approach achieves higher text-image correspondence than existing text-to-image generation approaches on the Multi-Modal CelebA-HQ and Cityscapes datasets.
arXiv Detail & Related papers (2023-08-16T05:59:33Z)
- ImaginaryNet: Learning Object Detectors without Real Images and Annotations [66.30908705345973]
We propose a framework to synthesize images by combining a pretrained language model and a text-to-image model.
With the synthesized images and class labels, weakly supervised object detection can then be leveraged to accomplish Imaginary-Supervised Object Detection (ISOD).
Experiments show that ImaginaryNet obtains about 70% of the performance of the weakly supervised counterpart with the same backbone trained on real data.
arXiv Detail & Related papers (2022-10-13T10:25:22Z)
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [53.170767750244366]
Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models.
arXiv Detail & Related papers (2022-05-23T17:42:53Z)
- Primitive Representation Learning for Scene Text Recognition [7.818765015637802]
We propose a primitive representation learning method that aims to exploit intrinsic representations of scene text images.
A Primitive REpresentation learning Network (PREN) is constructed to use the visual text representations for parallel decoding.
We also propose a framework called PREN2D to alleviate the misalignment problem in attention-based methods.
arXiv Detail & Related papers (2021-05-10T11:54:49Z)
- Towards Open-World Text-Guided Face Image Generation and Manipulation [52.83401421019309]
We propose a unified framework for both face image generation and manipulation.
Our method supports open-world scenarios, including both image and text, without any re-training, fine-tuning, or post-processing.
arXiv Detail & Related papers (2021-04-18T16:56:07Z)
- Text to Image Generation with Semantic-Spatial Aware GAN [41.73685713621705]
A text-to-image generation (T2I) model aims to generate photo-realistic images that are semantically consistent with the text descriptions.
We propose a novel framework, Semantic-Spatial Aware GAN, which is trained in an end-to-end fashion so that the text encoder can exploit better text information.
arXiv Detail & Related papers (2021-04-01T15:48:01Z)
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of a Matching-Aware Gradient Penalty and a One-Way Output.
Compared with current state-of-the-art methods, the proposed DF-GAN is simpler yet more efficient at synthesizing realistic and text-matching images.
arXiv Detail & Related papers (2020-08-13T12:51:17Z)
- Scene Text Synthesis for Efficient and Effective Deep Network Training [62.631176120557136]
We develop an innovative image synthesis technique that composes annotated training images by embedding foreground objects of interest into background images.
The proposed technique consists of two key components that in principle boost the usefulness of the synthesized images in deep network training.
Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique.
arXiv Detail & Related papers (2019-01-26T10:15:24Z)