The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings
- URL: http://arxiv.org/abs/2308.04052v1
- Date: Tue, 8 Aug 2023 05:16:51 GMT
- Title: The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings
- Authors: Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius
- Abstract summary: The five-dollar model is a lightweight text-to-image generative architecture that generates low-dimensional images from an encoded text prompt.
We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images.
We evaluate our model's performance using the cosine similarity score between text and image embeddings produced by the CLIP ViT-B/32 model.
- Score: 3.620115940532283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The five-dollar model is a lightweight text-to-image generative architecture
that generates low-dimensional images from an encoded text prompt. This model
can successfully generate accurate and aesthetically pleasing content in low-dimensional
domains with limited amounts of training data. Despite the small
size of both the model and the datasets, the generated images still
maintain the encoded semantic meaning of the textual prompt. We apply this
model to three small datasets: pixel art video game maps, video game sprite
images, and down-scaled emoji images, and apply novel augmentation strategies to
improve the performance of our model on these limited datasets. We evaluate our
model's performance using the cosine similarity score between text and image
embeddings produced by the CLIP ViT-B/32 model.
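The abstract does not specify the architecture in detail, but a minimal sketch of such a lightweight decoder, mapping a sentence embedding to a low-resolution tile map, might look like the following. The 384-dimensional embedding, the 16x16 map, the ten tile classes, and all layer sizes are illustrative assumptions, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

EMBED_DIM = 384      # assumed sentence-embedding size (e.g. a MiniLM-style encoder)
MAP_H, MAP_W = 16, 16
N_TILES = 10         # assumed number of tile classes for a pixel-art map

class TinyTextToMapDecoder(nn.Module):
    """Hypothetical lightweight decoder: sentence embedding -> low-resolution tile map."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(EMBED_DIM, 128 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),      # 4x4 -> 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(64, N_TILES, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
        )

    def forward(self, text_embedding):
        x = self.fc(text_embedding).view(-1, 128, 4, 4)
        return self.deconv(x)  # per-tile logits; argmax over channels yields the map
```

The reported evaluation, cosine similarity between CLIP ViT-B/32 text and image embeddings, can be approximated with the Hugging Face transformers implementation of CLIP. The file name and prompt below are placeholders:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated_sprite.png")   # placeholder: a generated sprite/map/emoji image
prompt = "a small green dragon sprite"       # placeholder text prompt

with torch.no_grad():
    image_inputs = processor(images=image, return_tensors="pt")
    text_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    image_emb = model.get_image_features(**image_inputs)
    text_emb = model.get_text_features(**text_inputs)

# Cosine similarity between the CLIP embeddings of the text-image pair.
score = F.cosine_similarity(image_emb, text_emb).item()
print(f"CLIP ViT-B/32 cosine similarity: {score:.3f}")
```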
Related papers
- CLIP-VQDiffusion: Language Free Training of Text To Image generation using CLIP and vector quantized diffusion model [2.9849290402462927]
We propose CLIP-VQDiffusion, which leverages the pretrained CLIP model to provide multimodal text-image representations and strong image generation capabilities.
Our model outperformed previous state-of-the-art methods by 4.4% in CLIP score and generated very realistic images for both in-distribution and out-of-distribution text.
arXiv Detail & Related papers (2024-03-22T04:34:59Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z)
- Paragraph-to-Image Generation with Information-Enriched Diffusion Model [67.9265336953134]
ParaDiffusion is an information-enriched diffusion model for the paragraph-to-image generation task.
It explores transferring the extensive semantic comprehension capabilities of large language models to the task of image generation.
The code and dataset will be released to foster community research on long-text alignment.
arXiv Detail & Related papers (2023-11-24T05:17:01Z)
- Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
arXiv Detail & Related papers (2023-05-26T19:22:03Z)
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [53.170767750244366]
Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models.
arXiv Detail & Related papers (2022-05-23T17:42:53Z)
- Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [72.60554897161948]
Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences.
In this work, we repurpose such models to generate a descriptive text given an image at inference time.
The resulting captions are much less restrictive than those obtained by supervised captioning methods.
arXiv Detail & Related papers (2021-11-29T11:01:49Z)
- Learning Generative Models of Textured 3D Meshes from Real-World Images [26.353307246909417]
We propose a GAN framework for generating textured triangle meshes without relying on keypoint annotations.
We show that the performance of our approach is on par with prior work that relies on ground-truth keypoints.
arXiv Detail & Related papers (2021-03-29T14:07:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.