Surrealistic-like Image Generation with Vision-Language Models
- URL: http://arxiv.org/abs/2412.14366v1
- Date: Wed, 18 Dec 2024 22:03:26 GMT
- Title: Surrealistic-like Image Generation with Vision-Language Models
- Authors: Elif Ayten, Shuai Wang, Hjalmar Snoep
- Abstract summary: In this paper, we explore the generation of images in the style of paintings in the surrealism movement using vision-language generative models.
Our investigation starts with the generation of images under various image generation settings and different models.
We evaluate the performance of selected models and gain valuable insights into their capabilities in generating such images.
- Score: 4.66729174362509
- Abstract: Recent advances in generative AI make it convenient to create different types of content, including text, images, and code. In this paper, we explore the generation of images in the style of paintings in the surrealism movement using vision-language generative models, including DALL-E, Deep Dream Generator, and DreamStudio. Our investigation starts with the generation of images under various image generation settings and different models. The primary objective is to identify the most suitable model and settings for producing such images. Additionally, we aim to understand the impact of using edited base images on the resulting generated images. Through these experiments, we evaluate the performance of the selected models and gain valuable insights into their capabilities in generating such images. Our analysis shows that DALL-E 2 performs best when using prompts generated by ChatGPT.
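The abstract describes comparing models and generation settings, with DALL-E 2 plus ChatGPT-written prompts performing best. As a hedged illustration only, the sketch below lays out such a prompt-by-setting grid as request dictionaries. The prompts are hypothetical stand-ins (the paper's actual prompts and settings are not reproduced here), and `build_requests`, `PROMPTS`, and `DALLE2_SIZES` are names invented for this example. The only real API detail assumed is that the official OpenAI Python SDK accepts `model`, `prompt`, `n`, and `size` keyword arguments in `images.generate`, and that DALL-E 2 supports the three square sizes listed.

```python
from itertools import product

# Hypothetical prompts standing in for ChatGPT-generated descriptions;
# these are NOT the prompts used in the paper.
PROMPTS = [
    "A melting clock draped over a leafless tree in a desert at dusk, surrealist oil painting",
    "A giant green apple floating in front of a man in a bowler hat, surrealist painting",
]

# Square output sizes accepted by DALL-E 2 in the OpenAI Images API.
DALLE2_SIZES = ["256x256", "512x512", "1024x1024"]


def build_requests(prompts, sizes, images_per_prompt=1):
    """Return one request dict per (prompt, size) combination (hypothetical helper)."""
    return [
        {"model": "dall-e-2", "prompt": p, "n": images_per_prompt, "size": s}
        for p, s in product(prompts, sizes)
    ]


if __name__ == "__main__":
    batch = build_requests(PROMPTS, DALLE2_SIZES)
    print(len(batch))  # 2 prompts x 3 sizes = 6 requests
    # With the official SDK (network access and an API key required), each
    # request could be sent as: OpenAI().images.generate(**request)
```

Keeping the grid as plain data makes it easy to log exactly which prompt and setting produced each image before any API call is made.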
Related papers
- Personalized Image Generation with Deep Generative Models: A Decade Survey [51.26287478042516]
We present a review of generalized personalized image generation across various generative models.
We first define a unified framework that standardizes the personalization process across different generative models.
We then provide an in-depth analysis of personalization techniques within each generative model, highlighting their unique contributions and innovations.
Date: 2025-02-18T17:34:04Z
- Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models [54.052963634384945]
We introduce the Image Regeneration task to assess text-to-image models.
We use GPT4V to bridge the gap between the reference image and the text input for the T2I model.
We also present the ImageRepainter framework to enhance the quality of generated images.
Date: 2024-11-14T13:52:43Z
- Elucidating the design space of language models for image generation [13.96798987912677]
We show that image tokens exhibit greater randomness than text tokens, which presents challenges when training with token prediction.
Our analysis also reveals that while all models successfully grasp the importance of local information in image generation, smaller models struggle to capture the global context.
Our work is the first to analyze the optimization behavior of language models in vision generation, and we believe it can inspire more effective designs when applying LMs to other domains.
Date: 2024-10-21T17:57:04Z
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
Date: 2024-10-15T17:50:37Z
- Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation [12.024554708901514]
We propose LaVi-Bridge, a pipeline that enables the integration of diverse pre-trained language models and generative vision models for text-to-image generation.
Our pipeline is compatible with various language models and generative vision models, accommodating different structures.
Date: 2024-03-12T17:50:11Z
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images from text descriptions.
Diffusion models are one prominent type of generative model for images: they learn to reverse a process that systematically adds noise over many repeated steps.
In the era of large models, scaling up model size and integrating large language models have further improved the performance of TTI models.
Date: 2023-09-02T03:27:20Z
- Diffusion idea exploration for art generation [0.10152838128195467]
Diffusion models have recently outperformed other generative models in image generation tasks that use cross-modal data as guiding information.
The initial experiments for this task of novel image generation demonstrated promising qualitative results.
Date: 2023-07-11T02:35:26Z
- Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
Date: 2023-05-26T19:22:03Z
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models that outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
Date: 2021-05-20T20:42:42Z
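The RenAIssance entry describes diffusion models in terms of noise introduced over repeated steps. As a minimal sketch, assuming the standard DDPM formulation with a linear noise schedule (β from 1e-4 to 0.02 over 1,000 steps; these values are assumptions, not taken from any paper listed here), the forward noising process can be sampled in closed form as x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε, where ᾱ_t is the running product of (1 − β_s) and ε is standard Gaussian noise. Only the fixed forward process is shown; the learned reverse (denoising) network that actually generates images is omitted.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Product of (1 - beta_s) for s = 1..t; t = 0 means no noise (returns 1.0)."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def q_sample(x0, t, betas, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I) in a single step."""
    abar = alpha_bar(betas, t)
    signal, noise = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule(1000)
    x0 = [1.0, -0.5, 0.25]  # toy "image" with three pixel values
    for t in (1, 250, 1000):
        print(t, round(alpha_bar(betas, t), 6), q_sample(x0, t, betas, rng))
```

As t grows, ᾱ_t shrinks toward zero, so x_t approaches pure Gaussian noise; a generative diffusion model is trained to run this process in reverse.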