SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration
- URL: http://arxiv.org/abs/2404.19693v1
- Date: Tue, 30 Apr 2024 16:37:27 GMT
- Title: SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration
- Authors: Yuto Nakashima, Mingzhe Yang, Yukino Baba
- Abstract summary: We propose a novel approach that uses simple user-swipe interactions to generate preferred images for users.
To effectively explore the latent space with only swipe interactions, we apply principal component analysis to the latent space of the StyleGAN.
We use a multi-armed bandit algorithm to decide the dimensions to explore, focusing on the preferences of the user.
- Score: 3.864321514889098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating preferred images using generative adversarial networks (GANs) is challenging owing to the high-dimensional nature of latent space. In this study, we propose a novel approach that uses simple user-swipe interactions to generate preferred images for users. To effectively explore the latent space with only swipe interactions, we apply principal component analysis to the latent space of the StyleGAN, creating meaningful subspaces. We use a multi-armed bandit algorithm to decide the dimensions to explore, focusing on the preferences of the user. Experiments show that our method is more efficient in generating preferred images than the baseline methods. Furthermore, changes in preferred images during image generation or the display of entirely different image styles were observed to provide new inspirations, subsequently altering user preferences. This highlights the dynamic nature of user preferences, which our proposed approach recognizes and enhances.
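The abstract sketches a concrete loop: PCA turns the high-dimensional StyleGAN latent space into a small set of meaningful dimensions, and a multi-armed bandit chooses which dimension to perturb next based on swipe feedback. A minimal sketch of that loop follows; the `user_prefers` swipe oracle, the epsilon-greedy bandit, and the step-size scheme are illustrative assumptions (the abstract does not specify the exact bandit algorithm or reward design), and rendering an image from the latent is omitted.

```python
import numpy as np

# Illustrative sketch only: epsilon-greedy bandit over PCA directions of the
# latent space, with a user swipe oracle standing in for the real interface.

def pca_directions(w_samples, n_components=10):
    """PCA on sampled latent codes: returns (mean latent, top principal directions)."""
    mean = w_samples.mean(axis=0)
    centered = w_samples - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def swipe_explore(w_samples, user_prefers, steps=50, step_size=1.0, eps=0.2, seed=0):
    rng = np.random.default_rng(seed)
    mean, dirs = pca_directions(w_samples)
    n_arms = len(dirs)
    counts = np.zeros(n_arms)
    rewards = np.zeros(n_arms)
    current = mean.copy()  # start from the average latent

    for _ in range(steps):
        # Epsilon-greedy bandit: mostly exploit the dimension with the best
        # empirical swipe rate, sometimes explore a random one.
        if rng.random() < eps or counts.min() == 0:
            arm = int(rng.integers(n_arms))
        else:
            arm = int(np.argmax(rewards / counts))

        candidate = current + rng.normal(0.0, step_size) * dirs[arm]
        liked = user_prefers(candidate, current)  # swipe: True if candidate wins
        counts[arm] += 1
        rewards[arm] += float(liked)
        if liked:
            current = candidate  # move toward the preferred image
    return current
```

Treating each PCA component as an arm keeps the search inside a low-dimensional, semantically meaningful subspace, and the binary swipe outcome serves directly as the bandit reward.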
Related papers
- AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing [14.543341303789445]
We propose a novel mask-free point-based image editing method, AdaptiveDrag, which generates images that better align with user intent.
To ensure a comprehensive connection between the input image and the drag process, we have developed a semantic-driven optimization.
Building on these effective designs, our method delivers superior generation results using only the single input image and the handle-target point pairs.
arXiv Detail & Related papers (2024-10-16T15:59:02Z) - Generative Photomontage [40.49579203394384]
We propose a framework for creating the desired image by compositing it from various parts of generated images.
We let users select desired parts from the generated results using a brush stroke interface.
We show compelling results for each application and demonstrate that our method outperforms existing image blending methods.
arXiv Detail & Related papers (2024-08-13T17:59:51Z) - FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers [55.2480439325792]
We propose FUSE, an approach to approximating an adapter layer that maps from one model's textual embedding space to another, even across different tokenizers.
We show the efficacy of our approach via multi-objective optimization over vision-language and causal language models for image captioning and sentiment-based image captioning.
arXiv Detail & Related papers (2024-08-09T02:16:37Z) - Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - Manipulating Embeddings of Stable Diffusion Prompts [22.10069408287608]
We propose and analyze a new method to manipulate the embedding of a prompt instead of the prompt text.
Our methods are considered less tedious, and the resulting images are often preferred.
arXiv Detail & Related papers (2023-08-23T10:59:41Z) - Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z) - StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z) - StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation [45.20783737095007]
We explore and analyze the latent style space of StyleGAN2, a state-of-the-art architecture for image generation.
StyleSpace, the space of channel-wise style parameters, is significantly more disentangled than the other intermediate latent spaces explored by previous works.
Our findings pave the way to semantically meaningful and well-disentangled image manipulations via simple and intuitive interfaces.
arXiv Detail & Related papers (2020-11-25T15:00:33Z) - Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z) - Sequential Gallery for Interactive Visual Design Optimization [51.52002870143971]
We propose a novel user-in-the-loop optimization method that allows users to efficiently find an appropriate parameter set.
We also propose using a gallery-based interface that provides options in the two-dimensional subspace arranged in an adaptive grid view.
Our experiment with synthetic functions shows that our sequential plane search can find satisfactory solutions in fewer iterations than baselines.
arXiv Detail & Related papers (2020-05-08T15:24:35Z)
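The Sequential Gallery entry above describes presenting options from an adaptively chosen two-dimensional subspace and letting the user pick from a grid. The sketch below is not that paper's Bayesian sequential plane search; it substitutes a simple shrinking-grid search over a fixed 2D plane, and the `pick_best` user choice and the subspace directions `u`, `v` are assumptions made for illustration.

```python
import numpy as np

# Illustrative stand-in for gallery-style 2D-subspace search (NOT the paper's
# Bayesian sequential plane search): lay out candidates on a grid spanning a
# 2D plane of the parameter space, let the user pick one, recenter, and shrink.

def gallery_search(x0, u, v, pick_best, iterations=8, radius=1.0, grid=5):
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        offsets = np.linspace(-radius, radius, grid)
        candidates = [x + a * u + b * v for a in offsets for b in offsets]
        x = candidates[pick_best(candidates)]  # user selects preferred option
        radius *= 0.5                          # zoom in around the selection
    return x
```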