SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration
- URL: http://arxiv.org/abs/2404.19693v1
- Date: Tue, 30 Apr 2024 16:37:27 GMT
- Title: SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration
- Authors: Yuto Nakashima, Mingzhe Yang, Yukino Baba
- Abstract summary: We propose a novel approach that uses simple user-swipe interactions to generate preferred images for users.
To effectively explore the latent space with only swipe interactions, we apply principal component analysis to the latent space of the StyleGAN.
We use a multi-armed bandit algorithm to decide the dimensions to explore, focusing on the preferences of the user.
- Score: 3.864321514889098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating preferred images using generative adversarial networks (GANs) is challenging owing to the high-dimensional nature of latent space. In this study, we propose a novel approach that uses simple user-swipe interactions to generate preferred images for users. To effectively explore the latent space with only swipe interactions, we apply principal component analysis to the latent space of the StyleGAN, creating meaningful subspaces. We use a multi-armed bandit algorithm to decide the dimensions to explore, focusing on the preferences of the user. Experiments show that our method is more efficient in generating preferred images than the baseline methods. Furthermore, changes in preferred images during image generation or the display of entirely different image styles were observed to provide new inspirations, subsequently altering user preferences. This highlights the dynamic nature of user preferences, which our proposed approach recognizes and enhances.
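The approach in the abstract can be sketched in a few lines: run PCA over sampled latent codes to obtain meaningful directions, then let a bandit decide which direction to explore based on binary swipe feedback. The sketch below is illustrative only, assuming a plain NumPy stand-in for StyleGAN's latent space, a UCB1 bandit, and a simulated user with a hidden preferred direction; none of the names come from the authors' code.

```python
# Minimal sketch of swipe-driven latent exploration: PCA over sampled
# latents (stand-in for StyleGAN's W space), then a UCB1 bandit that
# picks which principal direction to perturb from swipe feedback.
# All names and constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# 1) PCA on sampled latent codes via SVD of the centered sample matrix.
W = rng.normal(size=(1000, 512))            # 1000 sampled 512-d latents
W_centered = W - W.mean(axis=0)
_, _, Vt = np.linalg.svd(W_centered, full_matrices=False)
components = Vt[:10]                        # top-10 principal directions

# 2) UCB1 bandit over the 10 directions: each "arm" is one PCA dimension.
n_arms = len(components)
counts = np.zeros(n_arms)
rewards = np.zeros(n_arms)

def select_arm(t):
    # Play each arm once, then follow the UCB1 index.
    for a in range(n_arms):
        if counts[a] == 0:
            return a
    ucb = rewards / counts + np.sqrt(2 * np.log(t + 1) / counts)
    return int(np.argmax(ucb))

# 3) Swipe loop: perturb the current latent along the chosen direction;
#    a right-swipe (user prefers the new image) counts as reward 1.
z = W.mean(axis=0).copy()                   # start from the mean latent
target_dir = components[3]                  # hidden "true" preference (simulated)

for t in range(200):
    arm = select_arm(t)
    candidate = z + 0.5 * components[arm]
    # Simulated user: swipes right when the move aligns with the preference.
    swipe_right = float(components[arm] @ target_dir > 0.5)
    counts[arm] += 1
    rewards[arm] += swipe_right
    if swipe_right:
        z = candidate                       # keep the preferred image's latent

best_arm = int(np.argmax(rewards / np.maximum(counts, 1)))
print(best_arm)                             # the bandit concentrates on dimension 3
```

In a real system the perturbed latent would be decoded by the generator into two candidate images shown side by side, and the swipe replaces the simulated reward; the bandit's role is only to allocate the limited swipe budget to the directions the user responds to.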
Related papers
- Learning User Embeddings from Human Gaze for Personalised Saliency Prediction [12.361829928359136]
We present a novel method to extract user embeddings from pairs of natural images and corresponding saliency maps.
At the core of our method is a Siamese convolutional neural encoder that learns the user embeddings by contrasting the image and personal saliency map pairs of different users.
arXiv Detail & Related papers (2024-03-20T14:58:40Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Manipulating Embeddings of Stable Diffusion Prompts [22.10069408287608]
We propose and analyze a new method to manipulate the embedding of a prompt instead of the prompt text.
Users found our method less tedious than editing prompt text, and they often preferred the resulting images.
arXiv Detail & Related papers (2023-08-23T10:59:41Z)
- FaIRCoP: Facial Image Retrieval using Contrastive Personalization [43.293482565385055]
Retrieving facial images from attributes plays a vital role in various systems such as face recognition and suspect identification.
Existing methods do so by comparing specific characteristics from the user's mental image against the suggested images.
We propose a method that uses the user's feedback to label images as either similar or dissimilar to the target image.
arXiv Detail & Related papers (2022-05-28T09:52:09Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
- StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation [45.20783737095007]
We explore and analyze the latent style space of StyleGAN2, a state-of-the-art architecture for image generation.
StyleSpace is significantly more disentangled than the other intermediate latent spaces explored by previous works.
Our findings pave the way to semantically meaningful and well-disentangled image manipulations via simple and intuitive interfaces.
arXiv Detail & Related papers (2020-11-25T15:00:33Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators? [100.60938767993088]
We propose a lightweight optimization-based algorithm that adapts to arbitrary input images and renders natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
arXiv Detail & Related papers (2020-11-19T07:37:31Z)
- Sequential Gallery for Interactive Visual Design Optimization [51.52002870143971]
We propose a novel user-in-the-loop optimization method that allows users to efficiently find an appropriate parameter set.
We also propose using a gallery-based interface that provides options in the two-dimensional subspace arranged in an adaptive grid view.
Our experiment with synthetic functions shows that our sequential plane search can find satisfactory solutions in fewer iterations than baselines.
arXiv Detail & Related papers (2020-05-08T15:24:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.