Bayesian Optimization for Controlled Image Editing via LLMs
- URL: http://arxiv.org/abs/2502.18116v2
- Date: Wed, 26 Feb 2025 06:53:39 GMT
- Title: Bayesian Optimization for Controlled Image Editing via LLMs
- Authors: Chengkun Cai, Haoliang Liu, Xu Zhao, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Serge Belongie, Lei Li
- Abstract summary: BayesGenie is an off-the-shelf approach that integrates Large Language Models with Bayesian Optimization. Our method enables users to modify images through natural language descriptions without manual area marking. Our framework significantly outperforms existing methods in both editing accuracy and semantic preservation.
- Score: 37.430396755248104
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the rapidly evolving field of image generation, achieving precise control over generated content and maintaining semantic consistency remain significant limitations, particularly concerning grounding techniques and the necessity for model fine-tuning. To address these challenges, we propose BayesGenie, an off-the-shelf approach that integrates Large Language Models (LLMs) with Bayesian Optimization to facilitate precise and user-friendly image editing. Our method enables users to modify images through natural language descriptions without manual area marking, while preserving the original image's semantic integrity. Unlike existing techniques that require extensive pre-training or fine-tuning, our approach demonstrates remarkable adaptability across various LLMs through its model-agnostic design. BayesGenie employs an adapted Bayesian optimization strategy to automatically refine the inference process parameters, achieving high-precision image editing with minimal user intervention. Through extensive experiments across diverse scenarios, we demonstrate that our framework significantly outperforms existing methods in both editing accuracy and semantic preservation, as validated using different LLMs including Claude3 and GPT-4.
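The abstract does not specify which inference parameters are tuned, which surrogate model is used, or how the Bayesian optimization is adapted. As a rough illustration only, the sketch below runs a generic textbook loop (zero-mean Gaussian-process surrogate with an RBF kernel, expected-improvement acquisition) over a single parameter in [0, 1]. The function `edit_score` is a hypothetical stand-in for an LLM-judged edit-quality score, and the parameter name `guidance`, the kernel length scale, and the grid search are all assumptions, not details from the paper.

```python
import math
import random

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalars (length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    alpha = solve(K, ys)   # K^{-1} y
    v = solve(K, k)        # K^{-1} k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed score (maximization)."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mean - best) * cdf + std * pdf

def edit_score(guidance):
    """Hypothetical stand-in for an LLM-judged edit quality score in (0, 1]."""
    return math.exp(-((guidance - 0.63) ** 2) / 0.02)

def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    """Return the best (parameter, score) pair found on [0, 1]."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate parameter values
    for _ in range(n_iter):
        best = max(ys)
        # Pick the candidate with the highest expected improvement.
        cand = max(grid, key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best))
        xs.append(cand)
        ys.append(objective(cand))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]

if __name__ == "__main__":
    best_x, best_y = bayes_opt(edit_score)
    print(f"best parameter: {best_x:.3f}, score: {best_y:.3f}")
```

In a real pipeline the objective would presumably wrap an image-editing call followed by an LLM evaluation of the result, which is expensive. That cost is the usual motivation for a sample-efficient search of this kind over a grid sweep.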
Related papers
- Training-Free Text-Guided Image Editing with Visual Autoregressive Model [46.201510044410995]
We propose a novel text-guided image editing framework based on Visual AutoRegressive modeling.
Our method eliminates the need for explicit inversion while ensuring precise and controlled modifications.
Our framework operates in a training-free manner and achieves high-fidelity editing with faster inference speeds.
arXiv Detail & Related papers (2025-03-31T09:46:56Z)
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models.
Our approach is preferred by users 77-90% of the time in conducted user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z)
- PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery [10.594261300488546]
We introduce a novel framework for progressive exemplar-driven editing with off-the-shelf diffusion models, dubbed PIXELS.
PIXELS provides granular control over edits, allowing adjustments at the pixel or region level.
We demonstrate that PIXELS delivers high-quality edits efficiently, leading to a notable improvement in quantitative metrics as well as human evaluation.
arXiv Detail & Related papers (2025-01-16T20:26:30Z)
- AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing [14.543341303789445]
We propose a novel mask-free point-based image editing method, AdaptiveDrag, which generates images that better align with user intent.
To ensure a comprehensive connection between the input image and the drag process, we have developed a semantic-driven optimization.
Building on these effective designs, our method delivers superior generation results using only the single input image and the handle-target point pairs.
arXiv Detail & Related papers (2024-10-16T15:59:02Z)
- Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing [42.73883397041092]
We propose a novel approach that is built upon a modified diffusion sampling process via the guidance mechanism.
In this work, we explore the self-guidance technique to preserve the overall structure of the input image.
We show through human evaluation and quantitative analysis that the proposed method produces the desired edits.
arXiv Detail & Related papers (2024-09-02T15:21:46Z)
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded on a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
- PromptFix: You Prompt and We Fix the Photo [84.69812824355269]
Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks.
The lack of diverse instruction-following data hampers the development of models.
We propose PromptFix, a framework that enables diffusion models to follow human instructions.
arXiv Detail & Related papers (2024-05-27T03:13:28Z)
- Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations [61.132408427908175]
Zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain.
With only a single representative text feature instead of real images, the synthesized images gradually lose diversity.
We propose a novel method to find semantic variations of the target text in the CLIP space.
arXiv Detail & Related papers (2023-08-21T08:12:28Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.