FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting
- URL: http://arxiv.org/abs/2512.21104v1
- Date: Wed, 24 Dec 2025 11:06:26 GMT
- Title: FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting
- Authors: Chao Gong, Dong Li, Yingwei Pan, Jingjing Chen, Ting Yao, Tao Mei
- Abstract summary: Text-guided image inpainting endeavors to generate new content within specified regions of images using textual prompts from users. We introduce FreeInpaint, a plug-and-play tuning-free approach that directly optimizes the diffusion latents on the fly during inference to improve the faithfulness of the generated images.
- Score: 98.04041133839088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided image inpainting endeavors to generate new content within specified regions of images using textual prompts from users. The primary challenge is to accurately align the inpainted areas with the user-provided prompts while maintaining a high degree of visual fidelity. While existing inpainting methods have produced visually convincing results by leveraging the pre-trained text-to-image diffusion models, they still struggle to uphold both prompt alignment and visual rationality simultaneously. In this work, we introduce FreeInpaint, a plug-and-play tuning-free approach that directly optimizes the diffusion latents on the fly during inference to improve the faithfulness of the generated images. Technically, we introduce a prior-guided noise optimization method that steers model attention towards valid inpainting regions by optimizing the initial noise. Furthermore, we meticulously design a composite guidance objective tailored specifically for the inpainting task. This objective efficiently directs the denoising process, enhancing prompt alignment and visual rationality by optimizing intermediate latents at each step. Through extensive experiments involving various inpainting diffusion models and evaluation metrics, we demonstrate the effectiveness and robustness of our proposed FreeInpaint.
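The core mechanism the abstract describes, optimizing intermediate diffusion latents at inference time against a guidance objective, can be sketched in generic PyTorch. Everything below is a simplified illustration: `guidance_loss_fn` is a stand-in for the paper's composite objective (prompt-alignment plus visual-rationality terms), and the toy masked-target loss is only a hypothetical proxy for steering generation toward the valid inpainting region, not FreeInpaint's actual formulation.

```python
import torch

def optimize_latent(latent, guidance_loss_fn, lr=0.1, n_steps=10):
    """Gradient descent on an intermediate latent against a guidance objective.

    This is the generic shape of tuning-free test-time optimization: no model
    weights are updated, only the latent itself. The real method applies such
    updates at each denoising step; here we show a single optimization loop.
    """
    latent = latent.detach().requires_grad_(True)
    for _ in range(n_steps):
        loss = guidance_loss_fn(latent)
        (grad,) = torch.autograd.grad(loss, latent)  # d(loss)/d(latent)
        latent = (latent - lr * grad).detach().requires_grad_(True)
    return latent.detach()

# Toy demo: pull the latent toward a target value inside a mask, a
# hypothetical proxy for concentrating generation on the inpainting region.
mask = torch.zeros(1, 4, 8, 8)
mask[..., 2:6, 2:6] = 1.0                      # "valid inpainting region"
target = torch.ones(1, 4, 8, 8)
loss_fn = lambda z: ((z - target) * mask).pow(2).sum()

z0 = torch.zeros(1, 4, 8, 8)
z_opt = optimize_latent(z0, loss_fn, lr=0.1, n_steps=10)
```

Because only the latent is optimized, the approach is plug-and-play: it can wrap any pre-trained inpainting diffusion model without fine-tuning, at the cost of extra backward passes per denoising step.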
Related papers
- Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score [4.8677910801584385]
Large-scale text-to-image generative models have shown remarkable ability to synthesize diverse and high-quality images.
We present Dual Contrastive Denoising Score, a framework that leverages the rich generative prior of text-to-image diffusion models.
Our method achieves both flexible content modification and structure preservation between input and output images, as well as zero-shot image-to-image translation.
arXiv Detail & Related papers (2025-08-18T08:30:07Z) - GuidPaint: Class-Guided Image Inpainting with Diffusion Models [1.1902474395094222]
We propose GuidPaint, a training-free, class-guided image inpainting framework.
We show that GuidPaint achieves clear improvements over existing context-aware inpainting methods in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-07-29T09:36:52Z) - Towards Seamless Borders: A Method for Mitigating Inconsistencies in Image Inpainting and Outpainting [22.46566055053259]
We propose two novel methods to address discrepancy issues in diffusion-based inpainting models.
First, we introduce a modified Variational Autoencoder that corrects color imbalances, ensuring that the final inpainted results are free of color mismatches.
Second, we propose a two-step training strategy that improves the blending of generated and existing image content during the diffusion process.
arXiv Detail & Related papers (2025-06-14T15:02:56Z) - PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference [62.72779589895124]
We make the first attempt to align diffusion models for image inpainting with human aesthetic standards via a reinforcement learning framework.
We train a reward model with a dataset we construct, consisting of nearly 51,000 images annotated with human preferences.
Experiments on inpainting comparison and downstream tasks, such as image extension and 3D reconstruction, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-29T11:49:39Z) - FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process [120.91393949012014]
FreeEnhance is a framework for content-consistent image enhancement using off-the-shelf image diffusion models.
In the noising stage, FreeEnhance adds lighter noise to regions with higher frequency content to preserve the high-frequency patterns in the original image.
In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality.
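The frequency-dependent noising idea above can be illustrated with a small sketch: estimate local high-frequency content with a Laplacian filter and attenuate the injected noise where that estimate is large. The filter choice, normalization, and scaling constants here are illustrative assumptions, not FreeEnhance's actual noising schedule.

```python
import torch
import torch.nn.functional as F

def content_aware_noise(img, base_sigma=0.3, min_scale=0.3):
    """Add noise whose magnitude shrinks in high-frequency regions.

    A Laplacian response serves as a rough per-pixel high-frequency map;
    flat regions receive close to base_sigma noise, while detailed regions
    are perturbed down to min_scale * base_sigma.
    """
    c = img.shape[1]
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    hf = F.conv2d(img, lap, padding=1, groups=c).abs()
    hf = hf / (hf.amax(dim=(-2, -1), keepdim=True) + 1e-8)  # normalize to [0, 1]
    # Interpolate the noise scale between base_sigma (flat) and
    # min_scale * base_sigma (high-frequency).
    scale = base_sigma * (min_scale + (1.0 - min_scale) * (1.0 - hf))
    return img + scale * torch.randn_like(img)

# Example: an image with a sharp square edge; the edge pixels get less noise.
torch.manual_seed(0)
img = torch.zeros(1, 3, 16, 16)
img[..., 4:12, 4:12] = 1.0
noisy = content_aware_noise(img)
```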
arXiv Detail & Related papers (2024-09-11T17:58:50Z) - Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded on a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z) - HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models [59.01600111737628]
HD-Painter is a training-free approach that accurately follows prompts and coherently scales to high-resolution image inpainting.
To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer enhancing self-attention scores.
Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively.
arXiv Detail & Related papers (2023-12-21T18:09:30Z) - DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.