HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
- URL: http://arxiv.org/abs/2312.14091v3
- Date: Mon, 18 Mar 2024 16:48:13 GMT
- Title: HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
- Authors: Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi,
- Abstract summary: HD-Painter is a training free approach that accurately follows prompts and coherently scales to high resolution image inpainting.
To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer enhancing self-attention scores.
Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively.
- Score: 59.01600111737628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results. However, there is still significant potential for improvement in current text-to-image inpainting models, particularly in better aligning the inpainted area with user prompts and performing high-resolution inpainting. Therefore, we introduce HD-Painter, a training free approach that accurately follows prompts and coherently scales to high resolution image inpainting. To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer enhancing self-attention scores by prompt information resulting in better text aligned generations. To further improve the prompt coherence we introduce the Reweighting Attention Score Guidance (RASG) mechanism seamlessly integrating a post-hoc sampling strategy into the general form of DDIM to prevent out-of-distribution latent shifts. Moreover, HD-Painter allows extension to larger scales by introducing a specialized super-resolution technique customized for inpainting, enabling the completion of missing regions in images of up to 2K resolution. Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively across multiple metrics and a user study. Code is publicly available at: https://github.com/Picsart-AI-Research/HD-Painter
Related papers
- LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model [1.7966001353008776]
This paper presents LPGen, a high-fidelity, controllable model for landscape painting generation.
We introduce a novel multi-modal framework that integrates image prompts into the diffusion model.
We implement a decoupled cross-attention strategy to ensure compatibility between image and text prompts, facilitating multi-modal image generation.
arXiv Detail & Related papers (2024-07-24T12:32:24Z) - Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (intextbfPainting vtextbfIa textbfLatent textbfOptextbfTimization) is an optimization approach grounded on a novel textitsemantic centralization and textitbackground preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z) - VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model [76.02314305164595]
This work presents a novel image outpainting framework that is capable of customizing the results according to the requirement of users.
We take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked part of a given image.
In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is elaborately designed to enhance further the the interaction between specific space regions of the image and corresponding parts of the text prompts.
arXiv Detail & Related papers (2024-06-03T07:14:19Z) - MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior [65.05773512126089]
NeRF inpainting methods built upon explicit RGB and depth 2D inpainting supervisions are inherently constrained by the capabilities of their underlying 2D inpainters.
We propose MVIP-NeRF that harnesses the potential of diffusion priors for NeRF inpainting, addressing both appearance and geometry aspects.
Our experimental results show better appearance and geometry recovery than previous NeRF inpainting methods.
arXiv Detail & Related papers (2024-05-05T09:04:42Z) - BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed
Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z) - Segmentation-Based Parametric Painting [22.967620358813214]
We introduce a novel image-to-painting method that facilitates the creation of large-scale, high-fidelity paintings with human-like quality and stylistic variation.
We introduce a segmentation-based painting process and a dynamic attention map approach inspired by human painting strategies.
Our optimized batch processing and patch-based loss framework enable efficient handling of large canvases.
arXiv Detail & Related papers (2023-11-24T04:15:10Z) - PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like
Interactions [12.792576041526287]
PromptPaint allows users to mix prompts that express challenging concepts.
We characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models.
arXiv Detail & Related papers (2023-08-09T18:41:11Z) - Cylin-Painting: Seamless {360\textdegree} Panoramic Image Outpainting
and Beyond [136.18504104345453]
We present a Cylin-Painting framework that involves meaningful collaborations between inpainting and outpainting.
The proposed algorithm can be effectively extended to other panoramic vision tasks, such as object detection, depth estimation, and image super-resolution.
arXiv Detail & Related papers (2022-04-18T21:18:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.