Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter
- URL: http://arxiv.org/abs/2510.13419v1
- Date: Wed, 15 Oct 2025 11:18:24 GMT
- Title: Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter
- Authors: Jianhui Zhang, Sheng Cheng, Qirui Sun, Jia Liu, Wang Luyang, Chaoyu Feng, Chen Fang, Lei Lei, Jue Wang, Shuaicheng Liu
- Abstract summary: We present Patch-Adapter, an effective framework for high-resolution text-guided image inpainting. Our approach achieves 4K+ resolution while maintaining precise content consistency and prompt alignment. Experiments demonstrate that Patch-Adapter not only resolves artifacts common in large-scale inpainting but also achieves state-of-the-art performance.
- Score: 47.512192547392026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present Patch-Adapter, an effective framework for high-resolution text-guided image inpainting. Unlike existing methods limited to lower resolutions, our approach achieves 4K+ resolution while maintaining precise content consistency and prompt alignment, two critical challenges in image inpainting that intensify with increasing resolution and texture complexity. Patch-Adapter leverages a two-stage adapter architecture to scale the diffusion model's resolution from 1K to 4K+ without requiring structural overhauls: (1) Dual Context Adapter learns coherence between masked and unmasked regions at reduced resolutions to establish global structural consistency; and (2) Reference Patch Adapter implements a patch-level attention mechanism for full-resolution inpainting, preserving local detail fidelity through adaptive feature fusion. This dual-stage architecture uniquely addresses the scalability gap in high-resolution inpainting by decoupling global semantics from localized refinement. Experiments demonstrate that Patch-Adapter not only resolves artifacts common in large-scale inpainting but also achieves state-of-the-art performance on the OpenImages and Photo-Concept-Bucket datasets, outperforming existing methods in both perceptual quality and text-prompt adherence.
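The abstract's second stage centers on a patch-level attention mechanism that lets masked regions draw detail from unmasked reference patches at full resolution. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: patch splitting, a pooled patch-to-patch softmax affinity, and a fixed-weight fusion are all simplifying assumptions.

```python
import numpy as np

def split_patches(feat, p):
    """Split an (H, W, C) feature map into non-overlapping p x p patches."""
    H, W, C = feat.shape
    feat = feat.reshape(H // p, p, W // p, p, C)
    return feat.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)  # (N, p*p, C)

def patch_attention(masked_feat, ref_feat, p=8):
    """Toy patch-level attention: each masked-region patch attends to all
    reference (unmasked) patches, then fuses the attended features."""
    q = split_patches(masked_feat, p)           # (N, L, C) query patches
    kv = split_patches(ref_feat, p)             # (M, L, C) reference patches
    # Pool each patch to one token for a cheap patch-to-patch affinity.
    q_tok = q.mean(axis=1)                      # (N, C)
    kv_tok = kv.mean(axis=1)                    # (M, C)
    scale = 1.0 / np.sqrt(q_tok.shape[-1])
    logits = q_tok @ kv_tok.T * scale           # (N, M)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # softmax over reference patches
    attended = np.einsum("nm,mlc->nlc", w, kv)  # (N, L, C)
    alpha = 0.5                                 # fixed fusion weight for this sketch
    return alpha * q + (1 - alpha) * attended

feat = np.random.default_rng(0).normal(size=(32, 32, 16))
out = patch_attention(feat, feat, p=8)
print(out.shape)  # (16, 64, 16): 16 patches of 64 tokens, 16 channels
```

In the paper's actual design, the fusion is described as adaptive rather than a fixed blend, and attention operates inside a diffusion model rather than on raw feature maps; this sketch only conveys the patch-level routing of reference detail.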
Related papers
- Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing [62.94394079771687]
A burgeoning trend is to adopt high-dimensional features from representation encoders as generative latents. We propose a systematic framework to adapt understanding-oriented encoder features for generative tasks. We show that our approach achieves state-of-the-art reconstruction, faster convergence, and substantial performance gains in both Text-to-Image (T2I) and image editing tasks.
arXiv Detail & Related papers (2025-12-19T18:59:57Z) - Low-Resolution Editing is All You Need for High-Resolution Editing [67.6663530128766]
We introduce the task of high-resolution image editing and propose a test-time optimization framework to address it. Our method performs patch-wise optimization on high-resolution source images, followed by a fine-grained detail transfer module and a novel synchronization strategy.
arXiv Detail & Related papers (2025-11-25T05:35:32Z) - Scale-DiT: Ultra-High-Resolution Image Generation with Hierarchical Local Attention [50.391914489898774]
Scale-DiT is a new diffusion framework that introduces hierarchical local attention with low-resolution global guidance. A lightweight LoRA adaptation bridges global and local pathways during denoising, ensuring consistency across structure and detail. Experiments demonstrate that Scale-DiT achieves more than $2\times$ faster inference and lower memory usage.
arXiv Detail & Related papers (2025-10-18T03:15:26Z) - Local-Global Context-Aware and Structure-Preserving Image Super-Resolution [23.87231269881077]
Pretrained text-to-image models, such as Stable Diffusion, have exhibited strong capabilities in synthesizing realistic image content. We propose a contextually precise image super-resolution framework that effectively maintains both local and global pixel relationships.
arXiv Detail & Related papers (2025-10-11T07:17:31Z) - Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement [4.006320049969407]
Mask2Alpha is an iterative refinement framework designed to enhance semantic comprehension, instance awareness, and fine-detail recovery in image matting. Our framework leverages self-supervised Vision Transformer features as semantic priors, strengthening contextual understanding in complex scenarios. Mask2Alpha consistently achieves state-of-the-art results, showcasing its effectiveness in accurate and efficient image matting.
arXiv Detail & Related papers (2025-02-24T12:16:28Z) - Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration [75.51789992466183]
TAMambaIR simultaneously perceives image textures and achieves a trade-off between performance and efficiency. Extensive experiments on benchmarks for image super-resolution, deraining, and low-light image enhancement demonstrate that TAMambaIR achieves state-of-the-art performance with significantly improved efficiency.
arXiv Detail & Related papers (2025-01-27T23:53:49Z) - Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder development: pixel misalignment and inefficient inference.
This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion.
The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z) - Feature Refinement to Improve High Resolution Image Inpainting [1.4824891788575418]
Inpainting networks are often unable to generate globally coherent structures at resolutions higher than their training set.
We optimize the intermediate featuremaps of a network by minimizing a multiscale consistency loss at inference.
This runtime optimization improves the inpainting results and establishes a new state-of-the-art for high resolution inpainting.
arXiv Detail & Related papers (2022-06-27T21:59:12Z) - Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z) - Semantic Layout Manipulation with High-Resolution Sparse Attention [106.59650698907953]
We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
We propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.
arXiv Detail & Related papers (2020-12-14T06:50:43Z) - Gated Fusion Network for Degraded Image Super Resolution [78.67168802945069]
We propose a dual-branch convolutional neural network to extract base features and recovered features separately.
By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process.
arXiv Detail & Related papers (2020-03-02T13:28:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.