Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter
- URL: http://arxiv.org/abs/2510.13419v1
- Date: Wed, 15 Oct 2025 11:18:24 GMT
- Title: Ultra High-Resolution Image Inpainting with Patch-Based Content Consistency Adapter
- Authors: Jianhui Zhang, Sheng Cheng, Qirui Sun, Jia Liu, Wang Luyang, Chaoyu Feng, Chen Fang, Lei Lei, Jue Wang, Shuaicheng Liu
- Abstract summary: We present Patch-Adapter, an effective framework for high-resolution text-guided image inpainting. Our approach achieves 4K+ resolution while maintaining precise content consistency and prompt alignment. Experiments demonstrate that Patch-Adapter not only resolves artifacts common in large-scale inpainting but also achieves state-of-the-art performance.
- Score: 47.512192547392026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present Patch-Adapter, an effective framework for high-resolution text-guided image inpainting. Unlike existing methods limited to lower resolutions, our approach achieves 4K+ resolution while maintaining precise content consistency and prompt alignment, two critical challenges in image inpainting that intensify with increasing resolution and texture complexity. Patch-Adapter leverages a two-stage adapter architecture to scale the diffusion model's resolution from 1K to 4K+ without requiring structural overhauls: (1) Dual Context Adapter learns coherence between masked and unmasked regions at reduced resolutions to establish global structural consistency; and (2) Reference Patch Adapter implements a patch-level attention mechanism for full-resolution inpainting, preserving local detail fidelity through adaptive feature fusion. This dual-stage architecture uniquely addresses the scalability gap in high-resolution inpainting by decoupling global semantics from localized refinement. Experiments demonstrate that Patch-Adapter not only resolves artifacts common in large-scale inpainting but also achieves state-of-the-art performance on the OpenImages and Photo-Concept-Bucket datasets, outperforming existing methods in both perceptual quality and text-prompt adherence.
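The abstract's second stage centers on a patch-level attention mechanism that lets masked regions draw detail from unmasked reference patches at full resolution. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: patch splitting, a pooled patch-to-patch softmax affinity, and a fixed-weight fusion are all simplifying assumptions.

```python
import numpy as np

def split_patches(feat, p):
    """Split an (H, W, C) feature map into non-overlapping p x p patches."""
    H, W, C = feat.shape
    feat = feat.reshape(H // p, p, W // p, p, C)
    return feat.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)  # (N, p*p, C)

def patch_attention(masked_feat, ref_feat, p=8):
    """Toy patch-level attention: each masked-region patch attends to all
    reference (unmasked) patches, then fuses the attended features."""
    q = split_patches(masked_feat, p)           # (N, L, C) query patches
    kv = split_patches(ref_feat, p)             # (M, L, C) reference patches
    # Pool each patch to one token for a cheap patch-to-patch affinity.
    q_tok = q.mean(axis=1)                      # (N, C)
    kv_tok = kv.mean(axis=1)                    # (M, C)
    scale = 1.0 / np.sqrt(q_tok.shape[-1])
    logits = q_tok @ kv_tok.T * scale           # (N, M)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # softmax over reference patches
    attended = np.einsum("nm,mlc->nlc", w, kv)  # (N, L, C)
    alpha = 0.5                                 # fixed fusion weight for this sketch
    return alpha * q + (1 - alpha) * attended

feat = np.random.default_rng(0).normal(size=(32, 32, 16))
out = patch_attention(feat, feat, p=8)
print(out.shape)  # (16, 64, 16): 16 patches of 64 tokens, 16 channels
```

In the paper's actual design, the fusion is described as adaptive rather than a fixed blend, and attention operates inside a diffusion model rather than on raw feature maps; this sketch only conveys the patch-level routing of reference detail.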
Related papers
- Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing [62.94394079771687]
A burgeoning trend is to adopt high-dimensional features from representation encoders as generative latents. We propose a systematic framework to adapt understanding-oriented encoder features for generative tasks. We show that our approach achieves state-of-the-art reconstruction, faster convergence, and substantial performance gains in both Text-to-Image (T2I) and image editing tasks.
arXiv Detail & Related papers (2025-12-19T18:59:57Z) - Low-Resolution Editing is All You Need for High-Resolution Editing [67.6663530128766]
We introduce the task of high-resolution image editing and propose a test-time optimization framework to address it. Our method performs patch-wise optimization on high-resolution source images, followed by a fine-grained detail transfer module and a novel synchronization strategy.
arXiv Detail & Related papers (2025-11-25T05:35:32Z) - Scale-DiT: Ultra-High-Resolution Image Generation with Hierarchical Local Attention [50.391914489898774]
Scale-DiT is a new diffusion framework that introduces hierarchical local attention with low-resolution global guidance. A lightweight LoRA adaptation bridges global and local pathways during denoising, ensuring consistency across structure and detail. Experiments demonstrate that Scale-DiT achieves more than $2\times$ faster inference and lower memory usage.
arXiv Detail & Related papers (2025-10-18T03:15:26Z) - Local-Global Context-Aware and Structure-Preserving Image Super-Resolution [23.87231269881077]
Pretrained text-to-image models, such as Stable Diffusion, have exhibited strong capabilities in synthesizing realistic image content. We propose a contextually precise image super-resolution framework that effectively maintains both local and global pixel relationships.
arXiv Detail & Related papers (2025-10-11T07:17:31Z) - Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement [4.006320049969407]
Mask2Alpha is an iterative refinement framework designed to enhance semantic comprehension, instance awareness, and fine-detail recovery in image matting. Our framework leverages self-supervised Vision Transformer features as semantic priors, strengthening contextual understanding in complex scenarios. Mask2Alpha consistently achieves state-of-the-art results, showcasing its effectiveness in accurate and efficient image matting.
arXiv Detail & Related papers (2025-02-24T12:16:28Z) - Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration [75.51789992466183]
TAMambaIR simultaneously perceives image textures and achieves a trade-off between performance and efficiency. Extensive experiments on benchmarks for image super-resolution, deraining, and low-light image enhancement demonstrate that TAMambaIR achieves state-of-the-art performance with significantly improved efficiency.
arXiv Detail & Related papers (2025-01-27T23:53:49Z) - Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder development: pixel misalignment and inefficient inference.
This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion.
The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z) - Feature Refinement to Improve High Resolution Image Inpainting [1.4824891788575418]
Inpainting networks are often unable to generate globally coherent structures at resolutions higher than their training set.
We optimize the intermediate featuremaps of a network by minimizing a multiscale consistency loss at inference.
This runtime optimization improves the inpainting results and establishes a new state-of-the-art for high resolution inpainting.
arXiv Detail & Related papers (2022-06-27T21:59:12Z) - Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z) - Semantic Layout Manipulation with High-Resolution Sparse Attention [106.59650698907953]
We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
We propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.
arXiv Detail & Related papers (2020-12-14T06:50:43Z) - Gated Fusion Network for Degraded Image Super Resolution [78.67168802945069]
We propose a dual-branch convolutional neural network to extract base features and recovered features separately.
By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process.
arXiv Detail & Related papers (2020-03-02T13:28:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.