A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
- URL: http://arxiv.org/abs/2504.01603v1
- Date: Wed, 02 Apr 2025 11:13:46 GMT
- Title: A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
- Authors: Yizhe Tang, Zhimin Sun, Yuzhen Du, Ran Yi, Guangben Lu, Teng Hu, Luying Li, Lizhuang Ma, Fangyuan Zou
- Abstract summary: "Text-Guided Subject-Position Variable Background Inpainting" aims to dynamically adjust the subject position to achieve a harmonious relationship between the subject and the inpainted background. We design a PosAgent Block that adaptively predicts an appropriate displacement based on given features to achieve variable subject-position. We equip A$^\text{T}$A with a Position Switch Embedding to control whether the subject's position in the generated image is adaptively predicted or fixed.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image inpainting aims to fill the missing region of an image. Recently, there has been a surge of interest in foreground-conditioned background inpainting, a sub-task that fills the background of an image while the foreground subject and associated text prompt are provided. Existing background inpainting methods typically strictly preserve the subject's original position from the source image, resulting in inconsistencies between the subject and the generated background. To address this challenge, we propose a new task, the "Text-Guided Subject-Position Variable Background Inpainting", which aims to dynamically adjust the subject position to achieve a harmonious relationship between the subject and the inpainted background, and propose the Adaptive Transformation Agent (A$^\text{T}$A) for this task. Firstly, we design a PosAgent Block that adaptively predicts an appropriate displacement based on given features to achieve variable subject-position. Secondly, we design the Reverse Displacement Transform (RDT) module, which arranges multiple PosAgent blocks in a reverse structure, to transform hierarchical feature maps from deep to shallow based on semantic information. Thirdly, we equip A$^\text{T}$A with a Position Switch Embedding to control whether the subject's position in the generated image is adaptively predicted or fixed. Extensive comparative experiments validate the effectiveness of our A$^\text{T}$A approach, which not only demonstrates superior inpainting capabilities in subject-position variable inpainting, but also ensures good performance on subject-position fixed inpainting.
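The abstract names three mechanisms: a PosAgent block that regresses a displacement from features, an RDT module that chains PosAgent blocks over hierarchical feature maps from deep to shallow, and a Position Switch Embedding that selects between adaptive and fixed placement. Below is a minimal PyTorch-style sketch of how those pieces might fit together. Only the component names come from the abstract; the pooled-MLP head, the affine-grid warp, the shared accumulated displacement, the channel list, and the use of the switch embedding as per-level conditioning are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PosAgentBlock(nn.Module):
    """Predicts a 2D subject displacement (dx, dy) from a feature map.

    The pooled-MLP head is an assumption; the paper only states that the
    block adaptively predicts an appropriate displacement from features.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # collapse spatial dimensions
            nn.Flatten(),
            nn.Linear(channels, channels // 2),
            nn.GELU(),
            nn.Linear(channels // 2, 2),
            nn.Tanh(),                         # displacement in [-1, 1] units
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.head(feat)                 # (B, 2)


def shift_features(feat: torch.Tensor, disp: torch.Tensor) -> torch.Tensor:
    """Translates a feature map by a predicted displacement (assumed to be
    realized as an affine warp; the paper does not specify the mechanism)."""
    b = feat.shape[0]
    theta = torch.zeros(b, 2, 3, device=feat.device, dtype=feat.dtype)
    theta[:, 0, 0] = 1.0
    theta[:, 1, 1] = 1.0
    theta[:, :, 2] = -disp                     # negate so content moves by +disp
    grid = F.affine_grid(theta, list(feat.shape), align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)


class ReverseDisplacementTransform(nn.Module):
    """Chains PosAgent blocks over hierarchical feature maps from deep
    (most semantic) to shallow, refining one shared displacement."""

    def __init__(self, channels_deep_to_shallow):   # e.g. [512, 256, 128]
        super().__init__()
        self.agents = nn.ModuleList(
            PosAgentBlock(c) for c in channels_deep_to_shallow
        )
        # Position Switch Embedding: one vector per mode (0 = fixed position,
        # 1 = adaptive position), injected into each level as conditioning.
        self.pos_switch = nn.ModuleList(
            nn.Embedding(2, c) for c in channels_deep_to_shallow
        )

    def forward(self, feats_deep_to_shallow, adaptive: bool = True):
        b = feats_deep_to_shallow[0].shape[0]
        device = feats_deep_to_shallow[0].device
        mode = torch.full((b,), int(adaptive), dtype=torch.long, device=device)
        disp = feats_deep_to_shallow[0].new_zeros(b, 2)
        warped = []
        for agent, emb, feat in zip(self.agents, self.pos_switch,
                                    feats_deep_to_shallow):
            cond = emb(mode)[:, :, None, None]   # (B, C, 1, 1) mode signal
            disp = disp + agent(feat + cond)     # refine estimate per level
            if not adaptive:
                disp = torch.zeros_like(disp)    # fixed mode: subject stays put
            warped.append(shift_features(feat, disp))
        return warped, disp
```

In this sketch, flipping `adaptive` both zeroes the accumulated displacement and swaps the mode embedding seen by every level, which is one way a single model could serve both the subject-position variable and subject-position fixed settings the abstract claims.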
Related papers
- Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting [32.030589692062875]
Pinco is a foreground-conditioned inpainting adapter that generates high-quality backgrounds with good text alignment. Our method achieves superior performance and efficiency in foreground-conditioned inpainting.
arXiv Detail & Related papers (2024-12-05T02:08:19Z)
- Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator [44.620847608977776]
Diptych Prompting is a novel zero-shot approach that reinterprets subject-driven text-to-image generation as an inpainting task with precise subject alignment.
Our method supports not only subject-driven generation but also stylized image generation and subject-driven image editing.
arXiv Detail & Related papers (2024-11-23T06:17:43Z)
- Improving Text-guided Object Inpainting with Semantic Pre-inpainting [95.17396565347936]
We decompose the typical single-stage object inpainting into two cascaded processes: semantic pre-inpainting and high-fidelity object generation.
To achieve this, we cascade a Transformer-based semantic inpainter and an object inpainting diffusion model, leading to a novel CAscaded Transformer-Diffusion framework.
arXiv Detail & Related papers (2024-09-12T17:55:37Z)
- Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model [81.96954332787655]
We introduce Diffree, a Text-to-Image (T2I) model that facilitates text-guided object addition with only text control.
In experiments, Diffree adds new objects with a high success rate while maintaining background consistency, spatial appropriateness, and object relevance and quality.
arXiv Detail & Related papers (2024-07-24T03:58:58Z)
- Sketch-guided Image Inpainting with Partial Discrete Diffusion Process [5.005162730122933]
We introduce a novel partial discrete diffusion process (PDDP) for sketch-guided inpainting.
PDDP corrupts the masked regions of the image and reconstructs these masked regions conditioned on hand-drawn sketches.
The proposed transformer module accepts two inputs, the image containing the masked region to be inpainted and the query sketch, and uses them to model the reverse diffusion process.
arXiv Detail & Related papers (2024-04-18T07:07:38Z)
- Repositioning the Subject within Image [78.8467524191102]
We introduce an innovative dynamic manipulation task, subject repositioning.
This task involves relocating a user-specified subject to a desired position while preserving the image's fidelity.
Our research reveals that the fundamental sub-tasks of subject repositioning can be effectively reformulated as a unified, prompt-guided inpainting task.
arXiv Detail & Related papers (2024-01-30T10:04:49Z)
- Cylin-Painting: Seamless 360° Panoramic Image Outpainting and Beyond [136.18504104345453]
We present a Cylin-Painting framework that involves meaningful collaborations between inpainting and outpainting.
The proposed algorithm can be effectively extended to other panoramic vision tasks, such as object detection, depth estimation, and image super-resolution.
arXiv Detail & Related papers (2022-04-18T21:18:49Z)
- Context-Aware Image Inpainting with Learned Semantic Priors [100.99543516733341]
We introduce pretext tasks that are semantically meaningful for estimating the missing contents.
We propose a context-aware image inpainting model, which adaptively integrates global semantics and local features.
arXiv Detail & Related papers (2021-06-14T08:09:43Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)