Repositioning the Subject within Image
- URL: http://arxiv.org/abs/2401.16861v2
- Date: Sun, 17 Mar 2024 12:15:34 GMT
- Title: Repositioning the Subject within Image
- Authors: Yikai Wang, Chenjie Cao, Ke Fan, Qiaole Dong, Yifan Li, Xiangyang Xue, Yanwei Fu
- Abstract summary: We introduce an innovative dynamic manipulation task, subject repositioning.
This task involves relocating a user-specified subject to a desired position while preserving the image's fidelity.
Our research reveals that the fundamental sub-tasks of subject repositioning can be effectively reformulated as a unified, prompt-guided inpainting task.
- Score: 78.8467524191102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current image manipulation primarily centers on static manipulation, such as replacing specific regions within an image or altering its overall style. In this paper, we introduce an innovative dynamic manipulation task, subject repositioning. This task involves relocating a user-specified subject to a desired position while preserving the image's fidelity. Our research reveals that the fundamental sub-tasks of subject repositioning, which include filling the void left by the repositioned subject, reconstructing obscured portions of the subject, and blending the subject to be consistent with surrounding areas, can be effectively reformulated as a unified, prompt-guided inpainting task. Consequently, we can employ a single diffusion generative model to address these sub-tasks using various task prompts learned through our proposed task inversion technique. Additionally, we integrate pre-processing and post-processing techniques to further enhance the quality of subject repositioning. These elements together form our SEgment-gEnerate-and-bLEnd (SEELE) framework. To assess SEELE's effectiveness in subject repositioning, we assemble a real-world subject repositioning dataset called ReS. Results of SEELE on ReS demonstrate its efficacy.
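To make the unified formulation concrete, here is a minimal sketch of how one inpainting diffusion model could serve all three sub-tasks by swapping learned task prompts. It assumes a diffusers-style pipeline; the prompt-embedding files and the `segment`/`reposition` helpers are hypothetical placeholders, since the paper's actual checkpoints and interfaces are not part of this listing.

```python
# A sketch of SEELE's segment-generate-and-blend flow. The prompt-embedding
# files and the segment/reposition helpers are hypothetical placeholders.
import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# Task prompts learned via task inversion, one per sub-task (placeholder files).
remove_prompt = torch.load("task_prompt_remove.pt")        # fill the vacated region
complete_prompt = torch.load("task_prompt_complete.pt")    # reconstruct occluded parts
harmonize_prompt = torch.load("task_prompt_harmonize.pt")  # blend with surroundings

def prompted_inpaint(image, mask, prompt_embeds):
    """One diffusion model handles every sub-task; only the task prompt changes."""
    return pipe(prompt_embeds=prompt_embeds, image=image, mask_image=mask).images[0]

# 1. Segment the user-specified subject (hypothetical segmenter call).
# subject_mask = segment(image, user_click)
# 2. Paste the subject at the target position, leaving a hole at the old one.
# moved, hole_mask, occlusion_mask, seam_mask = reposition(image, subject_mask, offset)
# 3. Reuse the same model with a different learned prompt per sub-task.
# filled = prompted_inpaint(moved, hole_mask, remove_prompt)
# completed = prompted_inpaint(filled, occlusion_mask, complete_prompt)
# result = prompted_inpaint(completed, seam_mask, harmonize_prompt)
```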
Related papers
- SpotActor: Training-Free Layout-Controlled Consistent Image Generation [43.2870588035256]
We present a new formalization of dual energy guidance with optimization in a dual semantic-latent space.
We propose a training-free pipeline, SpotActor, which features a layout-conditioned backward update stage and a consistent forward sampling stage.
Results show that SpotActor meets the demands of this task and demonstrates potential for practical applications.
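As a rough illustration of what a layout-conditioned backward update could look like, the sketch below nudges latents so each subject's cross-attention mass lands inside its layout box. The energy function is an illustrative assumption, not the paper's dual semantic-latent formulation.

```python
# A sketch of a layout-conditioned backward update: gradient descent on latents
# against an assumed layout energy defined over cross-attention maps.
import torch

def layout_energy(cross_attn, boxes):
    """cross_attn: dict token -> (pixels,) attention; boxes: token -> bool mask.
    Low energy when each subject token attends inside its assigned box."""
    e = 0.0
    for tok, attn in cross_attn.items():
        inside = attn[boxes[tok]].sum()
        e = e + (1.0 - inside / (attn.sum() + 1e-8))
    return e

def backward_update(latents, attn_fn, boxes, lr=0.05):
    """One backward update step before the next forward sampling step."""
    latents = latents.detach().requires_grad_(True)
    energy = layout_energy(attn_fn(latents), boxes)
    grad, = torch.autograd.grad(energy, latents)
    return (latents - lr * grad).detach()
```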
arXiv Detail & Related papers (2024-09-07T11:52:48Z)
- TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization [59.412236435627094]
TALE is a training-free framework harnessing the generative capabilities of text-to-image diffusion models.
We equip TALE with two mechanisms dubbed Adaptive Latent Manipulation and Energy-guided Latent Optimization.
Our experiments demonstrate that TALE surpasses prior baselines and attains state-of-the-art performance in image-guided composition.
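A minimal sketch of the kind of latent-space composition that "adaptive latent manipulation" suggests, assuming a diffusers-style scheduler; the choice of timestep and the mask handling are assumptions, not TALE's exact procedure.

```python
# A sketch of composing images directly in diffusion latent space, then
# re-noising to an intermediate step so sampling can harmonize the seam.
import torch

def compose_latents(bg_latents, fg_latents, fg_mask, scheduler, t):
    """Paste foreground latents into the background, then jump back to step t."""
    composed = bg_latents * (1 - fg_mask) + fg_latents * fg_mask
    noise = torch.randn_like(composed)
    # diffusers schedulers expose add_noise(sample, noise, timesteps).
    return scheduler.add_noise(composed, noise, t)
```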
arXiv Detail & Related papers (2024-08-07T08:52:21Z)
- FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior [50.0535198082903]
We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image.
We showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish generic image composition.
arXiv Detail & Related papers (2024-07-06T03:35:43Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
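The two mechanisms named above admit a compact sketch: per-object attention masking, and a schedule that interpolates attention features early in denoising. The schedule, shapes, and background handling are illustrative assumptions.

```python
# Sketches of attention masking and early-step attention interpolation.
import torch

def blended_attention(attn_src, attn_tgt, t, t_max, early_frac=0.4):
    """Interpolate attention features during early denoising steps only."""
    progress = 1.0 - t / t_max          # 0 at the first (noisiest) step
    if progress < early_frac:
        alpha = progress / early_frac   # ramp from source layout to target
        return (1 - alpha) * attn_src + alpha * attn_tgt
    return attn_tgt                     # late steps: commit to the new layout

def masked_self_attention(scores, object_masks):
    """Restrict self-attention to within-object (and within-background) pairs
    so objects stay disentangled. scores: (heads, Q, Q); object_masks: (N, Q) bool."""
    background = ~object_masks.any(dim=0)               # pixels in no object
    regions = torch.cat([object_masks, background[None]], dim=0)
    allow = torch.zeros_like(scores[0], dtype=torch.bool)
    for m in regions:
        allow |= m[:, None] & m[None, :]                # same-region query/key pairs
    return scores.masked_fill(~allow, float("-inf"))
```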
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Cones 2: Customizable Image Synthesis with Multiple Subjects [50.54010141032032]
We study how to efficiently represent a particular subject as well as how to appropriately compose different subjects.
Rectifying the activations in the cross-attention map lets the layout assign and separate the locations of different subjects in the image.
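A sketch of that rectification idea: strengthen each subject token's attention inside its assigned region and weaken it elsewhere. The boost/damp factors and the dict-based layout interface are illustrative assumptions.

```python
# A sketch of cross-attention rectification for layout control.
import torch

def rectify_cross_attention(attn, subject_token_ids, layout_masks, boost=1.5, damp=0.5):
    """attn: (heads, pixels, tokens) post-softmax attention;
    layout_masks: dict token_id -> (pixels,) bool region for that subject."""
    attn = attn.clone()
    for tok in subject_token_ids:
        inside = layout_masks[tok]                  # where this subject belongs
        attn[:, inside, tok] *= boost               # appoint the location
        attn[:, ~inside, tok] *= damp               # separate it from other regions
    return attn / attn.sum(dim=-1, keepdim=True)    # renormalize over tokens
```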
arXiv Detail & Related papers (2023-05-30T18:00:06Z)
- Image Restoration using Feature-guidance [43.02281823557039]
We present a new approach suitable for handling the image-specific and spatially-varying nature of degradation in images.
We decompose the restoration task into two stages of degradation localization and degraded region-guided restoration.
We demonstrate that the model trained for this auxiliary task contains vital region knowledge, which can be exploited to guide the restoration network's training.
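A minimal sketch of that two-stage decomposition: first localize the degradation, then restore with the predicted region as guidance. The network classes are placeholders, not the paper's architectures.

```python
# A sketch of degradation localization followed by region-guided restoration.
import torch
import torch.nn as nn

class TwoStageRestorer(nn.Module):
    def __init__(self, localizer: nn.Module, restorer: nn.Module):
        super().__init__()
        self.localizer = localizer   # auxiliary task: where is the degradation?
        self.restorer = restorer     # main task: restore, guided by that region

    def forward(self, degraded):
        region = torch.sigmoid(self.localizer(degraded))  # soft degradation mask
        # Feed the mask as an extra channel (restorer must accept 4-channel input).
        guided = torch.cat([degraded, region], dim=1)
        return self.restorer(guided)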
arXiv Detail & Related papers (2022-01-01T13:10:19Z)
- Adversarial Image Composition with Auxiliary Illumination [53.89445873577062]
We propose an Adversarial Image Composition Net (AIC-Net) that achieves realistic image composition.
A novel branched generation mechanism is proposed, which disentangles the generation of shadows and the transfer of foreground styles.
Experiments on pedestrian and car composition tasks show that the proposed AIC-Net achieves superior composition performance.
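The branched generation mechanism admits a rough sketch: a shared encoder feeding separate branches for shadow synthesis and foreground style transfer, composited at the end. All modules and the compositing rule are illustrative placeholders.

```python
# A sketch of branched generation that disentangles shadows from style transfer.
import torch
import torch.nn as nn

class BranchedCompositionGenerator(nn.Module):
    def __init__(self, encoder, shadow_branch, style_branch):
        super().__init__()
        self.encoder = encoder
        self.shadow_branch = shadow_branch   # predicts a shadow map for the insert
        self.style_branch = style_branch     # re-renders foreground in target style

    def forward(self, background, foreground, fg_mask):
        feats = self.encoder(torch.cat([background, foreground, fg_mask], dim=1))
        shadow = torch.sigmoid(self.shadow_branch(feats))   # disentangled shadow
        styled_fg = self.style_branch(feats)                # disentangled style
        composite = background * (1 - fg_mask) + styled_fg * fg_mask
        return composite * (1 - shadow)                     # darken shadowed pixels
```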
arXiv Detail & Related papers (2020-09-17T12:58:16Z)