PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
- URL: http://arxiv.org/abs/2412.14283v2
- Date: Thu, 30 Jan 2025 00:05:46 GMT
- Title: PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
- Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohammadreza Samadi, Jiao He, Fengyu Sun, Di Niu
- Abstract summary: PixelMan is an inversion-free and training-free method for achieving consistent object editing via Pixel Manipulation and Generation.
We show that in as few as 16 inference steps, PixelMan outperforms a range of state-of-the-art training-based and training-free methods.
- Abstract: Recent research explores the potential of Diffusion Models (DMs) for consistent object editing, which aims to modify an object's position, size, composition, etc., while preserving the consistency of the object and background without changing their texture and attributes. Current inference-time methods often rely on DDIM inversion, which inherently compromises efficiency and the achievable consistency of edited images. Recent methods also utilize energy guidance, which iteratively updates the predicted noise and can drive the latents away from the original image, resulting in distortions. In this paper, we propose PixelMan, an inversion-free and training-free method for achieving consistent object editing via Pixel Manipulation and Generation. We directly create a duplicate copy of the source object at the target location in pixel space, and introduce an efficient sampling approach to iteratively harmonize the manipulated object into the target location and inpaint its original location. Image consistency is ensured by anchoring the generated edited image to the pixel-manipulated image, and by introducing various consistency-preserving optimization techniques during inference. Experimental evaluations on benchmark datasets as well as extensive visual comparisons show that in as few as 16 inference steps, PixelMan outperforms a range of state-of-the-art training-based and training-free methods (which usually require 50 steps) on multiple consistent object editing tasks.
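To make the pixel-manipulation step concrete, the following is a minimal sketch, assuming an RGB image array and a precomputed binary object mask; the function name, mask interface, and offset convention are illustrative and not taken from the paper. It shows only the duplicate-copy operation performed directly in pixel space; the harmonization of the copied object, the inpainting of its original location, and the anchoring to this pixel-manipulated image are, per the abstract, carried out afterwards during diffusion sampling.

```python
# Minimal sketch (illustrative, not the authors' implementation) of duplicating
# a masked source object at a target offset directly in pixel space.
import numpy as np

def duplicate_object_in_pixel_space(image: np.ndarray,
                                    object_mask: np.ndarray,
                                    dx: int, dy: int) -> np.ndarray:
    """image: HxWx3 uint8 array; object_mask: HxW boolean array marking the
    source object; (dx, dy): horizontal/vertical offset to the target location.
    Returns the pixel-manipulated image with the object copied to the target."""
    manipulated = image.copy()
    ys, xs = np.nonzero(object_mask)          # source coordinates of object pixels
    ys_t, xs_t = ys + dy, xs + dx             # shifted (target) coordinates
    h, w = image.shape[:2]
    keep = (ys_t >= 0) & (ys_t < h) & (xs_t >= 0) & (xs_t < w)  # clip to image bounds
    manipulated[ys_t[keep], xs_t[keep]] = image[ys[keep], xs[keep]]
    return manipulated
```

Note that the original object location is intentionally left untouched here: the abstract describes it being inpainted during the subsequent sampling rather than erased in pixel space.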
Related papers
- AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing [14.543341303789445]
We propose a novel mask-free point-based image editing method, AdaptiveDrag, which generates images that better align with user intent.
To ensure a comprehensive connection between the input image and the drag process, we have developed a semantic-driven optimization.
Building on these effective designs, our method delivers superior generation results using only a single input image and the handle-target point pairs.
arXiv Detail & Related papers (2024-10-16T15:59:02Z)
- Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner [28.939227214483953]
This paper employs a regression-based network to learn the variation patterns of StyleGAN latent codes during the image dragging process.
We show that our method achieves state-of-the-art (SOTA) inference speed and image editing performance at the pixel-level granularity.
arXiv Detail & Related papers (2024-07-26T10:45:57Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing [8.19063619210761]
PFB-Diff is a Progressive Feature Blending method for Diffusion-based image editing.
Our method demonstrates its superior performance in terms of image fidelity, editing accuracy, efficiency, and faithfulness to the original image.
arXiv Detail & Related papers (2023-06-28T11:10:20Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score, and qualitative editing of both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- IDA: Improved Data Augmentation Applied to Salient Object Detection [0.0]
We present an Improved Data Augmentation (IDA) technique focused on Salient Object Detection (SOD).
Our method combines image inpainting, affine transformations, and the linear combination of different generated background images with salient objects extracted from labeled data.
We show that our method improves the segmentation quality when used for training state-of-the-art neural networks on several famous datasets of the SOD field.
arXiv Detail & Related papers (2020-09-18T14:03:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.