AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for
Text-Based Continuity-Sensitive Image Editing
- URL: http://arxiv.org/abs/2312.08019v2
- Date: Sun, 24 Dec 2023 05:55:36 GMT
- Title: AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for
Text-Based Continuity-Sensitive Image Editing
- Authors: Zhiyuan Ma, Guoli Jia, Bowen Zhou
- Abstract summary: We propose a spatio-temporal guided adaptive editing algorithm, AdapEdit, which realizes adaptive image editing.
Our approach has a significant advantage in preserving model priors and does not require model training, fine-tuning, extra data, or optimization.
We present our results over a wide variety of raw images and editing instructions, demonstrating competitive performance and showing that it significantly outperforms previous approaches.
- Score: 24.9487669818162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the great success of text-conditioned diffusion models in creative
text-to-image generation, various text-driven image editing approaches have
attracted the attention of many researchers. However, previous works mainly
focus on discreteness-sensitive instructions such as adding, removing or
replacing specific objects, background elements or global styles (i.e., hard
editing), while generally ignoring subject-binding but semantically
fine-changing, continuity-sensitive instructions such as actions, poses, or
adjectives (i.e., soft editing), which hampers generative AI from
generating user-customized visual contents. To mitigate this predicament, we
propose a spatio-temporal guided adaptive editing algorithm AdapEdit, which
realizes adaptive image editing by introducing a soft-attention strategy to
dynamically vary the guiding degree from the editing conditions to visual
pixels from both temporal and spatial perspectives. Note that our approach has a
significant advantage in preserving model priors and does not require model
training, fine-tuning, extra data, or optimization. We present our results over
a wide variety of raw images and editing instructions, demonstrating
competitive performance and showing that it significantly outperforms previous
approaches.
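The abstract describes the mechanism only at a high level: a soft-attention strategy that varies how strongly the editing condition steers each visual pixel (spatial) at each denoising step (temporal). The sketch below illustrates that idea by blending unconditional and edit-conditioned noise predictions with a per-pixel, per-step weight instead of a single global guidance scale; the function names, schedule shapes, and the `tau` constant are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def temporal_weight(t, T, tau=0.6):
    """Guidance ramp over denoising steps: weak early (preserve layout),
    stronger late (apply the edit). Purely illustrative schedule."""
    return float(np.clip((T - t) / (tau * T), 0.0, 1.0))

def spatial_weights(cross_attn, power=2.0):
    """Soft per-pixel weights derived from a cross-attention map of the
    edited token(s), normalised to [0, 1]; `power` sharpens them softly."""
    w = cross_attn - cross_attn.min()
    w = w / (w.max() + 1e-8)
    return w ** power

def guided_noise(eps_uncond, eps_edit, cross_attn, t, T, scale=7.5):
    """Blend unconditional and edit-conditioned noise predictions with a
    spatio-temporal soft weight rather than one global CFG scale."""
    w = temporal_weight(t, T) * spatial_weights(cross_attn)
    return eps_uncond + scale * w * (eps_edit - eps_uncond)

# Toy shapes: a 64x64 latent and 50 denoising steps.
T = 50
rng = np.random.default_rng(0)
attn = rng.random((64, 64))                 # stand-in cross-attention map
for t in range(T, 0, -1):
    eps_u = rng.standard_normal((64, 64))   # stand-in noise predictions
    eps_c = rng.standard_normal((64, 64))
    eps = guided_noise(eps_u, eps_c, attn, t, T)
    # a real sampler would use `eps` to update the latent here
```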
Related papers
- Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing [67.96788532285649]
We present a Vision-guided and Mask-enhanced Adaptive Editing (ViMAEdit) method with three key novel designs.
First, we propose to leverage image embeddings as explicit guidance to enhance the conventional textual prompt-based denoising process.
Second, we devise a self-attention-guided iterative editing area grounding strategy.
arXiv Detail & Related papers (2024-10-14T13:41:37Z)
- Text Guided Image Editing with Automatic Concept Locating and Forgetting [27.70615803908037]
We propose a novel method called Locate and Forget (LaF) to locate potential target concepts in the image for modification.
Compared to the baselines, our method demonstrates its superiority in text-guided image editing tasks both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-05-30T05:36:32Z)
- InstructBrush: Learning Attention-based Instruction Optimization for Image Editing [54.07526261513434]
InstructBrush is an inversion approach for instruction-based image editing methods.
It extracts editing effects from image pairs as editing instructions, which are further applied for image editing.
Our approach achieves superior performance in editing and is more semantically consistent with the target editing effects.
arXiv Detail & Related papers (2024-03-27T15:03:38Z)
- Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z)
- InstructGIE: Towards Generalizable Image Editing [34.83188723673297]
We introduce a novel image editing framework with enhanced generalization robustness.
This framework incorporates a module specifically optimized for image editing tasks, leveraging the VMamba Block.
We also unveil a selective area-matching technique specifically engineered to address and rectify corrupted details in generated images.
arXiv Detail & Related papers (2024-03-08T03:43:04Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing users to adjust the influence of each loss function, we build a flexible editing solution that can be tailored to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
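The summary implies a standard pattern: keep the model frozen and optimise the latent at inference time against a user-weighted sum of per-condition losses. A minimal sketch of that pattern, with hypothetical stand-in losses in place of the paper's actual text, pose, and scribble objectives:

```python
import torch

def edit_by_optimisation(latent0, losses, weights, steps=50, lr=0.05):
    """Inference-time editing: adjust the latent so a user-weighted sum of
    edit losses decreases. `losses` maps a latent to scalar scores; the
    `weights` set how strongly each condition influences the result."""
    latent = latent0.clone().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        total = sum(w * f(latent) for w, f in zip(weights, losses))
        total.backward()
        opt.step()
    return latent.detach()

# Stand-in losses; real ones would score text/pose/scribble agreement.
text_loss = lambda z: (z.mean() - 0.5) ** 2
pose_loss = lambda z: z.var()
z = torch.randn(1, 4, 64, 64)
edited = edit_by_optimisation(z, [text_loss, pose_loss], weights=[1.0, 0.3])
```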
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing [22.40686064568406]
We present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes.
Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds.
arXiv Detail & Related papers (2023-07-17T11:29:48Z)
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a lightweight approach for real-image editing that combines the Edit Friendly DDPM inversion technique with Semantic Guidance.
This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring no optimization or architectural extensions.
arXiv Detail & Related papers (2023-07-02T09:11:09Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [86.92711729969488]
We exploit the capabilities of pretrained diffusion models for image editing.
Existing methods either finetune the model or invert the image in the latent space of the pretrained model.
Both suffer from two problems: unsatisfying results in selected regions, and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
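DiffEdit derives its mask by contrasting the denoiser's noise predictions for the same noised image under the query prompt versus a reference prompt: the regions where the predictions disagree most are the ones the edit must touch. A toy sketch of that contrast-and-threshold step, with a stand-in denoiser in place of a real diffusion model:

```python
import numpy as np

def diffedit_mask(predict_noise, x, query_emb, ref_emb, n=10, q=0.8):
    """Estimate an edit mask from the difference between noise predictions
    conditioned on the query vs. the reference prompt, averaged over
    several noised copies of the image, then thresholded."""
    rng = np.random.default_rng(0)
    diffs = []
    for _ in range(n):
        x_t = x + rng.standard_normal(x.shape)   # toy noising step
        d = predict_noise(x_t, query_emb) - predict_noise(x_t, ref_emb)
        diffs.append(np.abs(d))
    m = np.mean(diffs, axis=0)
    m = m / (m.max() + 1e-8)
    return (m > np.quantile(m, q)).astype(np.float32)

# Toy denoiser: reacts to the conditioning only in the image centre, so
# the derived mask highlights exactly that region.
def toy_denoiser(x_t, emb):
    out = 0.1 * x_t
    out[24:40, 24:40] += emb
    return out

x = np.zeros((64, 64))
mask = diffedit_mask(toy_denoiser, x, query_emb=1.0, ref_emb=0.0)
print(int(mask.sum()), "pixels selected")   # the 16x16 centre block
```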