CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models
- URL: http://arxiv.org/abs/2406.09368v1
- Date: Thu, 13 Jun 2024 17:50:28 GMT
- Title: CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models
- Authors: Yigit Ekin, Ahmet Burak Yildirim, Erdem Eren Caglar, Aykut Erdem, Erkut Erdem, Aysegul Dundar
- Abstract summary: CLIPAway is a novel approach leveraging CLIP embeddings to focus on background regions while excluding foreground elements.
It enhances inpainting accuracy and quality by identifying embeddings that prioritize the background.
Unlike other methods that rely on specialized training datasets or costly manual annotations, CLIPAway provides a flexible, plug-and-play solution.
- Score: 16.58831310165623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advanced image editing techniques, particularly inpainting, are essential for seamlessly removing unwanted elements while preserving visual integrity. Traditional GAN-based methods have achieved notable success, but recent advancements in diffusion models have produced superior results due to their training on large-scale datasets, enabling the generation of remarkably realistic inpainted images. Despite their strengths, diffusion models often struggle with object removal tasks without explicit guidance, leading to unintended hallucinations of the removed object. To address this issue, we introduce CLIPAway, a novel approach leveraging CLIP embeddings to focus on background regions while excluding foreground elements. CLIPAway enhances inpainting accuracy and quality by identifying embeddings that prioritize the background, thus achieving seamless object removal. Unlike other methods that rely on specialized training datasets or costly manual annotations, CLIPAway provides a flexible, plug-and-play solution compatible with various diffusion-based inpainting techniques.
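The abstract does not spell out how a background-focused embedding is obtained, but the core idea of emphasizing background content while suppressing the foreground can be illustrated with a simple vector projection in CLIP embedding space. The function name and the projection scheme below are assumptions for illustration, not the paper's actual method:

```python
import numpy as np

def background_focused_embedding(e_full: np.ndarray, e_fg: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: remove the foreground-focused direction from the
    full-image embedding, leaving a vector that emphasizes the background.

    e_full: CLIP embedding of the whole image.
    e_fg:   CLIP embedding focused on the foreground (e.g. the masked object).
    """
    e_fg_hat = e_fg / np.linalg.norm(e_fg)               # unit foreground direction
    e_bg = e_full - np.dot(e_full, e_fg_hat) * e_fg_hat  # project out that component
    return e_bg / np.linalg.norm(e_bg)                   # renormalize, as CLIP vectors are unit-norm

# toy check: the result is orthogonal to the foreground direction
e_full = np.array([0.6, 0.8, 0.0])
e_fg = np.array([1.0, 0.0, 0.0])
e_bg = background_focused_embedding(e_full, e_fg)  # → [0., 1., 0.]
```

A vector like `e_bg` could then condition a diffusion-based inpainting model so that generation is steered toward background content rather than hallucinating the removed object.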
Related papers
- OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting [54.525583840585305]
We introduce OmniPaint, a unified framework that re-conceptualizes object removal and insertion as interdependent processes.
Our novel CFD metric offers a robust, reference-free evaluation of context consistency and object hallucination.
arXiv Detail & Related papers (2025-03-11T17:55:27Z)
- Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways [13.08168394252538]
Erase inpainting aims to precisely remove target objects within masked regions while preserving the overall consistency of the surrounding content.
We propose a novel Erase Diffusion, termed EraDiff, aimed at unleashing the potential power of standard diffusion in the context of object removal.
Our proposed EraDiff achieves state-of-the-art performance on the OpenImages V5 dataset and demonstrates significant superiority in real-world scenarios.
arXiv Detail & Related papers (2025-03-10T08:06:51Z)
- Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance [6.249195110289606]
Attentive Eraser is a tuning-free method to empower pre-trained diffusion models for stable and effective object removal.
We introduce Attention Activation and Suppression (ASS), which re-engineers the self-attention mechanism.
We also introduce Self-Attention Redirection Guidance (SARG), which utilizes the self-attention redirected by ASS to guide the generation process.
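The summary describes guiding generation with attention redirected by ASS. One common form such guidance takes in diffusion models is a classifier-free-guidance-style combination of two noise predictions; the formula and function name below are assumptions for illustration, not SARG's exact rule:

```python
import numpy as np

def guided_noise(eps_plain: np.ndarray, eps_redirected: np.ndarray, scale: float) -> np.ndarray:
    # Hypothetical guidance step: push the predicted noise toward the
    # prediction obtained with redirected self-attention. scale > 1
    # amplifies the redirection, scale = 0 ignores it.
    return eps_plain + scale * (eps_redirected - eps_plain)

eps_plain = np.zeros(4)
eps_redirected = np.ones(4)
eps_hat = guided_noise(eps_plain, eps_redirected, 2.0)  # → [2., 2., 2., 2.]
```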
arXiv Detail & Related papers (2024-12-17T14:56:59Z)
- ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring [61.82010103478833]
We develop a context-based local blur detection module that incorporates additional contextual information to improve the identification of blurry regions.
Considering that modern smartphones are equipped with cameras capable of providing short-exposure images, we develop a blur-aware guided image restoration method.
We formulate the above components into a simple yet effective network, named ExpRDiff.
arXiv Detail & Related papers (2024-12-12T11:42:39Z)
- EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM [50.054404519821745]
We present a novel framework that integrates a multimodal Large Language Model for enhanced reasoning capabilities.
Our framework achieves promising results on MagicBrush, AutoSplice, and PerfBrush datasets.
Notably, our method excels on the PerfBrush dataset, a self-constructed test set featuring previously unseen types of edits.
arXiv Detail & Related papers (2024-12-05T02:05:33Z)
- MagicEraser: Erasing Any Objects via Semantics-Aware Control [40.683569840182926]
We introduce MagicEraser, a diffusion model-based framework tailored for the object erasure task.
MagicEraser achieves fine and effective control of content generation while mitigating undesired artifacts.
arXiv Detail & Related papers (2024-10-14T07:03:14Z)
- Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping.
We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping.
Because our approach is largely unified, it is resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z)
- TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization [59.412236435627094]
TALE is a training-free framework harnessing the generative capabilities of text-to-image diffusion models.
We equip TALE with two mechanisms dubbed Adaptive Latent Manipulation and Energy-guided Latent Optimization.
Our experiments demonstrate that TALE surpasses prior baselines and attains state-of-the-art performance in image-guided composition.
arXiv Detail & Related papers (2024-08-07T08:52:21Z)
- InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture [0.0]
InsertDiffusion is a training-free diffusion architecture that efficiently embeds objects into images.
Our approach utilizes off-the-shelf generative models and eliminates the need for fine-tuning.
By decomposing the generation task into independent steps, InsertDiffusion offers a scalable solution.
arXiv Detail & Related papers (2024-07-15T10:15:58Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
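The interpolation of attention features described above can be sketched as a simple linear blend; the function name and the linear schedule are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def interpolate_attention(src_feat: np.ndarray, tgt_feat: np.ndarray, t: float) -> np.ndarray:
    # Linear blend of source and target attention features. t = 0 keeps the
    # original appearance; larger t moves toward the new layout. In practice
    # t would be scheduled over the early denoising steps.
    return (1.0 - t) * src_feat + t * tgt_feat

src = np.zeros(3)
tgt = np.ones(3)
mid = interpolate_attention(src, tgt, 0.25)  # → [0.25, 0.25, 0.25]
```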
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Latent Feature-Guided Diffusion Models for Shadow Removal [50.02857194218859]
We propose the use of diffusion models as they offer a promising approach to gradually refine the details of shadow regions during the diffusion process.
Our method improves this process by conditioning on a learned latent feature space that inherits the characteristics of shadow-free images.
We demonstrate the effectiveness of our approach which outperforms the previous best method by 13% in terms of RMSE on the AISTD dataset.
arXiv Detail & Related papers (2023-12-04T18:59:55Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- Take a Prior from Other Tasks for Severe Blur Removal [52.380201909782684]
We propose a cross-level feature learning strategy based on knowledge distillation to learn the priors.
We design a semantic prior embedding layer with multi-level aggregation and semantic attention transformation to integrate the priors effectively.
Experiments on natural image deblurring benchmarks (GoPro and RealBlur) and real-world images demonstrate our method's effectiveness.
arXiv Detail & Related papers (2023-02-14T08:30:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.