Click2Mask: Local Editing with Dynamic Mask Generation
- URL: http://arxiv.org/abs/2409.08272v1
- Date: Thu, 12 Sep 2024 17:59:04 GMT
- Title: Click2Mask: Local Editing with Dynamic Mask Generation
- Authors: Omer Regev, Omri Avrahami, Dani Lischinski
- Abstract summary: Click2Mask is a novel approach that simplifies the local editing process by requiring only a single point of reference.
Our experiments demonstrate that Click2Mask not only minimizes user effort but also delivers competitive or superior local image manipulation results.
- Score: 23.89536337989824
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in generative models have revolutionized image generation and editing, making these tasks accessible to non-experts. This paper focuses on local image editing, particularly the task of adding new content to a loosely specified area. Existing methods often require a precise mask or a detailed description of the location, which can be cumbersome and prone to errors. We propose Click2Mask, a novel approach that simplifies the local editing process by requiring only a single point of reference (in addition to the content description). A mask is dynamically grown around this point during a Blended Latent Diffusion (BLD) process, guided by a masked CLIP-based semantic loss. Click2Mask surpasses the limitations of segmentation-based and fine-tuning dependent methods, offering a more user-friendly and contextually accurate solution. Our experiments demonstrate that Click2Mask not only minimizes user effort but also delivers competitive or superior local image manipulation results compared to SoTA methods, according to both human judgement and automatic metrics. Key contributions include the simplification of user input, the ability to freely add objects unconstrained by existing segments, and the integration potential of our dynamic mask approach within other editing methods.
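The core mechanism described in the abstract, BLD-style masked blending of latents plus a mask that grows dynamically from a single click under a semantic relevance signal, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: `grow_mask`, its one-pixel dilation, and the `relevance` map standing in for the masked CLIP-based loss are simplified stand-ins introduced here for illustration.

```python
import numpy as np

def blend(z_edit, z_orig, mask):
    """BLD-style blending: generated latents inside the mask,
    original latents preserved outside it."""
    return mask * z_edit + (1.0 - mask) * z_orig

def grow_mask(mask, relevance, thresh=0.5):
    """Hypothetical dynamic-growth step: dilate the binary mask by one
    pixel, but admit only new pixels whose semantic relevance (a toy
    stand-in for the masked CLIP-based loss signal) exceeds a threshold."""
    h, w = mask.shape
    padded = np.pad(mask, 1)
    dilated = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            dilated = np.maximum(dilated, padded[1 + dy:1 + dy + h,
                                                 1 + dx:1 + dx + w])
    return np.maximum(mask, dilated * (relevance > thresh))

# Start from a single user click and grow the mask over editing steps.
mask = np.zeros((5, 5))
mask[2, 2] = 1.0                 # the single point of reference
relevance = np.ones((5, 5))      # toy relevance map (everything relevant)
for _ in range(2):
    mask = grow_mask(mask, relevance)
```

With a uniformly relevant toy map, two growth steps expand the click into the full 5x5 neighborhood; in the actual method, growth at each diffusion step is gated by the semantic loss, so the mask conforms to the emerging object rather than expanding uniformly.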
Related papers
- SmartEraser: Remove Anything from Images using Masked-Region Guidance [114.36809682798784]
SmartEraser is built with a new removing paradigm called Masked-Region Guidance.
Masked-Region Guidance retains the masked region in the input, using it as guidance for the removal process.
We present Syn4Removal, a large-scale object removal dataset.
arXiv Detail & Related papers (2025-01-14T17:55:12Z) - High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation [109.19165503929992]
We present MaskCLIP++, which uses ground-truth masks instead of generated masks to enhance the mask classification capability of CLIP.
After low-cost fine-tuning, MaskCLIP++ significantly improves the mask classification performance on multi-domain datasets.
We achieve performance improvements of +1.7, +2.3, +2.1, +3.1, and +0.3 mIoU on the A-847, PC-459, A-150, PC-59, and PAS-20 datasets.
arXiv Detail & Related papers (2024-12-16T05:44:45Z) - BrushEdit: All-In-One Image Inpainting and Editing [79.55816192146762]
BrushEdit is a novel inpainting-based instruction-guided image editing paradigm.
We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model.
Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z) - FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing [25.18320863976491]
We propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editing.
Our method achieves state-of-the-art (SOTA) performance in LLM-based image editing, and our simple prompting technique stands out in its effectiveness.
arXiv Detail & Related papers (2024-08-22T14:22:07Z) - ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z) - Lazy Diffusion Transformer for Interactive Image Editing [79.75128130739598]
We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently.
Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications.
arXiv Detail & Related papers (2024-04-18T17:59:27Z) - Variance-Insensitive and Target-Preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z) - Automatic Generation of Semantic Parts for Face Image Synthesis [7.728916126705043]
We describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks.
Our proposed model allows embedding the mask class-wise into a latent space where each class embedding can be independently edited.
We report quantitative and qualitative results on the Celeb-MaskHQ dataset, which show our model can both faithfully reconstruct and modify a segmentation mask at the class level.
arXiv Detail & Related papers (2023-07-11T15:01:42Z) - Mask to Reconstruct: Cooperative Semantics Completion for Video-text Retrieval [19.61947785487129]
We propose Mask for Semantics Completion (MASCOT), which is based on semantics-aware masked modeling.
MASCOT achieves state-of-the-art results on four major text-video retrieval benchmarks.
arXiv Detail & Related papers (2023-05-13T12:31:37Z) - DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
arXiv Detail & Related papers (2023-03-21T08:43:15Z) - Layered Depth Refinement with Mask Guidance [61.10654666344419]
We formulate a novel problem of mask-guided depth refinement that utilizes a generic mask to refine the depth prediction of SIDE models.
Our framework performs layered refinement and inpainting/outpainting, decomposing the depth map into two separate layers signified by the mask and the inverse mask.
We empirically show that our method is robust to different types of masks and initial depth predictions, accurately refining depth values in inner and outer mask boundary regions.
arXiv Detail & Related papers (2022-06-07T06:42:44Z) - FocalClick: Towards Practical Interactive Image Segmentation [19.472284443121367]
Interactive segmentation allows users to extract target masks by making positive/negative clicks.
FocalClick solves both issues at once by predicting and updating the mask in localized areas.
Progressive Merge exploits morphological information to decide where to preserve and where to update, enabling users to refine any preexisting mask effectively.
arXiv Detail & Related papers (2022-04-06T04:32:01Z) - Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling [61.03262873980619]
Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations.
We propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images.
Our framework is capable of labeling novel classes in captions via their word semantics to self-train a student model.
arXiv Detail & Related papers (2021-11-24T18:50:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.