FocalClick: Towards Practical Interactive Image Segmentation
- URL: http://arxiv.org/abs/2204.02574v1
- Date: Wed, 6 Apr 2022 04:32:01 GMT
- Title: FocalClick: Towards Practical Interactive Image Segmentation
- Authors: Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao
- Abstract summary: Interactive segmentation allows users to extract target masks by making positive/negative clicks.
FocalClick solves both issues at once by predicting and updating the mask in localized areas.
Progressive Merge exploits morphological information to decide where to preserve and where to update, enabling users to refine any preexisting mask effectively.
- Score: 19.472284443121367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive segmentation allows users to extract target masks by making
positive/negative clicks. Although explored by many previous works, there is
still a gap between academic approaches and industrial needs: first, existing
models are not efficient enough to work on low power devices; second, they
perform poorly when used to refine preexisting masks because they cannot avoid
destroying the correct parts. FocalClick solves both issues at once by
predicting and updating the mask in localized areas. For higher efficiency, we
decompose the slow prediction on the entire image into two fast inferences on
small crops: a coarse segmentation on the Target Crop, and a local refinement
on the Focus Crop. To make the model work with preexisting masks, we formulate
a sub-task termed Interactive Mask Correction, and propose Progressive Merge as
the solution. Progressive Merge exploits morphological information to decide
where to preserve and where to update, enabling users to refine any preexisting
mask effectively. FocalClick achieves competitive results against SOTA methods
with significantly smaller FLOPs. It also shows significant superiority when
making corrections on preexisting masks. Code and data will be released at
github.com/XavierCHEN34/ClickSEG
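The Progressive Merge idea described in the abstract can be illustrated with a minimal sketch: preserve the preexisting mask everywhere except the connected region of change around the latest click, so distant correct parts are never destroyed. This is an illustrative reconstruction, not the authors' code; `progressive_merge` and its flood-fill heuristic are assumptions standing in for the paper's morphological post-processing.

```python
# Hypothetical sketch of Progressive Merge: given a preexisting mask and a new
# local prediction, update only the connected component of changed pixels that
# contains the latest click, preserving everything else. Names are illustrative.

def progressive_merge(prev_mask, new_pred, click, h, w):
    """Return a merged mask that adopts new_pred only in the connected
    changed region around `click` (4-connectivity flood fill)."""
    changed = [[prev_mask[y][x] != new_pred[y][x] for x in range(w)]
               for y in range(h)]
    cy, cx = click
    region = set()
    if changed[cy][cx]:
        stack = [(cy, cx)]
        while stack:
            y, x = stack.pop()
            if (y, x) in region:
                continue
            region.add((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and changed[ny][nx]:
                    stack.append((ny, nx))
    # Preserve prev_mask outside the clicked region of change.
    merged = [row[:] for row in prev_mask]
    for y, x in region:
        merged[y][x] = new_pred[y][x]
    return merged
```

For example, if the new prediction flips pixels both near the click and in a disconnected far-away area, only the pixels connected to the click are updated; the rest of the preexisting mask survives untouched, which is the behavior the abstract credits for effective correction of preexisting masks.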
Related papers
- The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning [16.05598829701769]
CMT-MAE leverages a simple collaborative masking mechanism through linear aggregation across attentions from both teacher and student models.
Our framework pre-trained on ImageNet-1K achieves state-of-the-art linear probing and fine-tuning performance.
arXiv Detail & Related papers (2024-12-23T13:37:26Z)
- High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation [109.19165503929992]
We present MaskCLIP++, which uses ground-truth masks instead of generated masks to enhance the mask classification capability of CLIP.
After low-cost fine-tuning, MaskCLIP++ significantly improves the mask classification performance on multi-domain datasets.
We achieve performance improvements of +1.7, +2.3, +2.1, +3.1, and +0.3 mIoU on the A-847, PC-459, A-150, PC-59, and PAS-20 datasets.
arXiv Detail & Related papers (2024-12-16T05:44:45Z)
- Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation [42.020470627552136]
Open-vocabulary panoptic segmentation is an emerging task aiming to accurately segment the image into semantically meaningful masks.
Mask classification is the main performance bottleneck for open-vocab panoptic segmentation.
We propose Semantic Refocused Tuning, a novel framework that greatly enhances open-vocab panoptic segmentation.
arXiv Detail & Related papers (2024-09-24T17:50:28Z)
- Variance-Insensitive and Target-Preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z)
- Unmasking Anomalies in Road-Scene Segmentation [18.253109627901566]
Anomaly segmentation is a critical task for driving applications.
We propose a paradigm change by shifting from a per-pixel classification to a mask classification.
Mask2Anomaly demonstrates the feasibility of integrating an anomaly detection method in a mask-classification architecture.
arXiv Detail & Related papers (2023-07-25T08:23:10Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [59.923672191632065]
We propose a new self-supervised pre-training approach named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the Off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
- MP-Former: Mask-Piloted Transformer for Image Segmentation [16.620469868310288]
Mask2Former suffers from inconsistent mask predictions between decoder layers.
We propose a mask-piloted training approach, which feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones.
arXiv Detail & Related papers (2023-03-13T17:57:59Z)
- Efficient Masked Autoencoders with Self-Consistency [34.7076436760695]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision.
We propose efficient masked autoencoders with self-consistency (EMAE) to improve the pre-training efficiency.
EMAE consistently obtains state-of-the-art transfer ability on a variety of downstream tasks, such as image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-02-28T09:21:12Z)
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z)
- Box-Adapt: Domain-Adaptive Medical Image Segmentation using Bounding Box Supervision [52.45336255472669]
We propose a weakly supervised domain adaptation setting for deep learning.
Box-Adapt fully explores the fine-grained segmentation mask in the source domain and the weak bounding box in the target domain.
We demonstrate the effectiveness of our method in the liver segmentation task.
arXiv Detail & Related papers (2021-08-19T01:51:04Z)
- BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [103.74690082121079]
In this work, we achieve improved mask prediction by effectively combining instance-level information with semantic information of lower-level fine granularity.
Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer.
arXiv Detail & Related papers (2020-01-02T03:30:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed above and is not responsible for any consequences arising from its use.