InterFormer: Real-time Interactive Image Segmentation
- URL: http://arxiv.org/abs/2304.02942v2
- Date: Wed, 9 Aug 2023 08:41:39 GMT
- Title: InterFormer: Real-time Interactive Image Segmentation
- Authors: You Huang, Hao Yang, Ke Sun, Shengchuan Zhang, Liujuan Cao, Guannan
Jiang, Rongrong Ji
- Abstract summary: Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks.
The existing interactive segmentation pipeline suffers from inefficient computations of interactive models.
We propose a method named InterFormer that follows a new pipeline to address these issues.
- Score: 80.45763765116175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive image segmentation enables annotators to efficiently perform
pixel-level annotation for segmentation tasks. However, the existing
interactive segmentation pipeline suffers from inefficient computations of
interactive models due to the following two issues. First, an annotator's later
clicks depend on the model's feedback to earlier clicks; this serial interaction
cannot exploit the model's parallelism. Second, at each interaction step the
model processes the unchanged image together with the sparse, varying clicks,
which is highly repetitive and redundant. For efficient computation, we propose
a method named InterFormer that follows a new pipeline to address these issues.
InterFormer extracts the computationally expensive part, i.e. image processing,
from the existing pipeline and preprocesses it. Specifically, InterFormer employs a large
vision transformer (ViT) on high-performance devices to preprocess images in
parallel, and then uses a lightweight module called interactive multi-head
self-attention (I-MSA) for interactive segmentation. Furthermore, the I-MSA
module's deployment on low-power devices extends the practical application of
interactive segmentation. The I-MSA module uses the preprocessed features to
respond efficiently to annotator inputs in real time. Experiments
on several datasets demonstrate the effectiveness of InterFormer, which
outperforms previous interactive segmentation models in computational
efficiency and segmentation quality, achieving real-time, high-quality interactive
segmentation on CPU-only devices. The code is available at
https://github.com/YouHuang67/InterFormer.
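To make the decoupled pipeline concrete, here is a minimal PyTorch-style sketch of the idea described in the abstract, not the official InterFormer code: a heavy encoder runs once per image, and a lightweight attention head answers every click from the cached features. All names below (HeavyImageEncoder, LightInteractiveHead, the two-channel click map) are hypothetical stand-ins, not the repository's API.

# Minimal sketch of the decoupled pipeline from the abstract; NOT the official
# InterFormer implementation. All module names are hypothetical stand-ins.
import torch
import torch.nn as nn

class HeavyImageEncoder(nn.Module):
    """Stand-in for the large ViT that preprocesses the image once."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patch embedding

    def forward(self, image):            # image: (B, 3, H, W)
        return self.proj(image)          # features: (B, dim, H/16, W/16)

class LightInteractiveHead(nn.Module):
    """Stand-in for the lightweight interactive module run per click on a CPU."""
    def __init__(self, dim=256):
        super().__init__()
        self.click_embed = nn.Conv2d(2, dim, kernel_size=16, stride=16)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mask_head = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feats, click_map):  # feats are cached encoder outputs
        b, c, h, w = feats.shape
        tokens = (feats + self.click_embed(click_map)).flatten(2).transpose(1, 2)
        tokens, _ = self.attn(tokens, tokens, tokens)
        return self.mask_head(tokens.transpose(1, 2).reshape(b, c, h, w))

# Usage: encode once (the expensive step), then answer each click cheaply.
encoder, head = HeavyImageEncoder().eval(), LightInteractiveHead().eval()
image = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    feats = encoder(image)                       # done once per image
    for _ in range(3):                           # simulated interaction rounds
        click_map = torch.zeros(1, 2, 512, 512)  # positive/negative click channels
        mask_logits = head(feats, click_map)     # cheap per-click inference

The point of the split is that the per-click cost no longer includes re-encoding the full image, which is what allows the interactive part to run in real time on CPU-only devices.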
Related papers
- Learning from Exemplars for Interactive Image Segmentation [15.37506525730218]
We introduce novel interactive segmentation frameworks for both a single object and multiple objects in the same category.
Our model reduces users' labor by around 15%, requiring two fewer clicks to reach target IoUs of 85% and 90%.
arXiv Detail & Related papers (2024-06-17T12:38:01Z) - Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse point and box tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z) - FocSAM: Delving Deeply into Focused Objects in Segmenting Anything [58.042354516491024]
The Segment Anything Model (SAM) marks a notable milestone in segmentation models.
We propose FocSAM with a pipeline redesigned on two pivotal aspects.
First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object.
Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from a few initial clicks.
arXiv Detail & Related papers (2024-05-29T02:34:13Z) - OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead.
arXiv Detail & Related papers (2024-01-18T18:59:34Z) - DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive
Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
arXiv Detail & Related papers (2023-04-13T16:57:02Z) - Modular Interactive Video Object Segmentation: Interaction-to-Mask,
Propagation and Difference-Aware Fusion [68.45737688496654]
We present a modular interactive VOS framework which decouples interaction-to-mask and mask propagation.
We show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions.
arXiv Detail & Related papers (2021-03-14T14:39:08Z) - Reviving Iterative Training with Mask Guidance for Interactive
Segmentation [8.271859911016719]
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes.
We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps.
We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
arXiv Detail & Related papers (2021-02-12T15:44:31Z) - Multi-Stage Fusion for One-Click Segmentation [20.00726292545008]
We propose a new multi-stage guidance framework for interactive segmentation.
Our proposed framework has a negligible increase in parameter count compared to early-fusion frameworks.
arXiv Detail & Related papers (2020-10-19T17:07:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.