InterFormer: Real-time Interactive Image Segmentation
- URL: http://arxiv.org/abs/2304.02942v2
- Date: Wed, 9 Aug 2023 08:41:39 GMT
- Title: InterFormer: Real-time Interactive Image Segmentation
- Authors: You Huang, Hao Yang, Ke Sun, Shengchuan Zhang, Liujuan Cao, Guannan
Jiang, Rongrong Ji
- Abstract summary: Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks.
The existing interactive segmentation pipeline suffers from inefficient computations of interactive models.
We propose a method named InterFormer that follows a new pipeline to address these issues.
- Score: 80.45763765116175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive image segmentation enables annotators to efficiently perform
pixel-level annotation for segmentation tasks. However, the existing
interactive segmentation pipeline suffers from inefficient computations of
interactive models due to the following two issues. First, an annotator's later
clicks depend on the model's feedback to earlier clicks; this serial interaction
cannot exploit the model's parallelism. Second, at each interaction step the
model processes the unchanged image together with the sparse, varying clicks,
which is highly repetitive and redundant. For efficient computation, we propose
a method named InterFormer that follows a new pipeline to address these issues.
InterFormer extracts the computationally expensive part, i.e. image processing,
from the existing pipeline and preprocesses it. Specifically, InterFormer employs a large
vision transformer (ViT) on high-performance devices to preprocess images in
parallel, and then uses a lightweight module called interactive multi-head
self-attention (I-MSA) for interactive segmentation. Furthermore, the I-MSA
module's deployment on low-power devices extends the practical application of
interactive segmentation. The I-MSA module uses the preprocessed features to
respond efficiently to annotator inputs in real time. Experiments
on several datasets demonstrate the effectiveness of InterFormer, which
outperforms previous interactive segmentation models in computational
efficiency and segmentation quality, achieving real-time, high-quality interactive
segmentation on CPU-only devices. The code is available at
https://github.com/YouHuang67/InterFormer.
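To make the decoupled pipeline concrete, here is a minimal PyTorch-style sketch of the idea described in the abstract, not the official InterFormer code: a heavy encoder runs once per image, and a lightweight attention head answers every click from the cached features. All names below (HeavyImageEncoder, LightInteractiveHead, the two-channel click map) are hypothetical stand-ins, not the repository's API.

# Minimal sketch of the decoupled pipeline from the abstract; NOT the official
# InterFormer implementation. All module names are hypothetical stand-ins.
import torch
import torch.nn as nn

class HeavyImageEncoder(nn.Module):
    """Stand-in for the large ViT that preprocesses the image once."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patch embedding

    def forward(self, image):            # image: (B, 3, H, W)
        return self.proj(image)          # features: (B, dim, H/16, W/16)

class LightInteractiveHead(nn.Module):
    """Stand-in for the lightweight interactive module run per click on a CPU."""
    def __init__(self, dim=256):
        super().__init__()
        self.click_embed = nn.Conv2d(2, dim, kernel_size=16, stride=16)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mask_head = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feats, click_map):  # feats are cached encoder outputs
        b, c, h, w = feats.shape
        tokens = (feats + self.click_embed(click_map)).flatten(2).transpose(1, 2)
        tokens, _ = self.attn(tokens, tokens, tokens)
        return self.mask_head(tokens.transpose(1, 2).reshape(b, c, h, w))

# Usage: encode once (the expensive step), then answer each click cheaply.
encoder, head = HeavyImageEncoder().eval(), LightInteractiveHead().eval()
image = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    feats = encoder(image)                       # done once per image
    for _ in range(3):                           # simulated interaction rounds
        click_map = torch.zeros(1, 2, 512, 512)  # positive/negative click channels
        mask_logits = head(feats, click_map)     # cheap per-click inference

The point of the split is that the per-click cost no longer includes re-encoding the full image, which is what allows the interactive part to run in real time on CPU-only devices.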
Related papers
- Learning from Exemplars for Interactive Image Segmentation [15.37506525730218]
We introduce novel interactive segmentation frameworks for both a single object and multiple objects in the same category.
Our model reduces users' labor by around 15%, requiring two fewer clicks to reach target IoUs of 85% and 90%.
arXiv Detail & Related papers (2024-06-17T12:38:01Z) - Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse point and box tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z) - FocSAM: Delving Deeply into Focused Objects in Segmenting Anything [58.042354516491024]
The Segment Anything Model (SAM) marks a notable milestone in segmentation models.
We propose FocSAM with a pipeline redesigned on two pivotal aspects.
First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object.
Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from a few initial clicks.
arXiv Detail & Related papers (2024-05-29T02:34:13Z) - OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead.
arXiv Detail & Related papers (2024-01-18T18:59:34Z) - DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive
Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
arXiv Detail & Related papers (2023-04-13T16:57:02Z) - Modular Interactive Video Object Segmentation: Interaction-to-Mask,
Propagation and Difference-Aware Fusion [68.45737688496654]
We present a modular interactive VOS framework which decouples interaction-to-mask and mask propagation.
We show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions.
arXiv Detail & Related papers (2021-03-14T14:39:08Z) - Reviving Iterative Training with Mask Guidance for Interactive
Segmentation [8.271859911016719]
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes.
We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps.
We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
arXiv Detail & Related papers (2021-02-12T15:44:31Z) - Multi-Stage Fusion for One-Click Segmentation [20.00726292545008]
We propose a new multi-stage guidance framework for interactive segmentation.
Our proposed framework has a negligible increase in parameter count compared to early-fusion frameworks.
arXiv Detail & Related papers (2020-10-19T17:07:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.