MAIS: Memory-Attention for Interactive Segmentation
- URL: http://arxiv.org/abs/2505.07511v1
- Date: Mon, 12 May 2025 12:48:27 GMT
- Title: MAIS: Memory-Attention for Interactive Segmentation
- Authors: Mauricio Orbes-Arteaga, Oeslle Lucena, Sebastien Ourselin, M. Jorge Cardoso,
- Abstract summary: Vision Transformer (ViT)-based models achieve state-of-the-art performance using user clicks and prior masks as prompts. Existing methods treat interactions as independent events, leading to redundant corrections and limited refinement gains. We address this by introducing MAIS, a Memory-Attention mechanism for Interactive Segmentation that stores past user inputs and segmentation states, enabling temporal context integration.
- Score: 0.8678845273264675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive medical segmentation reduces annotation effort by refining predictions through user feedback. Vision Transformer (ViT)-based models, such as the Segment Anything Model (SAM), achieve state-of-the-art performance using user clicks and prior masks as prompts. However, existing methods treat interactions as independent events, leading to redundant corrections and limited refinement gains. We address this by introducing MAIS, a Memory-Attention mechanism for Interactive Segmentation that stores past user inputs and segmentation states, enabling temporal context integration. Our approach enhances ViT-based segmentation across diverse imaging modalities, achieving more efficient and accurate refinements.
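The abstract describes the memory mechanism only at a high level. As a rough illustration of the idea, the sketch below cross-attends the current image features to a bounded memory of past clicks and predicted masks; the module name, tensor shapes, and use of standard multi-head cross-attention are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class InteractionMemoryAttention(nn.Module):
    """Hypothetical sketch of a memory-attention module for interactive
    segmentation: image tokens attend over stored past interaction states."""

    def __init__(self, dim=256, num_heads=8, max_steps=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.max_steps = max_steps
        self.memory = []  # list of (B, T_s, C) tensors, one per past round

    def write(self, click_embed, mask_embed):
        # Store a compact summary of this round: user clicks + predicted mask.
        state = torch.cat([click_embed, mask_embed], dim=1)
        self.memory = (self.memory + [state.detach()])[-self.max_steps:]

    def forward(self, image_tokens):
        # image_tokens: (B, N, C). With no history yet, pass through unchanged.
        if not self.memory:
            return image_tokens
        mem = torch.cat(self.memory, dim=1)  # all past rounds as one sequence
        fused, _ = self.attn(query=image_tokens, key=mem, value=mem)
        return self.norm(image_tokens + fused)  # residual fusion
```

The point this is meant to capture, per the abstract, is that each refinement round conditions on all previous user inputs and segmentation states rather than only the latest click and mask.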
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- PE-MED: Prompt Enhancement for Interactive Medical Image Segmentation [9.744164910887223]
We introduce a novel framework equipped with prompt enhancement, called PE-MED, for interactive medical image segmentation.
First, we introduce a Self-Loop strategy to generate warm initial segmentation results based on the first prompt.
Second, we propose a novel Prompt Attention Learning Module (PALM) to mine useful prompt information in one interaction.
arXiv Detail & Related papers (2023-08-26T03:11:48Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
arXiv Detail & Related papers (2023-04-13T16:57:02Z)
- InterFormer: Real-time Interactive Image Segmentation [80.45763765116175]
Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks.
The existing interactive segmentation pipeline suffers from inefficient computations of interactive models.
We propose a method named InterFormer that follows a new pipeline to address these issues.
arXiv Detail & Related papers (2023-04-06T08:57:00Z)
- Temporal Segment Transformer for Action Segmentation [54.25103250496069]
We propose an attention-based approach, which we call the temporal segment transformer, for joint segment relation modeling and denoising.
The main idea is to denoise segment representations using attention between segment and frame representations, and also use inter-segment attention to capture temporal correlations between segments.
We show that this novel architecture achieves state-of-the-art accuracy on the popular 50Salads, GTEA and Breakfast benchmarks.
arXiv Detail & Related papers (2023-02-25T13:05:57Z)
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion [68.45737688496654]
We present a modular interactive VOS framework which decouples interaction-to-mask and mask propagation.
We show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions.
arXiv Detail & Related papers (2021-03-14T14:39:08Z)
- Reviving Iterative Training with Mask Guidance for Interactive Segmentation [8.271859911016719]
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes.
We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps; a minimal sketch of this mask-feedback pattern appears after this list.
We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
arXiv Detail & Related papers (2021-02-12T15:44:31Z)
- Multi-Stage Fusion for One-Click Segmentation [20.00726292545008]
We propose a new multi-stage guidance framework for interactive segmentation.
Our proposed framework has a negligible increase in parameter count compared to early-fusion frameworks.
arXiv Detail & Related papers (2020-10-19T17:07:40Z)
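Several of the click-based methods above, most directly the iterative mask-guidance work, share one feedforward pattern: the prediction from the previous round is fed back as an extra input channel alongside the image and click maps. The sketch below shows that loop; the tiny network, the 3+2+1 channel layout, and the click encoding are illustrative assumptions, not any single paper's architecture.

```python
import torch
import torch.nn as nn

class MaskGuidedSegmenter(nn.Module):
    """Toy feedforward model: input = image (3ch) + positive/negative
    click maps (2ch) + previous mask (1ch). Not a published architecture."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # per-pixel foreground logit
        )

    def forward(self, image, clicks, prev_mask):
        x = torch.cat([image, clicks, prev_mask], dim=1)  # (B, 6, H, W)
        return self.net(x)

def refine(model, image, click_rounds):
    """Run successive rounds of clicks, feeding each prediction back in."""
    b, _, h, w = image.shape
    prev_mask = torch.zeros(b, 1, h, w)  # empty guidance before round one
    for clicks in click_rounds:  # each element: (B, 2, H, W) click maps
        logits = model(image, clicks, prev_mask)
        prev_mask = torch.sigmoid(logits)  # becomes next round's guidance
    return prev_mask
```

In this pattern only the most recent mask carries history forward; MAIS's memory-attention instead keeps the full sequence of past inputs and states available at every round.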