DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive
Segmentation Transformer
- URL: http://arxiv.org/abs/2304.06668v2
- Date: Tue, 22 Aug 2023 12:53:56 GMT
- Title: DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive
Segmentation Transformer
- Authors: Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, and Bastian
Leibe
- Abstract summary: Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
- Score: 58.95404214273222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most state-of-the-art instance segmentation methods rely on large amounts of
pixel-precise ground-truth annotations for training, which are expensive to
create. Interactive segmentation networks help generate such annotations based
on an image and the corresponding user interactions such as clicks. Existing
methods for this task can only process a single instance at a time and each
user interaction requires a full forward pass through the entire deep network.
We introduce a more efficient approach, called DynaMITe, in which we represent
user interactions as spatio-temporal queries to a Transformer decoder with a
potential to segment multiple object instances in a single iteration. Our
architecture also alleviates any need to re-compute image features during
refinement, and requires fewer interactions for segmenting multiple instances
in a single image when compared to other methods. DynaMITe achieves
state-of-the-art results on multiple existing interactive segmentation
benchmarks, and also on the new multi-instance benchmark that we propose in
this paper.
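The idea of encoding the image once and treating each click as a query can be illustrated with a small toy sketch. This is not the authors' implementation: the `Click` and `ToySegmenter` names, the scalar per-pixel "features", and the nearest-prototype decoding are all assumptions chosen for brevity, standing in for a real backbone and Transformer decoder. The sketch also leaves out the temporal ordering of clicks. It shows only that each refinement step reuses cached features and that several objects are labeled in one decoding pass.

```python
from dataclasses import dataclass


@dataclass
class Click:
    row: int
    col: int
    obj_id: int  # hypothetical convention: 0 = background, 1.. = object instances


class ToySegmenter:
    """Toy stand-in: caches per-pixel features once, then re-decodes per click."""

    def __init__(self):
        self.encoder_calls = 0
        self.features = None
        self.clicks = []

    def encode(self, image):
        # Stand-in for the expensive backbone: one scalar feature per pixel.
        self.encoder_calls += 1
        self.features = [[float(v) for v in row] for row in image]
        self.clicks = []

    def add_click(self, click):
        # A new interaction only extends the query set; features are reused.
        self.clicks.append(click)
        return self.decode()

    def decode(self):
        # One "query" per click: the cached feature at the clicked pixel.
        per_object = {}
        for c in self.clicks:
            per_object.setdefault(c.obj_id, []).append(self.features[c.row][c.col])
        # Average the queries of each object into a single prototype.
        prototypes = {k: sum(v) / len(v) for k, v in per_object.items()}
        # Label every pixel with the closest prototype (all objects at once).
        return [
            [min(sorted(prototypes), key=lambda k: (prototypes[k] - f) ** 2) for f in row]
            for row in self.features
        ]


# Usage: three clicks on a tiny 2x4 "image", one encoder pass in total.
image = [[0, 0, 5, 9], [0, 0, 5, 9]]
seg = ToySegmenter()
seg.encode(image)
seg.add_click(Click(0, 0, obj_id=0))
two_label_mask = seg.add_click(Click(0, 2, obj_id=1))
mask = seg.add_click(Click(1, 3, obj_id=2))
```

In this toy setup the third click separates the brightest column into its own object without calling `encode` again, which mirrors the efficiency argument in the abstract at a much smaller scale.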
Related papers
- A Simple Image Segmentation Framework via In-Context Examples [59.319920526160466]
We present SINE, a simple image segmentation framework utilizing in-context examples.
We introduce an In-context Interaction module to complement in-context information and produce correlations between the target image and the in-context example.
Experiments on various segmentation tasks show the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-10-07T08:59:05Z)
- Learning from Exemplars for Interactive Image Segmentation [15.37506525730218]
We introduce novel interactive segmentation frameworks for both a single object and multiple objects in the same category.
Our model reduces users' labor by around 15%, requiring two fewer clicks to achieve target IoUs of 85% and 90%.
arXiv Detail & Related papers (2024-06-17T12:38:01Z)
- IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence [2.822194296769473]
Few-shot segmentation techniques reduce the required number of images to learn to segment a new class.
Interactive segmentation techniques only focus on incrementally improving the segmentation of one object at a time.
We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes.
arXiv Detail & Related papers (2024-03-22T10:15:53Z)
- OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead.
arXiv Detail & Related papers (2024-01-18T18:59:34Z)
- Multi-interactive Feature Learning and a Full-time Multi-modality
Benchmark for Image Fusion and Segmentation [66.15246197473897]
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a Multi-interactive Feature learning architecture for image fusion and Segmentation.
arXiv Detail & Related papers (2023-08-04T01:03:58Z)
- InterFormer: Real-time Interactive Image Segmentation [80.45763765116175]
Interactive image segmentation enables annotators to efficiently perform pixel-level annotation for segmentation tasks.
The existing interactive segmentation pipeline suffers from inefficient computations of interactive models.
We propose a method named InterFormer that follows a new pipeline to address these issues.
arXiv Detail & Related papers (2023-04-06T08:57:00Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has the lowest run-time among contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- Multi-Stage Fusion for One-Click Segmentation [20.00726292545008]
We propose a new multi-stage guidance framework for interactive segmentation.
Our proposed framework has a negligible increase in parameter count compared to early-fusion frameworks.
arXiv Detail & Related papers (2020-10-19T17:07:40Z)
- Localized Interactive Instance Segmentation [24.55415554455844]
We propose a clicking scheme wherein user interactions are restricted to the proximity of the object.
We demonstrate the effectiveness of our proposed clicking scheme and localization strategy through detailed experimentation.
arXiv Detail & Related papers (2020-10-18T23:24:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.