Interactive Segmentation for Diverse Gesture Types Without Context
- URL: http://arxiv.org/abs/2307.10518v2
- Date: Tue, 5 Dec 2023 17:56:26 GMT
- Title: Interactive Segmentation for Diverse Gesture Types Without Context
- Authors: Josh Myers-Dean, Yifei Fan, Brian Price, Wilson Chan, Danna Gurari
- Abstract summary: We propose a simplified interactive segmentation task where a user must only mark an image.
The input can be any gesture type, and the user does not need to specify which type is being used.
We analyze numerous interactive segmentation algorithms, including ones adapted for our novel task.
- Score: 19.29886866117842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive segmentation entails a human marking an image to guide how a
model either creates or edits a segmentation. Our work addresses limitations of
existing methods: they either only support one gesture type for marking an
image (e.g., either clicks or scribbles) or require knowledge of the gesture
type being employed, and require specifying whether marked regions should be
included versus excluded in the final segmentation. We instead propose a
simplified interactive segmentation task where a user must only mark an image,
and the input can be of any gesture type without specifying the gesture type.
We support this new task by introducing the first interactive segmentation
dataset with multiple gesture types as well as a new evaluation metric capable
of holistically evaluating interactive segmentation algorithms. We then analyze
numerous interactive segmentation algorithms, including ones adapted for our
novel task. While we observe promising performance overall, we also highlight
areas for future improvement. To facilitate further extensions of this work, we
publicly share our new dataset at https://github.com/joshmyersdean/dig.
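To make the task formulation concrete (an image plus a single marking of unspecified gesture type, with no include/exclude label), the sketch below illustrates one way such an interface and a simple per-gesture evaluation could look. This is a minimal illustration, not the authors' code: the GestureAgnosticSegmenter class, the binary gesture-mask input convention, and the per-gesture-type IoU loop are hypothetical, and the actual DIG dataset layout and the paper's holistic evaluation metric may differ.

```python
import numpy as np


class GestureAgnosticSegmenter:
    """Hypothetical interface: the model sees only the image and a binary map
    of marked pixels; it is never told whether the marks came from clicks,
    scribbles, lassos, etc., nor whether they indicate inclusion or exclusion."""

    def predict(self, image: np.ndarray, gesture_mask: np.ndarray) -> np.ndarray:
        # Placeholder: a real model would fuse image features with the gesture
        # map and return a full-resolution binary segmentation mask.
        raise NotImplementedError


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0


def evaluate_across_gestures(model, samples):
    """samples: iterable of (image, gesture_mask, gesture_type, gt_mask).
    Aggregating IoU per gesture type is one simple way to check that a model
    is not tuned to a single gesture; it is not the paper's proposed metric."""
    scores = {}
    for image, gesture_mask, gesture_type, gt_mask in samples:
        pred = model.predict(image, gesture_mask) > 0.5
        scores.setdefault(gesture_type, []).append(iou(pred, gt_mask))
    return {g: float(np.mean(v)) for g, v in scores.items()}
```

In this sketch, a model trained this way must handle clicks and scribbles through the same mask-shaped input, which is the gesture-agnostic property the abstract describes.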
Related papers
- IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence [2.822194296769473]
Few-shot segmentation techniques reduce the required number of images to learn to segment a new class.
Interactive segmentation techniques only focus on incrementally improving the segmentation of one object at a time.
We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes.
arXiv Detail & Related papers (2024-03-22T10:15:53Z) - Text and Click inputs for unambiguous open vocabulary instance segmentation [21.03169732771627]
We propose a new segmentation process, Text + Click, where a model takes as input an image, a text phrase describing a class to segment, and a single foreground click specifying the instance to segment.
We demonstrate that the combination of a single user-specified foreground click and a text prompt allows a model to better disambiguate overlapping or co-occurring semantic categories.
arXiv Detail & Related papers (2023-11-24T19:37:57Z) - AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z) - Segment Everything Everywhere All at Once [124.90835636901096]
We present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks.
We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks.
arXiv Detail & Related papers (2023-04-13T17:59:40Z) - DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
arXiv Detail & Related papers (2023-04-13T16:57:02Z) - Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any efforts on dense annotations.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z) - Rethinking Interactive Image Segmentation: Feature Space Annotation [68.8204255655161]
We propose interactive and simultaneous segment annotation from multiple images guided by feature space projection.
We show that our approach can surpass the accuracy of state-of-the-art methods on foreground segmentation datasets.
arXiv Detail & Related papers (2021-01-12T10:13:35Z) - Multi-Stage Fusion for One-Click Segmentation [20.00726292545008]
We propose a new multi-stage guidance framework for interactive segmentation.
Our proposed framework has a negligible increase in parameter count compared to early-fusion frameworks.
arXiv Detail & Related papers (2020-10-19T17:07:40Z) - Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.