Text and Click inputs for unambiguous open vocabulary instance
segmentation
- URL: http://arxiv.org/abs/2311.14822v1
- Date: Fri, 24 Nov 2023 19:37:57 GMT
- Title: Text and Click inputs for unambiguous open vocabulary instance
segmentation
- Authors: Nikolai Warner, Meera Hahn, Jonathan Huang, Irfan Essa, Vighnesh
Birodkar
- Abstract summary: We propose a new segmentation process, Text + Click, where a model takes as input an image, a text phrase describing a class to segment, and a single foreground click specifying the instance to segment.
We demonstrate that the combination of a single user-specified foreground click and a text prompt allows a model to better disambiguate overlapping or co-occurring semantic categories.
- Score: 21.03169732771627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmentation localizes objects in an image on a fine-grained per-pixel scale.
Segmentation benefits from humans in the loop, who provide additional input
specifying the objects to segment via a combination of foreground or background
clicks. Tasks include photo editing or novel dataset annotation, where human annotators
leverage an existing segmentation model instead of drawing raw pixel level
annotations. We propose a new segmentation process, Text + Click segmentation,
where a model takes as input an image, a text phrase describing a class to
segment, and a single foreground click specifying the instance to segment.
Compared to previous approaches, we leverage open-vocabulary image-text models
to support a wide range of text prompts. Conditioning segmentations on text
prompts improves the accuracy of segmentations on novel or unseen classes. We
demonstrate that the combination of a single user-specified foreground click
and a text prompt allows a model to better disambiguate overlapping or
co-occurring semantic categories, such as "tie", "suit", and "person". We study
these results across common segmentation datasets such as refCOCO, COCO, VOC,
and OpenImages. Source code available here.
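Concretely, the interface the abstract describes takes three inputs: an image, a free-form class phrase, and one foreground click on the desired instance. Below is a minimal sketch of that interface; the Gaussian click encoding, the placeholder encoders, and the dot-product text gating are assumptions made for illustration, not the authors' architecture.

```python
# Illustrative Text + Click segmentation interface (sketch only).
# The encoders are untrained placeholders; the real model uses an
# open-vocabulary image-text backbone as described in the abstract.
import torch
import torch.nn as nn


def click_heatmap(h, w, click_xy, sigma=10.0):
    """Encode a single foreground click (x, y) as a Gaussian heatmap channel."""
    ys = torch.arange(h).view(h, 1).float()
    xs = torch.arange(w).view(1, w).float()
    x, y = click_xy
    d2 = (xs - x) ** 2 + (ys - y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))  # (H, W), peaks at the click


class TextClickSegmenter(nn.Module):
    def __init__(self, text_dim=64):
        super().__init__()
        # Image + click-heatmap channels -> feature map (placeholder conv stack).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, text_dim, 3, padding=1),
        )
        # Placeholder projection; a frozen open-vocabulary text encoder would sit upstream.
        self.text_proj = nn.Linear(text_dim, text_dim)
        self.mask_head = nn.Conv2d(text_dim, 1, 1)

    def forward(self, image, text_embedding, click_xy):
        h, w = image.shape[-2:]
        click = click_heatmap(h, w, click_xy).to(image)[None, None]
        x = self.image_encoder(torch.cat([image, click], dim=1))
        # Modulate image features by the text embedding (simple gating, an assumption).
        t = self.text_proj(text_embedding)[:, :, None, None]
        return torch.sigmoid(self.mask_head(x * t))  # (B, 1, H, W) instance mask


# Usage: segment the instance under the click at pixel (120, 80) for the phrase "tie".
model = TextClickSegmenter()
image = torch.rand(1, 3, 256, 256)
text_embedding = torch.rand(1, 64)            # stand-in for an encoded phrase "tie"
mask = model(image, text_embedding, (120.0, 80.0))
print(mask.shape)  # torch.Size([1, 1, 256, 256])
```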
Related papers
- USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation [33.11010205890195]
The main challenge in open-vocabulary image segmentation now lies in accurately classifying image segments into text-defined categories.
We introduce the Universal Segment Embedding (USE) framework to address this challenge.
This framework is comprised of two key components: 1) a data pipeline designed to efficiently curate a large amount of segment-text pairs at various granularities, and 2) a universal segment embedding model that enables precise segment classification into a vast range of text-defined categories.
arXiv Detail & Related papers (2024-06-07T21:41:18Z)
- IFSENet: Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence [2.822194296769473]
Few-shot segmentation techniques reduce the required number of images to learn to segment a new class.
Interactive segmentation techniques, in contrast, focus only on incrementally improving the segmentation of one object at a time.
We combine the two concepts to drastically reduce the effort required to train segmentation models for novel classes.
arXiv Detail & Related papers (2024-03-22T10:15:53Z)
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary learning of multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Shatter and Gather: Learning Referring Image Segmentation with Text Supervision [52.46081425504072]
We present a new model that discovers semantic entities in an input image and then combines the entities relevant to the text query to predict the mask of the referent.
Our method was evaluated on four public benchmarks for referring image segmentation, where it clearly outperformed existing methods for the same task as well as recent open-vocabulary segmentation models on all benchmarks.
arXiv Detail & Related papers (2023-08-29T15:39:15Z)
- Synthetic Instance Segmentation from Semantic Image Segmentation Masks [15.477053085267404]
We propose a novel paradigm called Synthetic Instance Segmentation (SISeg).
SISeg obtains instance segmentation results by leveraging image masks generated by existing semantic segmentation models.
In other words, the proposed model does not need extra manpower or additional computational expense.
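The paper's actual pipeline is not detailed in this summary. Purely as a naive point of reference for the semantic-to-instance direction it describes, the sketch below splits one class's semantic mask into instances via connected components (scipy is assumed to be available); this is not SISeg's method.

```python
# Naive baseline only: split one class's semantic mask into instance masks via
# connected components. SISeg itself is a learned model; this merely illustrates
# the semantic-mask -> instance-mask direction the summary describes.
import numpy as np
from scipy import ndimage


def instances_from_semantic(semantic_mask, class_id):
    """Return one binary mask per connected component of the given class."""
    binary = (semantic_mask == class_id)
    labeled, num = ndimage.label(binary)  # 4-connectivity by default
    # Note: touching instances of the same class would be merged here,
    # which is exactly why a learned model is needed in practice.
    return [(labeled == i) for i in range(1, num + 1)]


# Usage: two separate class-1 blobs become two instance masks.
sem = np.zeros((8, 8), dtype=int)
sem[1:3, 1:3] = 1
sem[5:7, 5:7] = 1
masks = instances_from_semantic(sem, class_id=1)
print(len(masks))  # 2
```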
arXiv Detail & Related papers (2023-08-02T05:13:02Z)
- Interactive Segmentation for Diverse Gesture Types Without Context [19.29886866117842]
We propose a simplified interactive segmentation task where a user need only mark an image.
The input can be of any gesture type, and the user does not have to specify which type it is.
We analyze numerous interactive segmentation algorithms, including ones adapted for our novel task.
arXiv Detail & Related papers (2023-07-20T01:37:32Z)
- Diffusion Models for Open-Vocabulary Segmentation [79.02153797465324]
OVDiff is a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation.
It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training.
arXiv Detail & Related papers (2023-06-15T17:51:28Z)
- Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs [10.484851004093919]
We tackle open-world semantic segmentation, which aims at learning to segment arbitrary visual concepts in images.
Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts.
We propose a novel Text-grounded Contrastive Learning framework that enables a model to directly learn region-text alignment.
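As a generic illustration of the kind of region-text contrastive objective this summary mentions, the snippet below shows a standard InfoNCE-style formulation under our own assumptions; it is not necessarily the paper's exact loss.

```python
# Generic region-text contrastive (InfoNCE-style) loss: matched region/text
# pairs in a batch are pulled together, mismatched pairs pushed apart.
import torch
import torch.nn.functional as F


def region_text_contrastive_loss(region_feats, text_feats, temperature=0.07):
    """region_feats, text_feats: (B, D) embeddings of paired regions and texts."""
    r = F.normalize(region_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    logits = r @ t.T / temperature            # (B, B) cosine-similarity logits
    targets = torch.arange(len(r), device=r.device)
    # Symmetric cross-entropy: region->text and text->region directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))


# Usage with random embeddings for a batch of 16 region-text pairs.
loss = region_text_contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```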
arXiv Detail & Related papers (2022-12-01T18:59:03Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotations.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Language-driven Semantic Segmentation [88.21498323896475]
We present LSeg, a novel model for language-driven semantic image segmentation.
We use a text encoder to compute embeddings of descriptive input labels.
The encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class.
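A minimal sketch of the dense pixel-to-text classification step this summary describes, with random placeholder embeddings standing in for LSeg's trained image and text encoders:

```python
# Dense pixel-text classification: each pixel embedding is scored against the
# text embedding of every candidate label, and the highest similarity wins.
# At training time these logits would be supervised with each pixel's ground-truth
# class, per the contrastive alignment described above. Embeddings here are random.
import torch
import torch.nn.functional as F


def classify_pixels(pixel_embeddings, label_embeddings, temperature=0.07):
    """pixel_embeddings: (D, H, W); label_embeddings: (K, D) -> (H, W) label map."""
    d, h, w = pixel_embeddings.shape
    p = F.normalize(pixel_embeddings.reshape(d, -1), dim=0)   # (D, H*W) per-pixel vectors
    l = F.normalize(label_embeddings, dim=-1)                 # (K, D) per-label vectors
    logits = (l @ p) / temperature                             # (K, H*W) similarity logits
    return logits.argmax(dim=0).reshape(h, w)


# Usage: 3 candidate labels (e.g. "person", "suit", "tie") over a 32x32 feature map.
label_map = classify_pixels(torch.randn(512, 32, 32), torch.randn(3, 512))
print(label_map.shape)  # torch.Size([32, 32])
```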
arXiv Detail & Related papers (2022-01-10T18:59:10Z)
- Prompt-Based Multi-Modal Image Segmentation [81.58378196535003]
We propose a system that can generate image segmentations based on arbitrary prompts at test time.
A prompt can be either a text or an image.
We build upon the CLIP model as a backbone, which we extend with a transformer-based decoder.
arXiv Detail & Related papers (2021-12-18T21:27:19Z)
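Because CLIP embeds text and images into a shared space, the same decoder can be conditioned on either kind of prompt. The sketch below mimics that interface with an untrained placeholder visual encoder and FiLM-style conditioning; the module names, sizes, and conditioning scheme are assumptions, not the paper's implementation.

```python
# Prompt-conditioned segmentation sketch: the prompt (a text phrase or an example
# image) is assumed to be pre-embedded by CLIP into one shared 512-d space, so the
# same decoder can be conditioned on either kind of prompt.
import torch
import torch.nn as nn


class PromptSegmenter(nn.Module):
    def __init__(self, prompt_dim=512, dim=64):
        super().__init__()
        # Placeholder visual encoder standing in for a CLIP-based backbone.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        self.film = nn.Linear(prompt_dim, 2 * dim)   # FiLM-style scale and shift
        self.decoder = nn.Conv2d(dim, 1, 1)          # placeholder for a transformer decoder

    def forward(self, image, prompt_embedding):
        feats = self.image_encoder(image)                         # (B, D, H, W)
        scale, shift = self.film(prompt_embedding).chunk(2, dim=-1)
        feats = feats * scale[..., None, None] + shift[..., None, None]
        return torch.sigmoid(self.decoder(feats))                 # (B, 1, H, W) mask


# Usage: the same model handles a text prompt or an image prompt, since both are
# represented here as (placeholder) embeddings in the same space.
model = PromptSegmenter()
image = torch.rand(1, 3, 128, 128)
text_prompt_emb = torch.rand(1, 512)    # stand-in for a CLIP text embedding of "a tie"
image_prompt_emb = torch.rand(1, 512)   # stand-in for a CLIP embedding of an example image
print(model(image, text_prompt_emb).shape, model(image, image_prompt_emb).shape)
```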