Universal Segmentation at Arbitrary Granularity with Language Instruction
- URL: http://arxiv.org/abs/2312.01623v3
- Date: Thu, 7 Dec 2023 10:55:03 GMT
- Title: Universal Segmentation at Arbitrary Granularity with Language Instruction
- Authors: Yong Liu, Cairong Zhang, Yitong Wang, Jiahao Wang, Yujiu Yang, Yansong Tang
- Abstract summary: We present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level with the guidance of language instructions.
For training UniLSeg, we reorganize a group of tasks from their original diverse distributions into a unified data format, where images paired with texts describing the segmentation targets are the input and the corresponding masks are the output.
- Score: 59.76130089644841
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to achieve universal segmentation at arbitrary semantic levels. Despite significant progress in recent years, specialist segmentation approaches remain limited to specific tasks and data distributions. Retraining a new model to adapt to new scenarios or settings incurs expensive computation and time costs, which raises the demand for a versatile, universal segmentation model that can cater to various granularities. Although some attempts have been made to unify different segmentation tasks or to generalize to various scenarios, limitations in the definition of paradigms and input-output spaces make it difficult for them to achieve an accurate understanding of content at arbitrary granularity. To this end, we present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level under the guidance of language instructions. To train UniLSeg, we reorganize a group of tasks from their original diverse distributions into a unified data format, where images paired with texts describing the segmentation targets are the input and the corresponding masks are the output. Combined with an automatic annotation engine that exploits numerous unlabeled data, UniLSeg achieves excellent performance on various tasks and settings, surpassing both specialist and unified segmentation models.
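To make the unified data format concrete, the sketch below shows one plausible way to remap task-specific annotations into (image, language instruction, mask) triples. It is a minimal illustration, not the authors' code: the names UniSample, from_semantic_label, and from_referring_annotation are hypothetical and only assume that each training sample pairs an image and a target-describing text with a binary mask, as the abstract states.

```python
# Illustrative sketch of the unified (image, instruction, mask) sample format.
# All names below (UniSample, from_semantic_label, from_referring_annotation)
# are hypothetical and do not come from the UniLSeg paper.
from dataclasses import dataclass

import numpy as np


@dataclass
class UniSample:
    image: np.ndarray        # H x W x 3 RGB image
    instruction: str         # text describing the segmentation target
    mask: np.ndarray         # H x W binary mask for that target


def from_semantic_label(image, label_map, class_names):
    """Split one semantic-segmentation label map into per-class unified samples."""
    samples = []
    for class_id, name in class_names.items():
        mask = (label_map == class_id).astype(np.uint8)
        if mask.any():
            samples.append(UniSample(image, f"segment the {name}", mask))
    return samples


def from_referring_annotation(image, expression, mask):
    """Referring-expression data is already close to the unified format."""
    return UniSample(image, expression, mask)


if __name__ == "__main__":
    # Toy example: a 4x4 image with two labeled classes.
    img = np.zeros((4, 4, 3), dtype=np.uint8)
    labels = np.array([[0, 0, 1, 1]] * 4)
    for s in from_semantic_label(img, labels, {0: "sky", 1: "road"}):
        print(s.instruction, int(s.mask.sum()))
```

Under this view, referring, semantic, and part-level data differ only in where the instruction text comes from, which is what lets a single model train on all of them.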
Related papers
- Training-Free Semantic Segmentation via LLM-Supervision [37.9007813884699] (2024-03-31)
This paper introduces a new approach to text-supervised semantic segmentation using supervision from a large language model (LLM).
Our method starts from an LLM to generate a detailed set of subclasses for more accurate class representation.
We then employ an advanced text-supervised semantic segmentation model to apply the generated subclasses as target labels.
- Task-Specific Adaptation of Segmentation Foundation Model via Prompt Learning [7.6136466242670435] (2024-03-14)
We propose a task-specific adaptation of the segmentation foundation model via prompt learning, tailored to the Segment Anything Model (SAM).
Our method involves a prompt learning module which adjusts input prompts into the embedding space to better align with peculiarities of the target task.
Experimental results on various customized segmentation scenarios demonstrate the effectiveness of the proposed method.
- OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144] (2024-01-18)
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead.
- LISA: Reasoning Segmentation via Large Language Model [68.24075852136761] (2023-08-01)
We propose a new segmentation task -- reasoning segmentation.
The task is designed to output a segmentation mask given a complex and implicit query text.
We present LISA: large Language Instructed Segmentation Assistant, which inherits the language generation capabilities of multimodal Large Language Models.
- Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765] (2023-07-10)
We introduce Semantic-SAM, a universal image segmentation model that enables segmenting and recognizing anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
- AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744] (2023-05-28)
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
- Segment Everything Everywhere All at Once [124.90835636901096] (2023-04-13)
We present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks.
We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks.
- FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation [42.89720785573885] (2023-03-30)
FreeSeg is a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation.
We show that FreeSeg establishes new state-of-the-art results in performance and generalization on three segmentation tasks.
- BURT: BERT-inspired Universal Representation from Learning Meaningful Segment [46.51685959045527] (2020-12-28)
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present a universal representation model, BURT, to encode different levels of linguistic units into the same vector space.
Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate different granular objectives into the pre-training stage (a toy PMI sketch follows this list).
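For the PMI-based segment extraction mentioned in the BURT entry above, the following is a toy sketch, not BURT's implementation: it only applies the standard pointwise mutual information score PMI(x, y) = log[p(x, y) / (p(x) p(y))] to adjacent word pairs and keeps high-scoring pairs as candidate segments; the function name pmi_segments and the threshold are assumptions for illustration.

```python
# Toy illustration of PMI-based segment extraction (not BURT's actual code).
# PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ), estimated from a small corpus;
# adjacent word pairs with high PMI are treated as candidate segments.
import math
from collections import Counter


def pmi_segments(sentences, threshold=1.0):
    unigrams, bigrams = Counter(), Counter()
    n_uni = n_bi = 0
    for tokens in sentences:
        unigrams.update(tokens)
        n_uni += len(tokens)
        pairs = list(zip(tokens, tokens[1:]))
        bigrams.update(pairs)
        n_bi += len(pairs)
    scored = []
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        pmi = math.log(p_xy / (p_x * p_y))
        if pmi > threshold:
            scored.append(((x, y), round(pmi, 2)))
    return sorted(scored, key=lambda item: -item[1])


if __name__ == "__main__":
    corpus = [
        "new york is a big city".split(),
        "she moved to new york last year".split(),
        "the city is big".split(),
    ]
    # "new york" co-occurs more often than chance, so it scores highly.
    print(pmi_segments(corpus))
```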