FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
- URL: http://arxiv.org/abs/2303.17225v1
- Date: Thu, 30 Mar 2023 08:42:49 GMT
- Title: FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
- Authors: Jie Qin, Jie Wu, Pengxiang Yan, Ming Li, Ren Yuxi, Xuefeng Xiao,
Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang
- Abstract summary: FreeSeg is a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation.
We show that FreeSeg establishes new state-of-the-art results in performance and generalization on three segmentation tasks.
- Score: 42.89720785573885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, open-vocabulary learning has emerged to accomplish segmentation for
arbitrary categories given text-based descriptions, which extends segmentation
systems to more general-purpose application scenarios. However, existing methods
are devoted to designing specialized architectures or parameters for specific
segmentation tasks. These customized design paradigms lead to fragmentation
between various segmentation tasks, thus hindering the uniformity of
segmentation models. Hence, in this paper, we propose FreeSeg, a generic
framework to accomplish Unified, Universal and Open-Vocabulary Image
Segmentation. FreeSeg optimizes an all-in-one network via one-shot training and
employs the same architecture and parameters to handle diverse segmentation
tasks seamlessly during inference. Additionally, adaptive prompt learning helps
the unified model capture task-aware and category-sensitive concepts, improving
model robustness in multi-task and varied scenarios. Extensive experimental
results demonstrate that FreeSeg establishes new state-of-the-art results in
performance and generalization on three segmentation tasks, outperforming the
best task-specific architectures by a large margin: 5.5% mIoU on semantic
segmentation, 17.6% mAP on instance segmentation, and 20.1% PQ on panoptic
segmentation for unseen classes on COCO.
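The abstract describes steering a single network across semantic, instance and panoptic segmentation with task-aware, category-sensitive text prompts. The snippet below is a minimal sketch of that general idea, not the authors' implementation: the prompt template, the pseudo-random text encoder, and the mask-proposal features are hypothetical stand-ins for the CLIP-style encoder and mask decoder the paper builds on.

```python
# Hedged sketch (not the paper's code): one set of weights serving semantic,
# instance and panoptic segmentation by scoring class-agnostic mask proposals
# against task- and category-aware text prompts. The text encoder below is a
# pseudo-random stand-in for a CLIP-style encoder; the prompt wording is
# illustrative, not FreeSeg's actual template.
import torch
import torch.nn.functional as F

EMBED_DIM = 512

def encode_text(prompts):
    """Stand-in text encoder: a pseudo-random unit vector keyed to each prompt."""
    embs = []
    for p in prompts:
        g = torch.Generator().manual_seed(abs(hash(p)) % (2**31))
        embs.append(torch.randn(EMBED_DIM, generator=g))
    return F.normalize(torch.stack(embs), dim=-1)             # (C, D)

def build_prompts(task, categories):
    # Adaptive prompts: the text carries both the task and the category, so the
    # same frozen network can be steered to a different task at inference time.
    return [f"a photo of a {c}, for {task} segmentation" for c in categories]

def classify_masks(mask_embeddings, task, categories):
    """Score each class-agnostic mask proposal against the prompted categories."""
    text_emb = encode_text(build_prompts(task, categories))   # (C, D)
    mask_emb = F.normalize(mask_embeddings, dim=-1)            # (N, D)
    return (mask_emb @ text_emb.T).softmax(dim=-1)             # cosine scores -> probs

# Toy usage: 10 mask-proposal embeddings (random stand-ins for a mask decoder's
# output), classified for all three tasks with the same weights and categories.
proposals = torch.randn(10, EMBED_DIM)
for task in ("semantic", "instance", "panoptic"):
    probs = classify_masks(proposals, task, ["person", "dog", "sky", "giraffe"])
    print(task, probs.argmax(dim=-1).tolist())
```

Only the task string and category list change between the three tasks; the network weights stay fixed, which is the "same architecture and parameters" property claimed in the abstract.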
Related papers
- Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z)
- USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation [33.11010205890195]
The main challenge in open-vocabulary image segmentation now lies in accurately classifying these segments into text-defined categories.
We introduce the Universal Segment Embedding (USE) framework to address this challenge.
This framework is comprised of two key components: 1) a data pipeline designed to efficiently curate a large amount of segment-text pairs at various granularities, and 2) a universal segment embedding model that enables precise segment classification into a vast range of text-defined categories.
arXiv Detail & Related papers (2024-06-07T21:41:18Z)
- Universal Segmentation at Arbitrary Granularity with Language Instruction [59.76130089644841]
We present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level with the guidance of language instructions.
For training UniLSeg, we reorganize a group of tasks from their original, diverse distributions into a unified data format, where images paired with texts describing the segmentation targets serve as input and the corresponding masks as output.
arXiv Detail & Related papers (2023-12-04T04:47:48Z)
- AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z)
- Segment Everything Everywhere All at Once [124.90835636901096]
We present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks.
We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks.
arXiv Detail & Related papers (2023-04-13T17:59:40Z)
- Integrative Few-Shot Learning for Classification and Segmentation [37.50821005917126]
We introduce the integrative task of few-shot classification and segmentation (FS-CS).
FS-CS aims to classify and segment target objects in a query image when the target classes are given with a few examples.
We propose the integrative few-shot learning framework for FS-CS, which trains a learner to construct class-wise foreground maps.
arXiv Detail & Related papers (2022-03-29T16:14:40Z)
- SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most advanced solutions exploit a metric-learning framework that performs segmentation by matching each query pixel to a learned foreground prototype (a minimal sketch of this baseline follows the list).
This framework suffers from biased classification because the sample pairs are constructed with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z)
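The SCNet summary above refers to the standard metric-learning baseline for few-shot segmentation. Below is a minimal, hypothetical sketch of that baseline (masked average pooling of support features into a foreground prototype, then per-pixel cosine matching on the query), not SCNet's self-contrastive background prototypes; the feature extractor and tensor shapes are illustrative.

```python
# Hedged sketch of the generic prototype-matching baseline, not SCNet itself.
# Features are random stand-ins for a CNN backbone's outputs.
import torch
import torch.nn.functional as F

def foreground_prototype(support_feat, support_mask):
    """Masked average pooling: average support features over foreground pixels.
    support_feat: (D, H, W); support_mask: (H, W) with values in {0, 1}."""
    mask = support_mask.unsqueeze(0)                        # (1, H, W)
    return (support_feat * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1)  # (D,)

def match_query(query_feat, proto, thresh=0.5):
    """Cosine similarity between every query pixel and the foreground prototype."""
    q = F.normalize(query_feat, dim=0)                      # (D, H, W)
    p = F.normalize(proto, dim=0).view(-1, 1, 1)            # (D, 1, 1)
    sim = (q * p).sum(dim=0)                                # (H, W), values in [-1, 1]
    return (sim > thresh).float()                           # predicted foreground mask

# Toy usage with random features and a random support mask.
D, H, W = 64, 32, 32
support_feat, query_feat = torch.randn(D, H, W), torch.randn(D, H, W)
support_mask = (torch.rand(H, W) > 0.7).float()
pred = match_query(query_feat, foreground_prototype(support_feat, support_mask))
print(pred.shape, pred.mean().item())
```

Per the summary, SCNet's contribution is adding self-contrastive background prototypes on top of this foreground-only matching to reduce the classification bias it describes.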