AIMS: All-Inclusive Multi-Level Segmentation
- URL: http://arxiv.org/abs/2305.17768v1
- Date: Sun, 28 May 2023 16:28:49 GMT
- Title: AIMS: All-Inclusive Multi-Level Segmentation
- Authors: Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu,
  Ming-Hsuan Yang
- Abstract summary: We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
- Score: 93.5041381700744
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract:   Despite the progress of image segmentation for accurate visual entity
segmentation, completing the diverse requirements of image editing applications
for different-level region-of-interest selections remains unsolved. In this
paper, we propose a new task, All-Inclusive Multi-Level Segmentation (AIMS),
which segments visual regions into three levels: part, entity, and relation
(two entities with some semantic relationships). We also build a unified AIMS
model through multi-dataset multi-task training to address the two major
challenges of annotation inconsistency and task correlation. Specifically, we
propose task complementarity, association, and prompt mask encoder for
three-level predictions. Extensive experiments demonstrate the effectiveness
and generalization capacity of our method compared to other state-of-the-art
methods on a single dataset or the concurrent work on segmenting anything. We
will make our code and trained model publicly available.
 
      
Related papers
        - X-SAM: From Segment Anything to Any Segmentation [63.79182974315084]
 Large Language Models (LLMs) demonstrate strong capabilities in broad knowledge representation, yet they are inherently deficient in pixel-level perceptual understanding.
We present X-SAM, a streamlined Multimodal Large Language Model framework that extends the segmentation paradigm from "segment anything" to "any segmentation".
We propose a new segmentation task, termed Visual GrounDed (VGD) segmentation, which segments all instance objects with interactive visual prompts and empowers MLLMs with visual grounded, pixel-wise interpretative capabilities.
 arXiv  Detail & Related papers  (2025-08-06T17:19:10Z)
- One-shot In-context Part Segmentation [97.77292483684877]
 We present the One-shot In-context Part (OIParts) framework to tackle the challenges of part segmentation.
Our framework offers a novel approach to part segmentation that is training-free, flexible, and data-efficient.
We have achieved remarkable segmentation performance across diverse object categories.
 arXiv  Detail & Related papers  (2025-03-03T03:50:54Z)
- Frequency-based Matcher for Long-tailed Semantic Segmentation [22.199174076366003]
 We focus on a relatively under-explored task setting, long-tailed semantic segmentation (LTSS).
We propose a dual-metric evaluation system and construct the LTSS benchmark to demonstrate the performance of semantic segmentation methods and long-tailed solutions.
We also propose a transformer-based algorithm to improve LTSS, frequency-based matcher, which solves the oversuppression problem by one-to-many matching.
 arXiv  Detail & Related papers  (2024-06-06T09:57:56Z)
- Point-In-Context: Understanding Point Cloud via In-Context Learning [67.20277182808992]
 We introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context learning.
We address the technical challenge of effectively extending masked point modeling to 3D point clouds by introducing a Joint Sampling module.
We propose two novel training strategies, In-Context Labeling and In-Context Enhancing, forming an extended version of PIC named Point-In-Context-Segmenter (PIC-S).
 arXiv  Detail & Related papers  (2024-04-18T17:32:32Z)
- OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
 OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead.
 arXiv  Detail & Related papers  (2024-01-18T18:59:34Z)
- Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
 We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM).
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
 arXiv  Detail & Related papers  (2023-10-31T20:15:40Z)
- Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
 We introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
 arXiv  Detail & Related papers  (2023-07-10T17:59:40Z)
- Segment Everything Everywhere All at Once [124.90835636901096]
 We present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks.
We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks.
 arXiv  Detail & Related papers  (2023-04-13T17:59:40Z)
- FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation [42.89720785573885]
 FreeSeg is a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation.
We show that FreeSeg establishes new state-of-the-art results in performance and generalization on three segmentation tasks.
 arXiv  Detail & Related papers  (2023-03-30T08:42:49Z)
- OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification [11.707994658605546]
 We propose a general One Stage Multimodal Sequential Link Framework (OS-MSL) to distinguish and leverage the two-fold semantics.
We tailor a specific module called DiffCorrNet to explicitly extract the information of differences and correlations among shots.
 arXiv  Detail & Related papers  (2022-07-04T07:59:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.