Split Matching for Inductive Zero-shot Semantic Segmentation
- URL: http://arxiv.org/abs/2505.05023v2
- Date: Fri, 27 Jun 2025 09:35:34 GMT
- Title: Split Matching for Inductive Zero-shot Semantic Segmentation
- Authors: Jialei Chen, Xu Zheng, Dongyue Li, Chong Yi, Seigo Ito, Danda Pani Paudel, Luc Van Gool, Hiroshi Murase, Daisuke Deguchi,
- Abstract summary: Zero-shot Semantic Segmentation (ZSS) aims to segment categories that are not annotated during training. We propose Split Matching (SM), a novel assignment strategy that decouples Hungarian matching into two components. SM is the first to introduce decoupled Hungarian matching under the inductive ZSS setting, and achieves state-of-the-art performance on two standard benchmarks.
- Score: 52.90218623214213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot Semantic Segmentation (ZSS) aims to segment categories that are not annotated during training. While fine-tuning vision-language models has achieved promising results, these models often overfit to seen categories due to the lack of supervision for unseen classes. As an alternative to fully supervised approaches, query-based segmentation has shown great potential in ZSS, as it enables object localization without relying on explicit labels. However, conventional Hungarian matching, a core component in query-based frameworks, requires full supervision and often misclassifies unseen categories as background in the ZSS setting. To address this issue, we propose Split Matching (SM), a novel assignment strategy that decouples Hungarian matching into two components: one for seen classes in annotated regions and another for latent classes in unannotated regions (referred to as unseen candidates). Specifically, we partition the queries into seen and candidate groups, enabling each to be optimized independently according to its available supervision. To discover unseen candidates, we cluster CLIP dense features to generate pseudo masks and extract region-level embeddings using CLS tokens. Matching is then conducted separately for the two groups based on both class-level similarity and mask-level consistency. Additionally, we introduce a Multi-scale Feature Enhancement (MFE) module that refines decoder features through residual multi-scale aggregation, improving the model's ability to capture spatial details across resolutions. SM is the first to introduce decoupled Hungarian matching under the inductive ZSS setting, and achieves state-of-the-art performance on two standard benchmarks.
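The split assignment described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a decoder already produces per-query embeddings and soft masks, that seen targets come with class embeddings and ground-truth masks, and that unseen candidates come with pseudo masks and region embeddings (in the paper these come from clustering CLIP dense features and from CLS tokens; here they are simply given). The matching cost (1 - cosine similarity for class-level similarity plus 1 - Dice for mask-level consistency), the weights w_cls and w_mask, and all function names are hypothetical choices; scipy.optimize.linear_sum_assignment stands in for the Hungarian solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a [N, D] and b [M, D]."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def dice_cost(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Pairwise 1 - Dice between soft masks pred [N, P] and binary masks target [M, P]."""
    inter = pred @ target.T                                          # [N, M]
    sums = pred.sum(axis=1)[:, None] + target.sum(axis=1)[None, :]   # [N, M]
    return 1.0 - (2.0 * inter + 1.0) / (sums + 1.0)                  # +1 smoothing


def match_group(q_emb, q_masks, t_emb, t_masks, w_cls=1.0, w_mask=1.0):
    """Hungarian matching restricted to one query group and its own targets.
    The cost form and weights are assumptions, not the paper's exact cost."""
    if len(t_emb) == 0 or len(q_emb) == 0:
        empty = np.empty(0, dtype=int)
        return empty, empty
    cost = w_cls * (1.0 - cosine_sim(q_emb, t_emb)) + w_mask * dice_cost(q_masks, t_masks)
    return linear_sum_assignment(cost)  # (query indices, target indices)


def split_match(q_emb, q_masks, n_seen, seen_targets, candidate_targets):
    """Queries [:n_seen] are matched only to annotated (seen) targets; the remaining
    queries are matched only to unseen candidates, so the groups never compete."""
    s_q, s_t = match_group(q_emb[:n_seen], q_masks[:n_seen], *seen_targets)
    c_q, c_t = match_group(q_emb[n_seen:], q_masks[n_seen:], *candidate_targets)
    return (s_q, s_t), (c_q + n_seen, c_t)  # shift candidate query ids back to global ids


# Toy example (hypothetical data): 4 queries over 4 pixels, 2 seen + 2 candidate queries.
q_emb = np.array([[1, 0], [0, 1], [1, 1], [1, -1]], dtype=float)
q_masks = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
seen = (np.array([[0, 1], [1, 0]], dtype=float),              # e.g. class text embeddings
        np.array([[0, 0, 1, 1], [1, 1, 0, 0]], dtype=float))  # ground-truth masks
cand = (np.array([[1, -1]], dtype=float),                     # e.g. region (CLS) embedding
        np.array([[0, 1, 0, 1]], dtype=float))                # e.g. clustered pseudo mask
seen_match, cand_match = split_match(q_emb, q_masks, 2, seen, cand)
# Seen queries 0 and 1 go to seen targets 1 and 0; candidate query 3 goes to candidate 0.
print(seen_match, cand_match)
```

Because each group is solved by its own assignment, a candidate query is never pulled toward an annotated seen region, and seen queries never absorb candidate regions. Query 2 stays unmatched; in DETR-style matchers such queries are usually supervised as "no object", but how SM treats them is not stated in the abstract.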
Related papers
- LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance [56.474856189865946]
Large multi-modal models (LMMs) struggle with inaccurate segmentation and hallucinated comprehension.
We propose LIRA, a framework that capitalizes on the complementary relationship between visual comprehension and segmentation.
LIRA achieves state-of-the-art performance in both segmentation and comprehension tasks.
(2025-07-08T07:46:26Z)
- Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation [55.486872677160015]
We propose Chimera-Seg, which integrates a segmentation backbone as the body and a CLIP-based semantic head as the head.
Specifically, Chimera-Seg comprises a trainable segmentation model and a CLIP Semantic Head (CSH), which maps dense features into the CLIP-aligned space.
We also propose Selective Global Distillation (SGD), which distills knowledge from dense features exhibiting high similarity to the CLIP CLS token.
(2025-06-27T09:26:50Z)
- Generalized Category Discovery in Event-Centric Contexts: Latent Pattern Mining with LLMs [34.06878654462158]
We introduce Event-Centric GCD, characterized by long, complex narratives and highly imbalanced class distributions.
We propose PaMA, a framework leveraging LLMs to extract and refine event patterns for improved cluster-class alignment.
Evaluations on two EC-GCD benchmarks, including a newly constructed Scam Report dataset, demonstrate that PaMA outperforms prior methods with up to 12.58% H-score gains.
(2025-05-29T10:02:04Z)
- Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C²SRT) framework to explore the semantic correlation.
The proposed framework consists of two complementary modules, i.e., an intra-category semantic refinement (ISR) module and an inter-category semantic transfer (IST) module.
Experiments on OV-MLR benchmarks clearly demonstrate that the proposed C²SRT framework outperforms current state-of-the-art algorithms.
(2024-12-09T04:00:18Z)
- Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation [39.7657197805346]
Point cloud few-shot semantic segmentation (PC-FSS) aims to segment targets of novel categories in a given query point cloud with only a few annotated support samples.
We propose a simple yet effective framework in the spirit of Decoupled Localization and Expansion (DLE).
DLE, including a structural localization module (SLM) and a self-expansion module (SEM), enjoys several merits.
(2024-08-25T07:34:32Z)
- Anchor-based Multi-view Subspace Clustering with Hierarchical Feature Descent [46.86939432189035]
We propose Anchor-based Multi-view Subspace Clustering with Hierarchical Feature Descent.
Our proposed model consistently outperforms the state-of-the-art techniques.
(2023-10-11T03:29:13Z)
- Integrative Few-Shot Learning for Classification and Segmentation [37.50821005917126]
We introduce the integrative task of few-shot classification and segmentation (FS-CS).
FS-CS aims to classify and segment target objects in a query image when the target classes are given with a few examples.
We propose the integrative few-shot learning framework for FS-CS, which trains a learner to construct class-wise foreground maps.
(2022-03-29T16:14:40Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
(2021-12-12T06:11:16Z)
- Zero-Shot Semantic Segmentation via Spatial and Multi-Scale Aware Visual Class Embedding [0.0]
We propose a language-model-free zero-shot semantic segmentation framework, Spatial and Multi-scale aware Visual Class Embedding Network (SM-VCENet).
In experiments, our SM-VCENet outperforms zero-shot semantic segmentation state-of-the-art by a relative margin.
(2021-11-30T07:39:19Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
(2021-08-20T17:56:47Z)
- Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation [71.59275788106622]
We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories.
Our model significantly outperforms the state-of-the-art methods in both the partially supervised and few-shot settings for instance segmentation on the COCO dataset.
(2020-07-24T07:23:44Z)