Related papers: DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation

URL: http://arxiv.org/abs/2507.10118v1
Date: Mon, 14 Jul 2025 10:01:02 GMT
Title: DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation
Authors: Ivan Martinović, Josip Šarić, Marin Oršić, Matej Kristan, Siniša Šegvić
Abstract summary: We develop a novel semi-supervised panoptic approach fueled by two dedicated foundation models. We enhance recognition by complementing mask-transformer consistency with zero-shot classification of CLIP features. We observe 29.9 PQ and 38.9 mIoU on ADE20K with only 158 labeled images.
Score: 7.374034913971139
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Pixel-level annotation is expensive and time-consuming. Semi-supervised segmentation methods address this challenge by learning models on few labeled images alongside a large corpus of unlabeled images. Although foundation models could further account for label scarcity, effective mechanisms for their exploitation remain underexplored. We address this by devising a novel semi-supervised panoptic approach fueled by two dedicated foundation models. We enhance recognition by complementing unsupervised mask-transformer consistency with zero-shot classification of CLIP features. We enhance localization by class-agnostic decoder warm-up with respect to SAM pseudo-labels. The resulting decoupled enhancement of recognition and localization (DEARLi) particularly excels in the most challenging semi-supervised scenarios with large taxonomies and limited labeled data. Moreover, DEARLi outperforms the state of the art in semi-supervised semantic segmentation by a large margin while requiring 8x less GPU memory, in spite of being trained only for the panoptic objective. We observe 29.9 PQ and 38.9 mIoU on ADE20K with only 158 labeled images. The source code is available at https://github.com/helen1c/DEARLi.
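Illustration (not from the paper): the abstract mentions zero-shot classification of CLIP features. The general idea of zero-shot classification is to compare an image-region feature against text embeddings of the class names and pick the most similar class. The sketch below shows that idea with toy vectors in plain Python. The function names, the temperature value, and the toy embeddings are assumptions made for this example; no real CLIP model is loaded, and this is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def zero_shot_classify(mask_feature, text_embeddings, temperature=0.01):
    """Return (predicted class index, class probabilities).

    mask_feature:    feature pooled over one mask (assumed CLIP-like)
    text_embeddings: one embedding per class-name prompt (assumed CLIP-like)
    temperature:     hypothetical scaling factor for the similarities
    """
    sims = [cosine(mask_feature, t) / temperature for t in text_embeddings]
    # Numerically stable softmax over the classes.
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs

# Toy stand-ins for three class text embeddings and one mask feature.
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
label, probs = zero_shot_classify([0.1, 0.2, 3.0], text_embeddings)
print(label, probs)
```

In practice the vectors would come from the image and text encoders of a vision-language model, and the resulting class probabilities could serve as an additional training signal for unlabeled images.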

Related papers

PEPL: Precision-Enhanced Pseudo-Labeling for Fine-Grained Image Classification in Semi-Supervised Learning [3.801446153948012]
We introduce the Precision-Enhanced Pseudo-Labeling (PEPL) approach for fine-grained image classification within a semi-supervised learning framework. Our method leverages the abundance of unlabeled data by generating high-quality pseudo-labels. We achieve state-of-the-art performance on benchmark datasets, demonstrating significant improvements over existing semi-supervised strategies.
arXiv Detail & Related papers (2024-09-05T02:32:07Z)
Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation [55.69128107473125]
We propose a single-stage approach for Weakly Supervised Semantic Segmentation (WSSS) with image-level labels. We adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing. Building upon this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels.
arXiv Detail & Related papers (2023-12-14T13:21:52Z)
Semi-supervised Semantic Segmentation Meets Masked Modeling: Fine-grained Locality Learning Matters in Consistency Regularization [31.333862320143968]
Semi-supervised semantic segmentation aims to utilize limited labeled images and abundant unlabeled images to achieve label-efficient learning. We propose a novel framework called MaskMatch, which enables fine-grained locality learning to achieve better dense segmentation.
arXiv Detail & Related papers (2023-12-14T03:28:53Z)
Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm to train semantic segmentation algorithms. Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation [30.812323329239614]
Weakly supervised semantic segmentation (WSSS) aims to bypass the need for laborious pixel-level annotation by using only image-level annotation. Most existing methods rely on Class Activation Maps (CAM) to derive pixel-level pseudo-labels. We introduce a simple yet effective method harnessing the Segment Anything Model (SAM), a class-agnostic foundation model capable of producing fine-grained instance masks of objects, parts, and subparts.
arXiv Detail & Related papers (2023-05-09T23:24:09Z)
Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation [75.32213865436442]
We propose an end-to-end multi-granularity denoising and bidirectional alignment (MDBA) model to alleviate the noisy label and multi-class generalization issues. The MDBA model can reach the mIoU of 69.5% and 70.2% on validation and test sets for the PASCAL VOC 2012 dataset.
arXiv Detail & Related papers (2023-05-09T03:33:43Z)
High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training. Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss. We reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$^2$CRC) framework to assist the expansion of the activated object regions. We also introduce an object guided label refinement module to make full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z)
Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation [72.33139350241044]
Weakly Supervised Semantic Segmentation (WSSS) based on image-level labels has attracted much attention due to low annotation costs. We propose a Self-supervised Image-specific Prototype Exploration (SIPE) that consists of an Image-specific Prototype Exploration (IPE) and a General-Specific Consistency (GSC) loss. Our SIPE achieves new state-of-the-art performance using only image-level labels.
arXiv Detail & Related papers (2022-03-06T09:01:03Z)
Towards Single Stage Weakly Supervised Semantic Segmentation [2.28438857884398]
We present a single-stage approach to weakly supervised semantic segmentation. We use point annotations to generate reliable, on-the-fly pseudo-masks. We significantly outperform other SOTA WSSS methods on recent real-world datasets.
arXiv Detail & Related papers (2021-06-18T18:34:50Z)
Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels. We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images. We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.