Related papers: Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation

URL: http://arxiv.org/abs/2510.22229v1
Date: Sat, 25 Oct 2025 09:25:01 GMT
Title: Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation
Authors: Jeongin Kim, Wonho Bae, YouLee Han, Giyeong Oh, Youngjae Yu, Danica J. Sutherland, Junhyug Noh
Abstract summary: This paper proposes a novel two-stage selection pipeline for semantic segmentation. We achieve high segmentation accuracy with only a tiny fraction of labeled pixels. Our method significantly outperforms existing baselines under extreme pixel-budget regimes.
Score: 33.970333069082294
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic segmentation demands dense pixel-level annotations, which can be prohibitively expensive - especially under extremely constrained labeling budgets. In this paper, we address the problem of low-budget active learning for semantic segmentation by proposing a novel two-stage selection pipeline. Our approach leverages a pre-trained diffusion model to extract rich multi-scale features that capture both global structure and fine details. In the first stage, we perform a hierarchical, representation-based candidate selection by first choosing a small subset of representative pixels per image using MaxHerding, and then refining these into a diverse global pool. In the second stage, we compute an entropy-augmented disagreement score (eDALD) over noisy multi-scale diffusion features to capture both epistemic uncertainty and prediction confidence, selecting the most informative pixels for annotation. This decoupling of diversity and uncertainty lets us achieve high segmentation accuracy with only a tiny fraction of labeled pixels. Extensive experiments on four benchmarks (CamVid, ADE-Bed, Cityscapes, and Pascal-Context) demonstrate that our method significantly outperforms existing baselines under extreme pixel-budget regimes. Our code is available at https://github.com/jn-kim/two-stage-edald.
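The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (their code is at the repository linked above), and it rests on several assumptions: stage 1 is approximated by greedy maximization of a kernel coverage objective, which is one reading of MaxHerding; stage 2 uses a BALD-style mutual information plus predictive entropy as a stand-in, since the abstract does not give the exact eDALD formula; and the hierarchical per-image step is collapsed into a single pool. All function names and parameters (`greedy_coverage_select`, `disagreement_plus_entropy`, `two_stage_select`, `sigma`, `pool_size`, `budget`) are hypothetical.

```python
import numpy as np


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix between rows of x and rows of y."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def greedy_coverage_select(feats, k, sigma=1.0):
    """Stage 1 (assumed MaxHerding-style): greedily pick k rows of feats
    that maximize mean_i max_{s in S} kernel(feats[i], feats[s])."""
    K = rbf_kernel(feats, feats, sigma)
    n = len(feats)
    best = np.zeros(n)              # similarity of each point to the chosen set
    taken = np.zeros(n, dtype=bool)
    chosen = []
    for _ in range(k):
        # Coverage obtained if candidate j were added to the chosen set.
        gains = np.maximum(best[:, None], K).mean(axis=0)
        gains[taken] = -np.inf      # never pick the same point twice
        j = int(np.argmax(gains))
        chosen.append(j)
        taken[j] = True
        best = np.maximum(best, K[:, j])
    return chosen


def entropy(p, axis=-1):
    """Shannon entropy in nats, with a small constant for stability."""
    return -(p * np.log(p + 1e-12)).sum(axis=axis)


def disagreement_plus_entropy(probs):
    """Stage 2 stand-in (NOT the paper's exact eDALD): probs has shape
    (T, N, C), class probabilities from T noisy feature draws.
    Returns mutual information plus predictive entropy per pixel."""
    mean_p = probs.mean(axis=0)                    # (N, C)
    pred_entropy = entropy(mean_p)                 # (N,)
    expected_entropy = entropy(probs).mean(axis=0) # (N,)
    mutual_info = pred_entropy - expected_entropy  # disagreement term
    return mutual_info + pred_entropy


def two_stage_select(feats, probs, pool_size, budget):
    """Build a diverse candidate pool, then keep the highest-scoring pixels."""
    pool = greedy_coverage_select(feats, pool_size)
    scores = disagreement_plus_entropy(probs[:, pool])
    top = np.argsort(-scores)[:budget]
    return [pool[i] for i in top]
```

Here `feats` would hold one feature vector per candidate pixel and `probs` the classifier outputs for several noisy versions of those features. Restricting the uncertainty ranking to the stage-1 pool is what keeps diversity and uncertainty decoupled in this sketch.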

Related papers

Superpixel-Based Image Segmentation Using Squared 2-Wasserstein Distances [11.580076885777151]
We present an efficient method for image segmentation in the presence of strong inhomogeneities. Pixels are first grouped into superpixels via a linear least-squares assignment problem. These superpixels are then greedily merged into object-level segments using the squared 2-Wasserstein distance between their empirical distributions.
arXiv Detail & Related papers (2026-01-22T22:24:15Z)
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
Diffusion models have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness. We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models. Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z)
Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection. PCL uses a two-stream architecture that extracts two types of global features, one from the RGB view and one from the noise view. PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding [24.787497472368244]
We propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals. Our method achieves new state-of-the-art performance on the PNG benchmark with 4.0 absolute Average Recall gains.
arXiv Detail & Related papers (2022-08-11T05:42:12Z)
SePiCo: Semantic-Guided Pixel Contrast for Domain Adaptive Semantic Segmentation [52.62441404064957]
Domain adaptive semantic segmentation attempts to make satisfactory dense predictions on an unlabeled target domain by utilizing a model trained on a labeled source domain. Many methods try to alleviate noisy pseudo labels; however, they ignore intrinsic connections among cross-domain pixels with similar semantic concepts. We propose Semantic-Guided Pixel Contrast (SePiCo), a novel one-stage adaptation framework that highlights the semantic concepts of individual pixels.
arXiv Detail & Related papers (2022-04-19T11:16:29Z)
AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation poses unique challenges, the most critical of which is foreground-background imbalance. We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations. AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
Superpixel-guided Discriminative Low-rank Representation of Hyperspectral Images for Classification [49.32130776974202]
SP-DLRR is composed of two modules, i.e., the classification-guided superpixel segmentation and the discriminative low-rank representation. Experimental results on three benchmark datasets demonstrate the significant superiority of SP-DLRR over state-of-the-art methods.
arXiv Detail & Related papers (2021-08-25T10:47:26Z)
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency [66.49995436833667]
We focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images. A preferred high-level representation should capture the contextual information while not losing self-awareness. We present the Directional Contrastive Loss (DC Loss) to accomplish the consistency in a pixel-to-pixel manner.
arXiv Detail & Related papers (2021-06-27T03:42:40Z)
All you need are a few pixels: semantic segmentation with PixelPick [30.234492042103966]
In this work, we show that in order to achieve a good level of segmentation performance, all you need are a few well-chosen pixel labels. We demonstrate how to exploit this phenomenon within an active learning framework, termed PixelPick, to radically reduce labelling cost.
arXiv Detail & Related papers (2021-04-13T17:55:33Z)
Reinforced active learning for image segmentation [34.096237671643145]
We present a new active learning strategy for semantic segmentation based on deep reinforcement learning (RL). An agent learns a policy to select a subset of small informative image regions, as opposed to entire images, to be labeled from a pool of unlabeled data. Our method proposes a new modification of the deep Q-network (DQN) formulation for active learning, adapting it to the large-scale nature of semantic segmentation problems.
arXiv Detail & Related papers (2020-02-16T14:03:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.