Related papers: Precision matters: Precision-aware ensemble for weakly supervised semantic segmentation

URL: http://arxiv.org/abs/2406.19638v1
Date: Fri, 28 Jun 2024 03:58:02 GMT
Title: Precision matters: Precision-aware ensemble for weakly supervised semantic segmentation
Authors: Junsung Park, Hyunjung Shim
Abstract summary: Weakly Supervised Semantic Segmentation (WSSS) employs weak supervision, such as image-level labels, to train the segmentation model. We propose ORANDNet, an advanced ensemble approach tailored for WSSS.
Score: 14.931551206723041
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Weakly Supervised Semantic Segmentation (WSSS) employs weak supervision, such as image-level labels, to train the segmentation model. Despite the impressive achievements of recent WSSS methods, we identify that introducing weak labels with high mean Intersection over Union (mIoU) does not guarantee high segmentation performance. Existing studies have emphasized the importance of prioritizing precision and reducing noise to improve overall performance. In the same vein, we propose ORANDNet, an advanced ensemble approach tailored for WSSS. ORANDNet combines Class Activation Maps (CAMs) from two different classifiers to increase the precision of pseudo-masks (PMs). To further mitigate small noise in the PMs, we incorporate curriculum learning. This involves training the segmentation model initially with pairs of smaller-sized images and corresponding PMs, gradually transitioning to the original-sized pairs. By combining the original CAMs of ResNet-50 and ViT, we significantly improve the segmentation performance over the single-best model and the naive ensemble model, respectively. We further extend our ensemble method to CAMs from AMN (ResNet-like) and MCTformer (ViT-like) models, achieving performance benefits in advanced WSSS models. This highlights the potential of our ORANDNet as a final add-on module for WSSS models.
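
As a rough illustration of why combining two pseudo-masks can trade recall for precision, the sketch below merges two binary foreground masks pixel-wise with AND and OR and scores each result against a toy ground truth. The helper names (combine, precision_recall) and the toy masks are hypothetical. This is not ORANDNet itself, which the abstract describes only as an ensemble of CAMs from two classifiers.

```python
def combine(mask_a, mask_b):
    """Return the pixel-wise (AND, OR) of two equally sized binary masks."""
    and_mask = [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]
    or_mask = [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]
    return and_mask, or_mask


def precision_recall(pred, truth):
    """Precision and recall of a binary prediction against a binary ground truth."""
    tp = sum(p & t for rp, rt in zip(pred, truth) for p, t in zip(rp, rt))
    predicted = sum(sum(row) for row in pred)
    actual = sum(sum(row) for row in truth)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Toy data (assumed for illustration only): 1 = foreground, 0 = background.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
mask_a = [[1, 1, 1, 0],   # one false positive at (0, 0)
          [0, 1, 1, 0]]
mask_b = [[0, 1, 1, 0],   # one miss at (1, 2), one false positive at (1, 3)
          [0, 1, 0, 1]]

and_mask, or_mask = combine(mask_a, mask_b)
print("AND:", precision_recall(and_mask, truth))
print("OR: ", precision_recall(or_mask, truth))
```

On this toy input the AND mask keeps only pixels both masks agree on, so it drops both false positives at the cost of one missed pixel, while the OR mask recovers every true pixel but inherits both false positives.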

Related papers

SAMPO: Visual Preference Optimization for Intent-Aware Segmentation with Vision Foundation Models [5.3279948735247284]
We introduce SAMPO, a novel framework that teaches visual foundation models to infer high-level categorical intent from sparse visual interactions. Our work establishes a new paradigm for intent-aware alignment in visual foundation models, removing dependencies on auxiliary prompt generators or language-model-assisted preference learning.
arXiv Detail & Related papers (2025-08-04T14:31:11Z)
GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity. However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks. Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z)
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations [16.885965702357314]
MIM-Refiner is a contrastive learning boost for pre-trained MIM models. We refine the features of MIM models from subpar to state-of-the-art, off-the-shelf features.
arXiv Detail & Related papers (2024-02-15T16:46:16Z)
MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation [17.59676962334776]
Noisy labels, which inevitably exist in pseudo segmentation labels generated from weak object-level annotations, severely hamper model optimization for semantic segmentation. Inspired by recent advances in meta learning, we argue that rather than struggling to tolerate noise hidden behind clean labels passively, a more feasible solution would be to find out the noisy regions actively. We present a novel meta-learning-based semantic segmentation method, MetaSeg, that comprises a primary content-aware meta-net (CAM-Net) to serve as a noise indicator for an arbitrary segmentation model counterpart.
arXiv Detail & Related papers (2024-01-22T07:31:52Z)
Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping [40.07070188661184]
Weakly-Supervised Concealed Object Segmentation (WSCOS) aims to segment objects well blended with surrounding environments. It is hard to distinguish concealed objects from the background due to the intrinsic similarity. We propose a new WSCOS method to address these two challenges.
arXiv Detail & Related papers (2023-05-18T14:31:34Z)
GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector [156.43671738038657]
We present a novel end-to-end group collaborative learning network, termed GCoNet+. GCoNet+ can effectively and efficiently identify co-salient objects in natural scenes.
arXiv Detail & Related papers (2022-05-30T23:49:19Z)
Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks. Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients. We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS). It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes. In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image. We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
Reviving Iterative Training with Mask Guidance for Interactive Segmentation [8.271859911016719]
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes. We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps. We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
arXiv Detail & Related papers (2021-02-12T15:44:31Z)
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results. Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. These frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.