Related papers: FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation

FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation

URL: http://arxiv.org/abs/2504.10487v2
Date: Wed, 30 Jul 2025 14:39:53 GMT
Title: FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
Authors: Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Raoul de Charette,
Abstract summary: This paper challenges the conventional practice in Open-Vocabulary Semantic (OVSS) of using averaged class-wise text embeddings.<n>We introduce a novel approach that estimates class-experts without any labeled data or training.<n>By leveraging the class-wise prediction entropy of single-template classifiers, we select those yielding the lowest entropy as the most reliable class-experts.
Score: 25.106772176792653
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we challenge the conventional practice in Open-Vocabulary Semantic Segmentation (OVSS) of using averaged class-wise text embeddings, which are typically obtained by encoding each class name with multiple templates (e.g., a photo of <class>, a sketch of a <class>). We investigate the impact of templates for OVSS, and find that for each class, there exist single-template classifiers--which we refer to as class-experts--that significantly outperform the conventional averaged classifier. First, to identify these class-experts, we introduce a novel approach that estimates them without any labeled data or training. By leveraging the class-wise prediction entropy of single-template classifiers, we select those yielding the lowest entropy as the most reliable class-experts. Second, we combine the outputs of class-experts in a new fusion process. Our plug-and-play method, coined FLOSS, is orthogonal and complementary to existing OVSS methods, offering an improvement without the need for additional labels or training. Extensive experiments show that FLOSS consistently enhances state-of-the-art OVSS models, generalizes well across datasets with different distribution shifts, and delivers substantial improvements in low-data scenarios where only a few unlabeled images are available. Our code is available at https://github.com/yasserben/FLOSS .

Related papers

An Adaptor for Triggering Semi-Supervised Learning to Out-of-Box Serve Deep Image Clustering [52.15903983661315]
ASD is an adaptor that enables the cold-start of SSL learners for deep image clustering without any prerequisites.<n>We show the superior performance of ASD across various benchmarks against the latest deep image clustering approaches.
arXiv Detail & Related papers (2025-09-25T10:21:01Z)
Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation [34.10745316463388]
We propose a novel TTA method tailored to adapting VLMs for segmentation during test time.<n>Our approach could be used as plug-and-play for any segmentation network, does not require additional training data or labels, and remains effective even with a single test sample.<n>We introduce a comprehensive OVSS TTA benchmark suite, which integrates a rigorous evaluation protocol, nine segmentation datasets, 15 common synthetic corruptions, and additional real and rendered domain shifts.
arXiv Detail & Related papers (2025-05-28T00:24:47Z)
Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference. We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem. Experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z)
Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks [10.556477506959888]
Existing methods often encounter difficulties in drawing accurate class prototypes from support set samples. Recent approaches attempt to incorporate external knowledge or pre-trained language models to augment data, but this requires additional resources. We propose a novel solution by adequately leveraging the information within the task itself.
arXiv Detail & Related papers (2024-10-14T12:47:11Z)
Recurrent Early Exits for Federated Learning with Heterogeneous Clients [22.429334632124817]
Federated learning (FL) has enabled distributed learning of a model across multiple clients in a privacy-preserving manner. One of the main challenges of FL is to accommodate clients with varying hardware capacities. We propose a recurrent early exit approach named ReeFL that fuses features from different sub-models into a single shared classifier.
arXiv Detail & Related papers (2024-05-23T17:01:53Z)
Training-Free Semantic Segmentation via LLM-Supervision [37.9007813884699]
This paper introduces a new approach to text-supervised semantic segmentation using supervision by a large language model (LLM) Our method starts from an LLM to generate a detailed set of subclasses for more accurate class representation. We then employ an advanced text-supervised semantic segmentation model to apply the generated subclasses as target labels.
arXiv Detail & Related papers (2024-03-31T14:37:25Z)
Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task. We propose a co-training-based framework that encourages clustering consistency. Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification. In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary. We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z)
Evolving Multi-Label Fuzzy Classifier [5.53329677986653]
Multi-label classification has attracted much attention in the machine learning community to address the problem of assigning single samples to more than one class at the same time. We propose an evolving multi-label fuzzy classifier (EFC-ML) which is able to self-adapt and self-evolve its structure with new incoming multi-label samples in an incremental, single-pass manner.
arXiv Detail & Related papers (2022-03-29T08:01:03Z)
Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic (NCDSS) It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes. In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image. We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
ECKPN: Explicit Class Knowledge Propagation Network for Transductive Few-shot Learning [53.09923823663554]
Class-level knowledge can be easily learned by humans from just a handful of samples. We propose an Explicit Class Knowledge Propagation Network (ECKPN) to address this problem. We conduct extensive experiments on four few-shot classification benchmarks, and the experimental results show that the proposed ECKPN significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-16T02:29:43Z)
No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data. We propose a novel and simple algorithm called Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated ssian mixture model. Experimental results demonstrate that CCVR state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
Semi-Supervised Domain Generalization with Stochastic StyleMatch [90.98288822165482]
In real-world applications, we might have only a few labels available from each source domain due to high annotation cost. In this work, we investigate semi-supervised domain generalization, a more realistic and practical setting. Our proposed approach, StyleMatch, is inspired by FixMatch, a state-of-the-art semi-supervised learning method based on pseudo-labeling.
arXiv Detail & Related papers (2021-06-01T16:00:08Z)
UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training. We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.