A Survey on Training-free Open-Vocabulary Semantic Segmentation
- URL: http://arxiv.org/abs/2505.22209v1
- Date: Wed, 28 May 2025 10:37:52 GMT
- Title: A Survey on Training-free Open-Vocabulary Semantic Segmentation
- Authors: Naomi Kombol, Ivan Martinović, Siniša Šegvić
- Abstract summary: This survey covers the history, nuances, idea development, and the state of the art in training-free open-vocabulary semantic segmentation.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Semantic segmentation is one of the most fundamental tasks in image understanding, with a long history of research and, consequently, a myriad of different approaches. Traditional methods strive to train models from scratch, requiring vast amounts of computational resources and training data. With the move to open-vocabulary semantic segmentation, which asks models to classify beyond learned categories, the large quantities of finely annotated data required would be prohibitively expensive. Researchers have instead turned to training-free methods that leverage existing models built for tasks where data is more easily acquired. Specifically, this survey covers the history, nuances, idea development, and the state of the art in training-free open-vocabulary semantic segmentation that leverages existing multi-modal classification models. We first give a preliminary on the task definition, followed by an overview of popular model archetypes, and then spotlight over 30 approaches split into broader research branches: purely CLIP-based methods, those leveraging auxiliary visual foundation models, and those relying on generative methods. Subsequently, we discuss the limitations and potential problems of current research, and provide some underexplored ideas for future study. We believe this survey will serve as a good onboarding read for new researchers and spark increased interest in the area.
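To make the shared recipe behind many of the surveyed CLIP-based approaches concrete, the sketch below classifies dense image-patch embeddings against class-name text embeddings and upsamples the per-patch predictions into a segmentation map. It is only an illustrative sketch: the patch and text features here are random placeholders, whereas actual methods obtain them from a (possibly modified) CLIP image encoder and a prompt-engineered text encoder, and the function and variable names are ours, not taken from any surveyed paper.

```python
# Minimal sketch of the common training-free, CLIP-style recipe:
# match each patch embedding to class text embeddings by cosine
# similarity, then upsample the per-patch predictions to image size.
# Features are random placeholders; real methods extract them from CLIP.
import torch
import torch.nn.functional as F

def segment_from_features(patch_feats, text_feats, out_size):
    """patch_feats: (Hp*Wp, D) dense image-patch embeddings.
    text_feats:  (C, D) class-name text embeddings.
    out_size:    (H, W) target resolution of the segmentation map."""
    # Cosine similarity between every patch and every class prompt.
    patch_feats = F.normalize(patch_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = patch_feats @ text_feats.T                # (Hp*Wp, C)

    # Reshape to a low-resolution logit map (square patch grid assumed)
    # and upsample to the requested output resolution.
    hp = wp = int(patch_feats.shape[0] ** 0.5)
    logit_map = logits.T.reshape(1, -1, hp, wp)        # (1, C, Hp, Wp)
    logit_map = F.interpolate(logit_map, size=out_size,
                              mode="bilinear", align_corners=False)
    return logit_map.argmax(dim=1)[0]                  # (H, W) class ids

# Placeholder inputs: 14x14 patch grid, 512-d CLIP-like embeddings, 5 classes.
patches = torch.randn(14 * 14, 512)
texts = torch.randn(5, 512)
mask = segment_from_features(patches, texts, out_size=(224, 224))
print(mask.shape, mask.unique())
```

Broadly, the surveyed branches differ in how such dense patch features are obtained and refined, e.g. by modifying CLIP's image encoder, by borrowing cues from auxiliary visual foundation models, or by relying on generative models, while the patch-to-text matching step remains essentially the same.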
Related papers
- A Coreset Selection of Coreset Selection Literature: Introduction and Recent Advances [8.319613769928331]
Coreset selection targets the challenge of finding a small, representative subset of a large dataset that preserves essential patterns for effective machine learning.
This survey presents a more comprehensive view by unifying three major lines of coreset research into a single taxonomy.
We present subfields often overlooked by existing work, including submodular formulations, bilevel optimization, and recent progress in pseudo-labeling for unlabeled datasets.
arXiv Detail & Related papers (2025-05-23T12:18:34Z) - Learning from Neighbors: Category Extrapolation for Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z) - Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient empowerment technique in the machine learning community.
There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z) - A Comprehensive Review of Few-shot Action Recognition [64.47305887411275]
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data.
It requires accurately classifying human actions in videos using only a few labeled examples per class.
Numerous approaches have driven significant advancements in few-shot action recognition.
arXiv Detail & Related papers (2024-07-20T03:53:32Z) - A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z) - Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning aims to overcome the catastrophic forgetting of previously acquired knowledge when learning new tasks.
This paper presents a comprehensive survey of the latest advancements in pre-trained model (PTM)-based continual learning.
arXiv Detail & Related papers (2024-01-29T18:27:52Z) - Teaching Smaller Language Models To Generalise To Unseen Compositional Questions [6.9076450524134145]
We propose a combination of multitask pretraining on up to 93 tasks designed to instill diverse reasoning abilities.
We show that performance can be significantly improved by adding retrieval-augmented training datasets.
arXiv Detail & Related papers (2023-08-02T05:00:12Z) - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future [6.4105103117533755]
A taxonomy is first developed to organize different tasks and methodologies.
The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, 3D and video understanding.
arXiv Detail & Related papers (2023-07-18T12:52:49Z) - Few Shot Semantic Segmentation: a review of methodologies, benchmarks, and open challenges [5.0243930429558885]
Few-Shot Semantic Segmentation is a novel task in computer vision, which aims at designing models capable of segmenting new semantic classes with only a few examples.
This paper provides a comprehensive survey of Few-Shot Semantic Segmentation, tracing its evolution and exploring various model designs.
arXiv Detail & Related papers (2023-04-12T13:07:37Z) - Unsupervised Domain Adaptation for Semantic Image Segmentation: a Comprehensive Survey [24.622211579286127]
This survey is an effort to summarize five years of this rapidly growing field.
We present the most important semantic segmentation methods.
We unveil newer trends such as multi-domain learning, domain generalization, test-time adaptation, and source-free domain adaptation.
arXiv Detail & Related papers (2021-12-06T18:47:41Z) - A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.