Towards Realistic Long-tailed Semi-supervised Learning in an Open World
- URL: http://arxiv.org/abs/2405.14516v1
- Date: Thu, 23 May 2024 12:53:50 GMT
- Title: Towards Realistic Long-tailed Semi-supervised Learning in an Open World
- Authors: Yuanpeng He, Lijian Li,
- Abstract summary: We construct a more emphRealistic Open-world Long-tailed Semi-supervised Learning (textbfROLSSL) setting where there is no premise on the distribution relationships between known and novel categories.
Under the proposed ROLSSL setting, we propose a simple yet potentially effective solution called dual-stage logit adjustments.
Experiments on datasets such as CIFAR100 and ImageNet100 have demonstrated performance improvements of up to 50.1%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-world long-tailed semi-supervised learning (OLSSL) has increasingly attracted attention. However, existing OLSSL algorithms generally assume that the distributions between known and novel categories are nearly identical. Against this backdrop, we construct a more \emph{Realistic Open-world Long-tailed Semi-supervised Learning} (\textbf{ROLSSL}) setting where there is no premise on the distribution relationships between known and novel categories. Furthermore, even within the known categories, the number of labeled samples is significantly smaller than that of the unlabeled samples, as acquiring valid annotations is often prohibitively costly in the real world. Under the proposed ROLSSL setting, we propose a simple yet potentially effective solution called dual-stage post-hoc logit adjustments. The proposed approach revisits the logit adjustment strategy by considering the relationships among the frequency of samples, the total number of categories, and the overall size of data. Then, it estimates the distribution of unlabeled data for both known and novel categories to dynamically readjust the corresponding predictive probabilities, effectively mitigating category bias during the learning of known and novel classes with more selective utilization of imbalanced unlabeled data. Extensive experiments on datasets such as CIFAR100 and ImageNet100 have demonstrated performance improvements of up to 50.1\%, validating the superiority of our proposed method and establishing a strong baseline for this task. For further researches, the anonymous link to the experimental code is at \href{https://github.com/heyuanpengpku/ROLSSL}{\textcolor{brightpink}{https://github.com/heyuanpengpku/ROLSSL}}
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z) - RankMatch: A Novel Approach to Semi-Supervised Label Distribution
Learning Leveraging Inter-label Correlations [52.549807652527306]
This paper introduces RankMatch, an innovative approach for Semi-Supervised Label Distribution Learning (SSLDL)
RankMatch effectively utilizes a small number of labeled examples in conjunction with a larger quantity of unlabeled data.
We establish a theoretical generalization bound for RankMatch, and through extensive experiments, demonstrate its superiority in performance against existing SSLDL methods.
arXiv Detail & Related papers (2023-12-11T12:47:29Z) - Semi-Supervised Learning in the Few-Shot Zero-Shot Scenario [14.916971861796384]
Semi-Supervised Learning (SSL) is a framework that utilizes both labeled and unlabeled data to enhance model performance.
We propose a general approach to augment existing SSL methods, enabling them to handle situations where certain classes are missing.
Our experimental results reveal significant improvements in accuracy when compared to state-of-the-art SSL, open-set SSL, and open-world SSL methods.
arXiv Detail & Related papers (2023-08-27T14:25:07Z) - Fusion of Global and Local Knowledge for Personalized Federated Learning [75.20751492913892]
In this paper, we explore personalized models with low-rank and sparse decomposition.
We propose a two-stage-based algorithm named textbfFederated learning with mixed textbfSparse and textbfRank representation.
Under proper assumptions, we show that the GKR trained by FedSLR can at least sub-linearly converge to a stationary point of the regularized problem.
arXiv Detail & Related papers (2023-02-21T23:09:45Z) - Towards Realistic Semi-Supervised Learning [73.59557447798134]
We propose a novel approach to tackle SSL in open-world setting, where we simultaneously learn to classify known and unknown classes.
Our approach substantially outperforms the existing state-of-the-art on seven diverse datasets.
arXiv Detail & Related papers (2022-07-05T19:04:43Z) - OpenLDN: Learning to Discover Novel Classes for Open-World
Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning.
Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data.
This work introduces OpenLDN that utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z) - OpenMatch: Open-set Consistency Regularization for Semi-supervised
Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z) - NeuCrowd: Neural Sampling Network for Representation Learning with
Crowdsourced Labels [19.345894148534335]
We propose emphNeuCrowd, a unified framework for supervised representation learning (SRL) from crowdsourced labels.
The proposed framework is evaluated on both one synthetic and three real-world data sets.
arXiv Detail & Related papers (2020-03-21T13:38:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.