Beyond Cats and Dogs: Semi-supervised Classification of fuzzy labels
with overclustering
- URL: http://arxiv.org/abs/2012.01768v1
- Date: Thu, 3 Dec 2020 08:54:25 GMT
- Title: Beyond Cats and Dogs: Semi-supervised Classification of fuzzy labels
with overclustering
- Authors: Lars Schmarje and Johannes Brünger and Monty Santarossa and
Simon-Martin Schröder and Rainer Kiko and Reinhard Koch
- Abstract summary: We propose a novel framework for handling semi-supervised classifications of fuzzy labels.
Our framework is based on the idea of overclustering to detect substructures in these fuzzy labels.
- Score: 1.6392706389599345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A long-standing issue with deep learning is the need for large and
consistently labeled datasets. Although the current research in semi-supervised
learning can decrease the required amount of annotated data by a factor of 10
or even more, this line of research still uses distinct classes like cats and
dogs. However, in the real world we often encounter problems where different
experts have different opinions, thus producing fuzzy labels. We propose a
novel framework for handling semi-supervised classifications of such fuzzy
labels. Our framework is based on the idea of overclustering to detect
substructures in these fuzzy labels. We propose a novel loss to improve the
overclustering capability of our framework and show on the common image
classification dataset STL-10 that it is faster and has better overclustering
performance than previous work. On a real-world plankton dataset, we illustrate
the benefit of overclustering for fuzzy labels and show that we beat previous
state-of-the-art semi-supervised methods. Moreover, we acquire 5 to 10% more
consistent predictions of substructures.
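As a rough illustration of the overclustering idea from the abstract, the toy sketch below uses plain k-means on synthetic data as a stand-in for the paper's overclustering head; the data, cluster counts, and the `kmeans` helper are illustrative assumptions, not the paper's actual network or its inverse cross-entropy loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 2 "annotated" classes, but each class secretly
# contains 2 sub-modes (4 modes total) -- a stand-in for fuzzy labels
# where annotators disagree and substructures hide inside each class.
modes = np.array([[0.0, 0.0], [0.0, 4.0], [6.0, 0.0], [6.0, 4.0]])
X = np.concatenate([m + rng.normal(scale=0.4, size=(50, 2)) for m in modes])

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means; stands in here for a learned overclustering head."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # recompute each center as the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Overclustering: ask for MORE clusters (k=4) than annotated classes (2),
# so the substructures inside each fuzzy class can surface.
over_labels = kmeans(X, k=4)
print(len(np.unique(over_labels)))  # distinct substructures found
```

With only k=2 the two sub-modes inside each class would be merged; requesting more clusters than classes is what lets the substructure become visible, which is the core intuition behind overclustering for fuzzy labels.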
Related papers
- Active Generalized Category Discovery [60.69060965936214]
Generalized Category Discovery (GCD) endeavors to cluster unlabeled samples from both novel and old classes.
We take the spirit of active learning and propose a new setting called Active Generalized Category Discovery (AGCD).
Our method achieves state-of-the-art performance on both generic and fine-grained datasets.
arXiv Detail & Related papers (2024-03-07T07:12:24Z)
- Making Binary Classification from Multiple Unlabeled Datasets Almost Free of Supervision [128.6645627461981]
We propose a new problem setting, i.e., binary classification from multiple unlabeled datasets with only one pairwise numerical relationship of class priors.
In the proposed MU-OPPO framework, we do not need the class priors for all unlabeled datasets.
We show that our framework brings smaller estimation errors of class priors and better performance of binary classification.
arXiv Detail & Related papers (2023-06-12T11:33:46Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Spatiotemporal Classification with limited labels using Constrained Clustering for large datasets [22.117238467818623]
Separable representations can lead to supervised models with better classification capabilities.
We show how to learn even better representations using a constrained loss with few labels.
We conclude by showing how our method, using few labels, can pick out new labeled samples from the unlabeled data, which can be used to augment supervised methods leading to better classification.
arXiv Detail & Related papers (2022-10-14T05:05:22Z)
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework [75.79736930414715]
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint.
arXiv Detail & Related papers (2022-04-27T21:41:44Z)
- Learning from Label Proportions by Learning with Label Noise [30.7933303912474]
Learning from label proportions (LLP) is a weakly supervised classification problem where data points are grouped into bags.
We provide a theoretically grounded approach to LLP based on a reduction to learning with label noise.
Our approach demonstrates improved empirical performance in deep learning scenarios across multiple datasets and architectures.
arXiv Detail & Related papers (2022-03-04T18:52:21Z)
- Semi-Supervised Learning with Taxonomic Labels [42.02670649470055]
We propose techniques to incorporate coarse taxonomic labels to train image classifiers in fine-grained domains.
On the Semi-iNat dataset consisting of 810 species across three Kingdoms, incorporating Phylum labels improves the Species level classification accuracy by 6%.
We propose a technique to select relevant data from a large collection of unlabeled images guided by the hierarchy which improves the robustness.
arXiv Detail & Related papers (2021-11-23T00:50:25Z)
- Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy [1.6392706389599345]
We propose a novel framework for handling semi-supervised classifications of fuzzy labels.
It is based on the idea of overclustering to detect substructures in these fuzzy labels.
We show that our framework is superior to previous state-of-the-art semi-supervised methods when applied to real-world plankton data with fuzzy labels.
arXiv Detail & Related papers (2021-10-13T10:50:50Z)
- Highly Efficient Representation and Active Learning Framework for Imbalanced Data and its Application to COVID-19 X-Ray Classification [0.7829352305480284]
We propose a highly data-efficient classification and active learning framework for classifying chest X-rays.
It is based on (1) unsupervised representation learning of a Convolutional Neural Network and (2) the Gaussian Process method.
We demonstrate that only about 10% of the labeled data is needed to reach the accuracy obtained by training on all available labels.
arXiv Detail & Related papers (2021-02-25T02:48:59Z)
- Are Fewer Labels Possible for Few-shot Learning? [81.89996465197392]
Few-shot learning is challenging due to its very limited data and labels.
Recent studies in big transfer (BiT) show that few-shot learning can greatly benefit from pretraining on large scale labeled dataset in a different domain.
We propose eigen-finetuning to enable fewer-shot learning by leveraging the co-evolution of clustering and eigen-samples during finetuning.
arXiv Detail & Related papers (2020-12-10T18:59:29Z)
- Structured Prediction with Partial Labelling through the Infimum Loss [85.4940853372503]
The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect.
This is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one.
This paper provides a unified framework based on structured prediction and on the concept of infimum loss to deal with partial labelling.
arXiv Detail & Related papers (2020-03-02T13:59:41Z)
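The infimum-loss idea in the last entry can be sketched in a few lines: for a datapoint whose supervision is a set of candidate labels containing the real one, take the infimum (here simply the minimum) of a base loss over the candidates. This is a minimal numpy illustration assuming cross-entropy as the base loss; the function name and numbers are mine, not from the paper.

```python
import numpy as np

def infimum_loss(scores, label_set):
    """Partial-label loss: the minimum of a base loss (here cross-entropy)
    over every candidate label in the set containing the true one."""
    # numerically stable log-softmax of the class scores
    z = scores - scores.max()
    log_probs = z - np.log(np.exp(z).sum())
    return min(-log_probs[y] for y in label_set)

scores = np.array([2.0, 0.5, -1.0])   # hypothetical model scores for 3 classes
print(infimum_loss(scores, {0, 1}))   # supervises via the "easiest" candidate
print(infimum_loss(scores, {2}))      # singleton set = ordinary cross-entropy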
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.