Making Binary Classification from Multiple Unlabeled Datasets Almost Free of Supervision
- URL: http://arxiv.org/abs/2306.07036v1
- Date: Mon, 12 Jun 2023 11:33:46 GMT
- Title: Making Binary Classification from Multiple Unlabeled Datasets Almost Free of Supervision
- Authors: Yuhao Wu, Xiaobo Xia, Jun Yu, Bo Han, Gang Niu, Masashi Sugiyama, Tongliang Liu
- Abstract summary: We propose a new problem setting, i.e., binary classification from multiple unlabeled datasets with only one pairwise numerical relationship of class priors (MU-OPPO).
In MU-OPPO, we do not need the class priors for all unlabeled datasets.
We show that our framework brings smaller estimation errors of class priors and better performance of binary classification.
- Score: 128.6645627461981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training a classifier by exploiting a huge amount of supervised data is expensive, or even prohibitive, when the labeling cost is high. A remarkable advance in working with weaker forms of supervision is binary classification from multiple unlabeled datasets, which, however, requires knowledge of the exact class priors of all the unlabeled datasets. Such class priors are often unavailable in real-world scenarios. To address this issue, we propose a new problem setting, binary classification from multiple unlabeled datasets with only one pairwise numerical relationship of class priors (MU-OPPO), in which we only know the relative order (i.e., which unlabeled dataset has a higher proportion of positive examples) of the class-prior probabilities of two datasets among the multiple unlabeled datasets. In MU-OPPO, we do not need the class priors of all the unlabeled datasets; we only require that there exists one pair of unlabeled datasets for which we know which of the two has the larger class prior. This form of supervision is clearly easier to obtain, which makes the labeling cost almost free. We propose a novel framework to handle the MU-OPPO problem, which consists of four sequential modules: (i) pseudo-label assignment; (ii) confident-example collection; (iii) class-prior estimation; and (iv) classifier training with the estimated class priors. Theoretically, we analyze the gap between the estimated and true class priors under the proposed framework. Empirically, we confirm the superiority of our framework with comprehensive experiments: it yields smaller class-prior estimation errors and better binary-classification performance.
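The four sequential modules can be pictured with a minimal sketch. Everything below (the function names, the logistic-regression scorer, the 0.9/0.1 confidence thresholds, and the crude prior estimator) is an illustrative assumption made for exposition, not the authors' implementation, and edge-case handling is omitted:

```python
# Minimal, assumption-laden sketch of the four-module MU-OPPO pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

def mu_oppo_pipeline(unlabeled_sets, higher_prior_idx, lower_prior_idx):
    """unlabeled_sets: list of (n_i, d) arrays. The only supervision assumed is
    that the set at higher_prior_idx has a larger fraction of positives than
    the set at lower_prior_idx (the single pairwise relationship in MU-OPPO)."""
    X_hi = unlabeled_sets[higher_prior_idx]
    X_lo = unlabeled_sets[lower_prior_idx]

    # (i) Pseudo-label assignment: treat the higher-prior set as "positive"
    #     and the lower-prior set as "negative" to obtain a rough scorer.
    X = np.vstack([X_hi, X_lo])
    y = np.concatenate([np.ones(len(X_hi)), np.zeros(len(X_lo))])
    scorer = LogisticRegression(max_iter=1000).fit(X, y)

    # (ii) Confident-example collection: keep examples the scorer is most sure
    #      about (the 0.9 / 0.1 thresholds are arbitrary illustrative choices).
    probs = [scorer.predict_proba(U)[:, 1] for U in unlabeled_sets]
    confident_pos = [U[p > 0.9] for U, p in zip(unlabeled_sets, probs)]
    confident_neg = [U[p < 0.1] for U, p in zip(unlabeled_sets, probs)]

    # (iii) Class-prior estimation: a crude stand-in that averages predicted
    #       positive probabilities per set (the paper's estimator differs).
    priors = np.array([p.mean() for p in probs])

    # (iv) Classifier training with the estimated priors; here we simply refit
    #      on the confident examples as a placeholder for that step.
    X_conf = np.vstack(confident_pos + confident_neg)
    y_conf = np.concatenate([np.ones(sum(len(a) for a in confident_pos)),
                             np.zeros(sum(len(a) for a in confident_neg))])
    clf = LogisticRegression(max_iter=1000).fit(X_conf, y_conf)
    return clf, priors
```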
Related papers
- Active Generalized Category Discovery [60.69060965936214]
Generalized Category Discovery (GCD) endeavors to cluster unlabeled samples from both novel and old classes.
We take the spirit of active learning and propose a new setting called Active Generalized Category Discovery (AGCD).
Our method achieves state-of-the-art performance on both generic and fine-grained datasets.
arXiv Detail & Related papers (2024-03-07T07:12:24Z) - An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
arXiv Detail & Related papers (2022-10-24T23:13:57Z) - Learning from Multiple Unlabeled Datasets with Partial Risk Regularization [80.54710259664698]
In this paper, we aim to learn an accurate classifier without any class labels.
We first derive an unbiased estimator of the classification risk that can be estimated from the given unlabeled sets.
We then find that the classifier obtained in this way tends to overfit, as its empirical risks go negative during training.
Experiments demonstrate that our method effectively mitigates overfitting and outperforms state-of-the-art methods for learning from multiple unlabeled sets.
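As a rough illustration of the overfitting symptom mentioned above, one generic remedy is to clamp negative partial-risk terms at zero, in the spirit of non-negative risk correction; the paper's actual partial-risk regularizer differs and this is only a sketch:

```python
# Illustrative only: a generic non-negative correction for partial risk terms.
import torch

def corrected_risk(partial_risks):
    """partial_risks: iterable of scalar tensors, the per-set terms of an
    unbiased risk estimator. On finite samples some terms can go negative,
    which signals overfitting; clamping keeps the total risk non-negative."""
    return sum(torch.clamp(r, min=0.0) for r in partial_risks)
```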
arXiv Detail & Related papers (2022-07-04T16:22:44Z) - Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
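A minimal sketch of that surrogate task, assuming the set index is used as the surrogate label; the step that converts the set posterior back into the binary posterior (which needs the class priors) is omitted, and all names are illustrative:

```python
# Sketch: train a multi-class model to predict which unlabeled set an example
# came from; its output approximates p(set | x), the surrogate task above.
import numpy as np
from sklearn.linear_model import LogisticRegression

def surrogate_set_classifier(unlabeled_sets):
    X = np.vstack(unlabeled_sets)
    set_ids = np.concatenate([np.full(len(U), i)
                              for i, U in enumerate(unlabeled_sets)])
    return LogisticRegression(max_iter=1000).fit(X, set_ids)
```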
arXiv Detail & Related papers (2021-02-01T07:36:38Z) - Multilabel Classification by Hierarchical Partitioning and Data-dependent Grouping [33.48217977134427]
We exploit the sparsity of label vectors and the hierarchical structure to embed them in low-dimensional space.
We present a novel data-dependent grouping approach, where we use a group construction based on a low-rank Nonnegative Matrix Factorization.
We then present a hierarchical partitioning approach that exploits the label hierarchy in large scale problems to divide up the large label space and create smaller sub-problems.
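A rough sketch of the data-dependent grouping step described above, assuming a low-rank NMF of the instance-by-label matrix and assigning each label to its dominant factor; the rank and solver settings are illustrative, not taken from the paper:

```python
# Sketch: group labels via a low-rank Nonnegative Matrix Factorization.
import numpy as np
from sklearn.decomposition import NMF

def group_labels(Y, n_groups=10):
    """Y: (n_samples, n_labels) sparse 0/1 label matrix."""
    nmf = NMF(n_components=n_groups, init="nndsvda", max_iter=500)
    nmf.fit(Y)                               # Y ~= W @ H
    return nmf.components_.argmax(axis=0)    # group id for each label column
```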
arXiv Detail & Related papers (2020-06-24T22:23:39Z) - Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts [37.27708297562079]
We show how to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy of our aggregation scheme.
arXiv Detail & Related papers (2020-05-21T18:07:42Z) - Structured Prediction with Partial Labelling through the Infimum Loss [85.4940853372503]
The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect.
This is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one.
This paper provides a unified framework based on structured prediction and on the concept of infimum loss to deal with partial labelling.
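The infimum-loss idea summarized above can be written in one line: score a prediction by the smallest loss it attains over the candidate label set. A schematic sketch (the paper develops this within a structured-prediction framework):

```python
# Schematic infimum loss over a set of candidate labels.
def infimum_loss(candidate_labels, prediction, base_loss):
    return min(base_loss(y, prediction) for y in candidate_labels)

# Example with a 0-1 base loss: the true label 2 is in the candidate set,
# so the infimum loss of predicting 2 is 0.
# infimum_loss({0, 2}, 2, lambda y, p: float(y != p))  ->  0.0
```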
arXiv Detail & Related papers (2020-03-02T13:59:41Z) - Automatically Discovering and Learning New Visual Categories with Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
arXiv Detail & Related papers (2020-02-13T18:53:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.