False membership rate control in mixture models
- URL: http://arxiv.org/abs/2203.02597v4
- Date: Wed, 25 Oct 2023 14:04:25 GMT
- Title: False membership rate control in mixture models
- Authors: Ariane Marandon, Tabea Rebafka, Etienne Roquain, Nataliya Sokolovska
- Abstract summary: A clustering task consists in partitioning elements of a sample into homogeneous groups.
In the supervised setting, this approach is well known and referred to as classification with an abstention option.
In this paper, the approach is revisited in an unsupervised mixture model framework, and the purpose is to develop a method that comes with the guarantee that the false membership rate does not exceed a pre-defined nominal level.
- Score: 1.387448620257867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The clustering task consists in partitioning elements of a sample into
homogeneous groups. Most datasets contain individuals that are ambiguous and
intrinsically difficult to attribute to one or another cluster. However, in
practical applications, misclassifying individuals is potentially disastrous
and should be avoided. To keep the misclassification rate small, one can decide
to classify only a part of the sample. In the supervised setting, this approach
is well known and referred to as classification with an abstention option. In
this paper, the approach is revisited in an unsupervised mixture model framework,
and the purpose is to develop a method that comes with the guarantee that the
false membership rate (FMR) does not exceed a pre-defined nominal level
$\alpha$. A plug-in procedure is proposed, together with a theoretical analysis
that quantifies the deviation of its FMR from the target level $\alpha$ with
explicit remainder terms. Bootstrap versions of the procedure are shown to
improve the performance in numerical experiments.
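
To make the plug-in idea concrete, here is a minimal, hedged sketch in Python. It assumes a Gaussian mixture fitted with scikit-learn's GaussianMixture, and the greedy cutoff (classify the most confident points while the average estimated misclassification probability stays below $\alpha$) is an illustration of the general recipe, not the authors' exact procedure or its theoretical guarantee; the function name plug_in_abstention is made up for this example.

```python
# Minimal sketch of a plug-in abstention rule for FMR control.
# Assumption: a Gaussian mixture fitted by EM stands in for the
# paper's general mixture model; the cutoff rule is illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def plug_in_abstention(X, n_components, alpha, random_state=0):
    """MAP-classify only the most confident points, keeping the average
    estimated misclassification probability of classified points <= alpha."""
    gmm = GaussianMixture(n_components=n_components,
                          random_state=random_state).fit(X)
    post = gmm.predict_proba(X)        # estimated membership posteriors
    labels = post.argmax(axis=1)       # MAP cluster assignment
    err = 1.0 - post.max(axis=1)       # estimated prob. of misclassification
    order = np.argsort(err)            # most confident points first
    # The cumulative mean of the ascending-sorted errors is nondecreasing,
    # so the largest admissible set is a prefix of `order`:
    running_mean = np.cumsum(err[order]) / np.arange(1, len(err) + 1)
    k = int(np.sum(running_mean <= alpha))
    keep = np.zeros(len(X), dtype=bool)
    keep[order[:k]] = True             # True = classified, False = abstain
    return labels, keep

# Example on two overlapping Gaussian clusters: ambiguous points abstain.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
               rng.normal(2.5, 1.0, (200, 2))])
labels, keep = plug_in_abstention(X, n_components=2, alpha=0.05)
print(f"classified {keep.sum()} of {len(X)} points")
```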
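The abstract also reports that bootstrap versions of the procedure improve performance. Below is one natural, hedged way to calibrate the working level by parametric bootstrap on top of the sketch above: samples drawn from the fitted mixture come with known memberships, so the realized FMR of the rule can be measured directly. This is an assumption-laden illustration, not the paper's algorithm; bootstrap_level is a hypothetical helper.

```python
# Hedged sketch: parametric-bootstrap calibration of the working level.
# Assumes plug_in_abstention and the imports from the previous sketch.
from scipy.optimize import linear_sum_assignment

def bootstrap_level(X, n_components, alpha, n_boot=50, random_state=0):
    """Return the largest working level whose average realized FMR on
    parametric-bootstrap samples stays below the nominal level alpha."""
    gmm = GaussianMixture(n_components=n_components,
                          random_state=random_state).fit(X)
    best = alpha / 4                              # conservative fallback
    for level in np.linspace(alpha / 4, alpha, 5):
        fmrs = []
        for b in range(n_boot):
            Xb, zb = gmm.sample(len(X))           # bootstrap data, known labels
            labels, keep = plug_in_abstention(Xb, n_components, level,
                                              random_state=b)
            if keep.sum() == 0:
                continue
            # Realized FMR up to label switching: match fitted clusters to
            # true components by a maximum-weight assignment.
            conf = np.zeros((n_components, n_components))
            for a, t in zip(labels[keep], zb[keep]):
                conf[a, t] += 1
            row, col = linear_sum_assignment(-conf)
            fmrs.append(1.0 - conf[row, col].sum() / keep.sum())
        if fmrs and np.mean(fmrs) <= alpha:
            best = level                          # keep the largest valid level
    return best
```

One would then call plug_in_abstention(X, k, bootstrap_level(X, k, alpha)) so that the bootstrap-adjusted level, rather than $\alpha$ itself, drives the cutoff.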
Related papers
- Adaptive Margin Global Classifier for Exemplar-Free Class-Incremental Learning [3.4069627091757178]
Existing methods mainly focus on handling biased learning.
We introduce a Distribution-Based Global Classifier (DBGC) to avoid the bias factors in existing methods, such as data imbalance and sampling.
More importantly, the compromised distributions of old classes are simulated via a simple operation, variance enlarging (VE).
The resulting loss is proven equivalent to an Adaptive Margin Softmax Cross Entropy (AMarX) loss.
arXiv Detail & Related papers (2024-09-20T07:07:23Z)
- Robust Non-adaptive Group Testing under Errors in Group Membership Specifications [3.554868356768806]
Group testing (GT) aims to determine defect status by performing tests on $n$ groups, where each group is formed by mixing a subset of the $p$ samples.
Most existing methods, however, assume that the group memberships are accurately specified.
We develop a new GT method, the Debiased Robust Lasso Test Method (DRLT), that handles such group membership specification errors.
arXiv Detail & Related papers (2024-09-09T06:03:23Z)
- A Universal Unbiased Method for Classification from Aggregate Observations [115.20235020903992]
This paper presents a novel universal method for classification from aggregate observations (CFAO), which provides an unbiased estimator of the classification risk for arbitrary losses.
The proposed method not only guarantees risk consistency, thanks to the unbiased risk estimator, but is also compatible with arbitrary losses.
arXiv Detail & Related papers (2023-06-20T07:22:01Z)
- Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised Person Re-Identification [80.98291772215154]
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations.
Recent advances accomplish this task by leveraging clustering-based pseudo labels.
We propose a Neighbour Consistency guided Pseudo Label Refinement framework.
arXiv Detail & Related papers (2022-11-30T09:39:57Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for its limitations is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Self-Adaptive Label Augmentation for Semi-supervised Few-shot Classification [121.63992191386502]
Few-shot classification aims to learn a model that can generalize well to new tasks when only a few labeled samples are available.
We propose a semi-supervised few-shot classification method that assigns an appropriate label to each unlabeled sample by a manually defined metric.
A major novelty of SALA (Self-Adaptive Label Augmentation) is its task-adaptive metric, which is learned adaptively for different tasks in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-16T13:14:03Z)
- Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id [109.1730454118532]
Unsupervised person re-identification (Re-Id) has attracted increasing attention due to its practical application in real-world video surveillance systems.
We present the hybrid dynamic cluster contrast and probability distillation algorithm.
It formulates the unsupervised Re-Id problem as a unified local-to-global dynamic contrastive learning and self-supervised probability distillation framework.
arXiv Detail & Related papers (2021-09-29T02:56:45Z)
- Does Adversarial Oversampling Help us? [10.210871872870737]
We propose a three-player adversarial game-based end-to-end method to handle class imbalance in datasets.
Rather than adversarial minority oversampling, we propose an adversarial oversampling (AO) and a data-space oversampling (DO) approach.
The effectiveness of our proposed method has been validated with high-dimensional, highly imbalanced and large-scale multi-class datasets.
arXiv Detail & Related papers (2021-08-20T05:43:17Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample-specificity-based methods ignore inter-class information.
We propose a novel clustering based method, which excludes class inconsistent samples during progressive cluster formation.
Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training.
arXiv Detail & Related papers (2020-07-06T08:11:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.