Conformalized Semi-supervised Random Forest for Classification and
Abnormality Detection
- URL: http://arxiv.org/abs/2302.02237v2
- Date: Thu, 29 Feb 2024 11:49:45 GMT
- Title: Conformalized Semi-supervised Random Forest for Classification and
Abnormality Detection
- Authors: Yujin Han, Mingwenchan Xu, Leying Guan
- Abstract summary: We introduce the Conformalized Semi-Supervised Random Forest (CSForest).
CSForest employs unlabeled test samples to enhance accuracy and flag unseen outliers by generating an empty set.
We compare CSForest with state-of-the-art methods using synthetic examples and various real-world datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Random Forests classifier, a widely utilized off-the-shelf
classification tool, assumes, like other standard classifiers, that training
and test samples come from the same distribution. However, in safety-critical scenarios like medical
diagnosis and network attack detection, discrepancies between the training and
test sets, including the potential presence of novel outlier samples not
appearing during training, can pose significant challenges. To address this
problem, we introduce the Conformalized Semi-Supervised Random Forest
(CSForest), which couples the conformalization technique Jackknife+aB with
semi-supervised tree ensembles to construct a set-valued prediction $C(x)$.
Instead of optimizing over the training distribution, CSForest employs
unlabeled test samples to enhance accuracy and flag unseen outliers by
generating an empty set. Theoretically, we establish that CSForest covers true
labels for previously observed inlier classes under arbitrary label shift in
the test data. We compare CSForest with state-of-the-art methods using
synthetic examples and various real-world datasets, under different types of
distribution changes in the test domain. Our results highlight CSForest's
effective prediction of inliers and its ability to detect outlier samples
unique to the test data. In addition, CSForest shows persistently good
performance as the sizes of the training and test sets vary. Codes of CSForest
are available at https://github.com/yujinhan98/CSForest.
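To make the set-valued mechanism in the abstract concrete, the following is a minimal split-conformal sketch in Python. It is not the CSForest algorithm itself: the data are synthetic, and a simple nearest-centroid nonconformity score stands in for the semi-supervised tree-ensemble score. It illustrates how a class-wise prediction set $C(x)$ is built from calibration quantiles, and how an empty set flags a sample unlike anything seen in training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D two-class data: class 0 centered at -2, class 1 at +2.
X_train = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
y_train = np.array([0] * 100 + [1] * 100)
X_cal = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
y_cal = np.array([0] * 50 + [1] * 50)

centroids = np.array([X_train[y_train == k].mean() for k in (0, 1)])

def score(x, k):
    # Nonconformity score: distance to the class centroid.
    # (A stand-in for the forest-based score used by the actual method.)
    return np.abs(x - centroids[k])

alpha = 0.1  # target miscoverage level
# Per-class calibration quantile (a class-conditional split-conformal step).
q = []
for k in (0, 1):
    s = np.sort(score(X_cal[y_cal == k], k))
    n = len(s)
    idx = min(int(np.ceil((n + 1) * (1 - alpha))) - 1, n - 1)
    q.append(s[idx])

def prediction_set(x):
    # C(x): every class whose score is within its calibrated threshold.
    return {k for k in (0, 1) if score(x, k) <= q[k]}

print(prediction_set(-2.0))  # likely {0}: a clear inlier of class 0
print(prediction_set(10.0))  # likely set(): empty set flags an outlier
```

A test point far from both classes exceeds every calibrated threshold, so $C(x)$ comes back empty, which is the abnormality-detection signal the abstract describes. CSForest additionally uses unlabeled test samples when fitting the ensemble, which this sketch omits.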
Related papers
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models [52.98590762456236]
Training-free test-time dynamic adapter (TDA) is a promising approach to address this issue.
We propose a simple yet effective method for DistributiOnal Test-time Adaptation (Dota).
Dota continually estimates the distributions of test samples, allowing the model to continually adapt to the deployment environment.
arXiv Detail & Related papers (2024-09-28T15:03:28Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- How Low Can You Go? Surfacing Prototypical In-Distribution Samples for Unsupervised Anomaly Detection [48.30283806131551]
We show that UAD with extremely few training samples can already match -- and in some cases even surpass -- the performance of training with the whole training dataset.
We propose an unsupervised method to reliably identify prototypical samples to further boost UAD performance.
arXiv Detail & Related papers (2023-12-06T15:30:47Z)
- DE-CROP: Data-efficient Certified Robustness for Pretrained Classifiers [21.741026088202126]
We propose a novel way to certify the robustness of pretrained models using only a few training samples.
Our proposed approach generates class-boundary and interpolated samples corresponding to each training sample.
We obtain significant improvements over the baseline on multiple benchmark datasets and also report similar performance under the challenging black box setup.
arXiv Detail & Related papers (2022-10-17T10:41:18Z)
- Large-Scale Open-Set Classification Protocols for ImageNet [0.0]
Open-Set Classification (OSC) intends to adapt closed-set classification models to real-world scenarios.
We propose three open-set protocols that provide rich datasets of natural images with different levels of similarity between known and unknown classes.
We propose a new validation metric that can be employed to assess whether the training of deep learning models addresses both the classification of known samples and the rejection of unknown samples.
arXiv Detail & Related papers (2022-10-13T07:01:34Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Multi-Class Data Description for Out-of-distribution Detection [25.853322158250435]
Deep-MCDD is effective to detect out-of-distribution (OOD) samples as well as classify in-distribution (ID) samples.
By integrating the concept of Gaussian discriminant analysis into deep neural networks, we propose a deep learning objective to learn class-conditional distributions.
arXiv Detail & Related papers (2021-04-02T08:41:51Z)
- Coping with Label Shift via Distributionally Robust Optimisation [72.80971421083937]
We propose a model that minimises an objective based on distributionally robust optimisation (DRO).
We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective.
arXiv Detail & Related papers (2020-10-23T08:33:04Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.