Related papers: Efficient Failure Pattern Identification of Predictive Algorithms

URL: http://arxiv.org/abs/2306.00760v1
Date: Thu, 1 Jun 2023 14:54:42 GMT
Title: Efficient Failure Pattern Identification of Predictive Algorithms
Authors: Bao Nguyen, Viet Anh Nguyen
Abstract summary: We propose a human-machine collaborative framework that consists of a team of human annotators and a sequential recommendation algorithm. The results empirically demonstrate the competitive performance of our framework on multiple datasets at various signal-to-noise ratios.
Score: 15.02620042972929
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Given a (machine learning) classifier and a collection of unlabeled data, how can we efficiently identify misclassification patterns present in this dataset? To address this problem, we propose a human-machine collaborative framework that consists of a team of human annotators and a sequential recommendation algorithm. The recommendation algorithm is conceptualized as a stochastic sampler that, in each round, queries the annotators on a subset of samples for their true labels and obtains feedback on whether the samples are misclassified. The sampling mechanism needs to balance between discovering new patterns of misclassification (exploration) and confirming the potential patterns of classification (exploitation). We construct a determinantal point process, whose intensity balances the exploration-exploitation trade-off through the weighted update of the posterior at each round to form the generator of the stochastic sampler. The numerical results empirically demonstrate the competitive performance of our framework on multiple datasets at various signal-to-noise ratios.
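
To illustrate how a determinantal point process can trade exploitation against exploration when choosing which samples to send to annotators, here is a minimal sketch. It is not the authors' algorithm: it builds a standard quality-diversity kernel L = diag(q) S diag(q) and selects a subset greedily by maximizing the determinant, which is a common approximation and not exact DPP sampling. The names `rbf_kernel`, `determinant`, `greedy_dpp_select`, the `quality` scores (standing in for a posterior belief that a sample is misclassified), and the `bandwidth` parameter are all assumptions made for illustration.

```python
import math


def rbf_kernel(points, bandwidth=1.0):
    """Similarity matrix S[i][j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    n = len(points)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                      / (2.0 * bandwidth ** 2))
             for j in range(n)] for i in range(n)]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def greedy_dpp_select(points, quality, k, bandwidth=1.0):
    """Greedily pick up to k indices that maximize det(L_S).

    L = diag(q) S diag(q): high q favors likely failures (exploitation),
    while low similarity between chosen points favors diversity (exploration).
    """
    sim = rbf_kernel(points, bandwidth)
    n = len(points)
    L = [[quality[i] * sim[i][j] * quality[j] for j in range(n)] for i in range(n)]
    selected = []
    for _ in range(min(k, n)):
        best, best_det = None, -1.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[L[a][b] for b in idx] for a in idx]
            d = determinant(sub)
            if d > best_det:
                best, best_det = i, d
        selected.append(best)
    return selected


# Hypothetical data: two tight clusters, with made-up misclassification scores.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
quality = [0.9, 0.8, 0.5, 0.4]
print(greedy_dpp_select(points, quality, k=2))  # -> [0, 2]
```

In this toy run the second pick is index 2 even though index 1 has a higher score, because index 1 is nearly a duplicate of the first pick; the determinant penalizes that redundancy. In a sequential setting, the `quality` scores would be recomputed after each round of annotator feedback.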

Related papers

Deep Learning Meets Oversampling: A Learning Framework to Handle Imbalanced Classification [0.0]
We propose a novel learning framework that can generate synthetic data instances in a data-driven manner. The proposed framework formulates the oversampling process as a composition of discrete decision criteria. Experiments on the imbalanced classification task demonstrate the superiority of our framework over state-of-the-art algorithms.
arXiv Detail & Related papers (2025-02-08T13:35:00Z)
Pairwise Similarity Distribution Clustering for Noisy Label Learning [0.0]
Noisy label learning aims to train deep neural networks using a large amount of samples with noisy labels. We propose a simple yet effective sample selection algorithm to divide the training samples into one clean set and another noisy set. Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-04-02T11:30:22Z)
Sampling Audit Evidence Using a Naive Bayes Classifier [0.0]
This study advances sampling techniques by integrating machine learning with sampling. Machine learning integration helps avoid sampling bias, keep randomness and variability, and target riskier samples.
arXiv Detail & Related papers (2024-03-21T01:35:03Z)
Generator Born from Classifier [66.56001246096002]
We aim to reconstruct an image generator, without relying on any data samples. We propose a novel learning paradigm, in which the generator is trained to ensure that the convergence conditions of the network parameters are satisfied.
arXiv Detail & Related papers (2023-12-05T03:41:17Z)
Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised Person Re-Identification [80.98291772215154]
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations. Recent advances accomplish this task by leveraging clustering-based pseudo labels. We propose a Neighbour Consistency guided Pseudo Label Refinement framework.
arXiv Detail & Related papers (2022-11-30T09:39:57Z)
Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples. By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
Assessing the Quality of the Datasets by Identifying Mislabeled Samples [14.881597737762316]
We propose a novel statistic -- noise score -- as a measure for the quality of each data point to identify mislabeled samples. In our work, we use the representations derived by the inference network of data quality supervised variational autoencoder (AQUAVS). We validate our proposed statistic through experimentation by corrupting MNIST, FashionMNIST, and CIFAR10/100 datasets.
arXiv Detail & Related papers (2021-09-10T17:14:09Z)
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers. Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores. While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
Pattern Sampling for Shapelet-based Time Series Classification [4.94950858749529]
Subsequence-based time series classification algorithms provide accurate and interpretable models. These algorithms are based on exhaustive search for highly discriminative subsequences. Pattern sampling has been proposed as an effective alternative to mitigate the pattern explosion phenomenon.
arXiv Detail & Related papers (2021-02-16T23:35:10Z)
CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data. For self-supervised learning of the search, we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T17:37:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.