Why is the prediction wrong? Towards underfitting case explanation via
meta-classification
- URL: http://arxiv.org/abs/2302.09952v1
- Date: Mon, 20 Feb 2023 12:40:54 GMT
- Title: Why is the prediction wrong? Towards underfitting case explanation via
meta-classification
- Authors: Sheng Zhou (CEDRIC - VERTIGO, CNAM, LADIS), Pierre Blanchart (LADIS),
Michel Crucianu (CEDRIC - VERTIGO, CNAM), Marin Ferecatu (CEDRIC - VERTIGO,
CNAM)
- Abstract summary: We project faulty data into a hand-crafted intermediate representation (meta-representation, profile vectors).
We present a method to fit a meta-classifier (decision tree) and express its output as a set of interpretable (human-readable) explanation rules.
Experimental results on several real datasets show more than 80% diagnosis label accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present a heuristic method to provide individual
explanations for those elements in a dataset (data points) which are wrongly
predicted by a given classifier. Since the general case is too difficult, in
the present work we focus on faulty data from an underfitted model. First, we
project the faulty data into a hand-crafted, and thus human-readable,
intermediate representation (meta-representation, profile vectors), with the
aim of separating the two main causes of misclassification: the classifier is
not strong enough, or the data point belongs to an area of the input space
where classes are not separable. Second, in the space of these profile vectors,
we present a method to fit a meta-classifier (decision tree) and express its
output as a set of interpretable (human-readable) explanation rules, which
leads to several target diagnosis labels: a data point is either correctly
classified, faulty due to a model that is too weak, or faulty due to mixed
(overlapping) classes in the input space. Experimental results on several real
datasets show more than 80% diagnosis label accuracy and confirm that the
proposed intermediate representation achieves a high degree of invariance with
respect to both the classifier used in the input space and the dataset being
classified, i.e., we can learn the meta-classifier on one dataset with a given
classifier and successfully predict diagnosis labels for a different dataset or
classifier (or both).
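Below is a minimal sketch of the second step (fitting a decision-tree meta-classifier on profile vectors and printing its output as explanation rules), assuming scikit-learn. The profile features, diagnosis labels, and random data are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch: fit a decision-tree meta-classifier on profile vectors and
# print its decisions as human-readable rules. Assumes profile vectors
# (n_samples x n_features) and diagnosis labels are already computed; the
# feature names and label encoding below are hypothetical placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
profile_vectors = rng.random((300, 3))      # stand-in for real profile vectors
diagnosis_labels = rng.integers(0, 3, 300)  # 0: correct, 1: weak model, 2: mixed classes

# A shallow tree keeps the extracted rule set short and interpretable.
meta_clf = DecisionTreeClassifier(max_depth=3, random_state=0)
meta_clf.fit(profile_vectors, diagnosis_labels)

# Express the meta-classifier's output as a set of explanation rules.
rules = export_text(
    meta_clf,
    feature_names=["local_margin", "class_overlap", "model_confidence"],
)
print(rules)
```

Capping the tree depth is the design choice that keeps the extracted rules readable as a handful of diagnosis statements rather than a deep, opaque tree.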
Related papers
- Balancing Fairness and Accuracy in Data-Restricted Binary Classification [14.439413517433891]
This paper proposes a framework that models the trade-off between accuracy and fairness under four practical scenarios.
Experiments on three datasets demonstrate the utility of the proposed framework as a tool for quantifying the trade-offs.
arXiv Detail & Related papers (2024-03-12T15:01:27Z)
- XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot simultaneously consider error position and type.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Classification at the Accuracy Limit -- Facing the Problem of Data Ambiguity [0.0]
We show the theoretical limit for classification accuracy that arises from the overlap of data categories.
We compare emerging data embeddings produced by supervised and unsupervised training, using MNIST and human EEG recordings during sleep.
This suggests that human-defined categories, such as hand-written digits or sleep stages, can indeed be considered 'natural kinds'.
arXiv Detail & Related papers (2022-06-04T07:00:32Z)
- Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification [86.32752788233913]
In classification problems, the Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance.
We propose a simple and direct Bayes error estimator, where we simply take the mean of the labels that show the uncertainty of the classes.
Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data.
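As a hedged reading of that estimator: if the "uncertain labels" are soft labels $\eta_i = P(y_i = 1 \mid x_i)$ (my interpretation of the summary, not the paper's exact notation), the binary Bayes error and its plug-in estimate would be:

```latex
% Binary Bayes error and the direct plug-in estimator the summary appears
% to describe; the soft-label notation eta_i is an assumption, not taken
% from the paper itself.
\[
  R^{\ast} = \mathbb{E}_{x}\!\left[\min\{\eta(x),\, 1-\eta(x)\}\right],
  \qquad
  \widehat{R} = \frac{1}{n}\sum_{i=1}^{n}\min\{\eta_i,\, 1-\eta_i\}.
\]
```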
arXiv Detail & Related papers (2022-02-01T13:22:26Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms a state-of-the-art one-class classification method by 6.3 AUC points and 12.5 average precision points.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Evaluating Fairness of Machine Learning Models Under Uncertain and Incomplete Information [25.739240011015923]
We show that the test accuracy of the attribute classifier is not always correlated with its effectiveness in bias estimation for a downstream model.
Our analysis has surprising and counter-intuitive implications: in certain regimes, one might want to distribute the error of the attribute classifier as unevenly as possible.
arXiv Detail & Related papers (2021-02-16T19:02:55Z)
- Unsupervised Label Refinement Improves Dataless Text Classification [48.031421660674745]
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description.
While promising, it crucially relies on accurate descriptions of the label set for each downstream task.
This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice.
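To make the scoring mechanism concrete, here is a minimal sketch of dataless classification with an embedding-similarity scorer; TF-IDF plus cosine similarity stands in for the paper's actual model, and the label descriptions are invented examples.

```python
# Minimal sketch of dataless classification: score each document against a
# textual description of every label and pick the best-scoring label.
# TF-IDF + cosine similarity is an illustrative stand-in for the real scorer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

label_descriptions = {
    "sports": "sports, games, match, athletes, teams, scores",
    "politics": "government, elections, policy, parliament, votes",
}
documents = ["The striker scored twice in the final match."]

vectorizer = TfidfVectorizer()
# Fit on descriptions and documents together so they share one vocabulary.
matrix = vectorizer.fit_transform(list(label_descriptions.values()) + documents)
n_labels = len(label_descriptions)
desc_vecs, doc_vecs = matrix[:n_labels], matrix[n_labels:]

scores = cosine_similarity(doc_vecs, desc_vecs)  # shape: (n_docs, n_labels)
labels = list(label_descriptions)
for doc, row in zip(documents, scores):
    print(f"{doc} -> {labels[row.argmax()]}")
```

The sensitivity the summary mentions is visible here: changing a few words in a label description changes every document's score against that label.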
arXiv Detail & Related papers (2020-12-08T03:37:50Z)
- Class maps for visualizing classification results [0.0]
A classification method first processes a training set of objects with given classes (labels).
When running the resulting prediction method on the training data or on test data, it can happen that an object is predicted to lie in a class that differs from its given label.
The proposed class map reflects the probability that an object belongs to an alternative class, how far it is from the other objects in its given class, and whether some objects lie far from all classes.
arXiv Detail & Related papers (2020-07-28T21:27:15Z)
- Dynamic Decision Boundary for One-class Classifiers applied to non-uniformly Sampled Data [0.9569316316728905]
A typical issue in pattern recognition is non-uniformly sampled data.
In this paper, we propose a one-class classifier based on the minimum spanning tree with a dynamic decision boundary.
arXiv Detail & Related papers (2020-04-05T18:29:36Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)