Silhouettes and quasi residual plots for neural nets and tree-based
classifiers
- URL: http://arxiv.org/abs/2106.08814v1
- Date: Wed, 16 Jun 2021 14:26:31 GMT
- Title: Silhouettes and quasi residual plots for neural nets and tree-based
classifiers
- Authors: Jakob Raymaekers and Peter J. Rousseeuw
- Abstract summary: Here we pursue a different goal, which is to visualize the cases being classified, either in training data or in test data.
An important aspect is whether a case has been classified to its given class (label) or whether the classifier wants to assign it to a different class.
The graphical displays are illustrated and interpreted on benchmark data sets containing images, mixed features, and tweets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural nets and tree-based methods are powerful classification tools in
machine learning. There exist interesting visualizations of the inner workings
of these and other classifiers. Here we pursue a different goal, which is to
visualize the cases being classified, either in training data or in test data.
An important aspect is whether a case has been classified to its given class
(label) or whether the classifier wants to assign it to a different class. This
is reflected in the (conditional and posterior) probability of the alternative
class (PAC). A high PAC indicates label bias, i.e. the possibility that the
case was mislabeled. The PAC is used to construct a silhouette plot which is
similar in spirit to the silhouette plot for cluster analysis (Rousseeuw,
1987). The average silhouette width can be used to compare different
classifications of the same dataset. We will also draw quasi residual plots of
the PAC versus a data feature, which may lead to more insight into the data. One
of these data features is how far each case lies from its given class. The
graphical displays are illustrated and interpreted on benchmark data sets
containing images, mixed features, and tweets.
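The PAC and silhouette construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: conditioning the given class against its best alternative, and the mapping s(i) = 1 - 2*PAC(i), are assumptions based on the abstract.

```python
import numpy as np

def pac(probs, labels):
    """Probability of the (best) alternative class for each case,
    conditioned on its given class vs. that alternative.

    probs:  (n, k) array of posterior class probabilities.
    labels: (n,) array of given class indices.
    """
    n = probs.shape[0]
    p_given = probs[np.arange(n), labels]        # posterior of the given label
    masked = probs.copy()
    masked[np.arange(n), labels] = -np.inf       # exclude the given class
    p_alt = masked.max(axis=1)                   # best alternative class
    return p_alt / (p_given + p_alt)             # conditional probability

def silhouette_width(probs, labels):
    # s(i) = 1 - 2*PAC(i), in [-1, 1]: positive when the classifier
    # agrees with the given label, negative when it prefers another class.
    return 1.0 - 2.0 * pac(probs, labels)
```

Averaging `silhouette_width` per class, or over the whole dataset, would give the comparison statistic the abstract mentions; a case with PAC near 1 (silhouette near -1) is a candidate for label bias.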
Related papers
- DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets [4.833815605196965]
This paper presents a novel method for addressing data imbalance in machine learning.
It computes sample likelihoods based on image appearance using deep perceptual embeddings and clustering.
It then uses these likelihoods to weigh samples differently during training with a proposed Generalized Focal Loss function.
arXiv Detail & Related papers (2023-08-19T02:11:49Z)
- Pretraining Respiratory Sound Representations using Metadata and Contrastive Learning [1.827510863075184]
Supervised contrastive learning is a paradigm that learns similar representations to samples sharing the same class labels.
We show that it outperforms cross-entropy in classifying respiratory anomalies in two different datasets.
This work suggests the potential of using multiple metadata sources in supervised contrastive settings.
arXiv Detail & Related papers (2022-10-27T12:59:00Z)
- Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations.
We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of training a neural network for classification with the classifier set to a random equiangular tight frame (ETF) and kept fixed during training.
Our experimental results show that our method is able to achieve similar performances on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- ClassSPLOM -- A Scatterplot Matrix to Visualize Separation of Multiclass Multidimensional Data [8.89134799076718]
In multiclass classification of multidimensional data, the user wants to build a model of the classes to predict the label of unseen data.
The model is trained on the data and tested on unseen data with known labels to evaluate its quality.
The results are visualized as a confusion matrix which shows how many data labels have been predicted correctly or confused with other classes.
arXiv Detail & Related papers (2022-01-30T14:09:19Z)
- Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- Class maps for visualizing classification results [0.0]
A classification method first processes a training set of objects with given classes (labels).
When running the resulting prediction method on the training data or on test data, it can happen that an object is predicted to lie in a class that differs from its given label.
The proposed class map reflects the probability that an object belongs to an alternative class, how far it is from the other objects in its given class, and whether some objects lie far from all classes.
arXiv Detail & Related papers (2020-07-28T21:27:15Z)
- A Bayes-Optimal View on Adversarial Examples [9.51828574518325]
We argue for examining adversarial examples from the perspective of Bayes-optimal classification.
Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier.
arXiv Detail & Related papers (2020-02-20T16:43:47Z)
- Multi-Class Classification from Noisy-Similarity-Labeled Data [98.13491369929798]
We propose a method for learning from only noisy-similarity-labeled data.
We use a noise transition matrix to bridge the class-posterior probability between clean and noisy data.
We build a novel learning system which can assign noise-free class labels for instances.
arXiv Detail & Related papers (2020-02-16T05:10:21Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
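Both the abstract and the class-maps entry above use a notion of how far a case lies from its given class. As a rough, simplified stand-in for that measure (the papers use a more refined construction based on class-conditional distance distributions), one could rank each case's distance to its class centroid among its classmates:

```python
import numpy as np

def farness(X, labels, i):
    """Simplified stand-in for the papers' farness measure: the
    empirical quantile of case i's distance to the centroid of its
    given class, among all members of that class. Returns a value
    in (0, 1]; values near 1 flag cases far from their own class."""
    same = labels == labels[i]                       # classmates of case i
    centroid = X[same].mean(axis=0)
    d = np.linalg.norm(X[same] - centroid, axis=1)   # classmates' distances
    d_i = np.linalg.norm(X[i] - centroid)            # case i's distance
    return float((d <= d_i).mean())                  # rank as a quantile
```

Plotting PAC against such a farness value, per class, would yield one of the quasi residual plots described in the abstract: cases that are both far from their class and assigned high PAC are natural candidates for inspection.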
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.