A classification performance evaluation measure considering data separability
- URL: http://arxiv.org/abs/2211.05433v1
- Date: Thu, 10 Nov 2022 09:18:26 GMT
- Title: A classification performance evaluation measure considering data separability
- Authors: Lingyan Xue, Xinyu Zhang, Weidong Jiang and Kai Huo
- Abstract summary: We propose a new separability measure--the rate of separability (RS)--based on the data coding rate.
We demonstrate the positive correlation between the proposed measure and recognition accuracy in a multi-task scenario constructed from a real dataset.
- Score: 6.751026374812737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning and deep learning classification models are data-driven: the model and the data jointly determine classification performance. Evaluating a model's performance only by classifier accuracy while ignoring data separability is therefore biased; a model may exhibit excellent accuracy simply because it was tested on highly separable data. Most existing data separability measures are defined in terms of distances between sample points, an approach that has been shown to fail in several circumstances. In this paper, we propose a new separability measure, the rate of separability (RS), based on the data coding rate. We validate its effectiveness as a complement to existing separability measures by comparing it with four distance-based measures on synthetic datasets. We then demonstrate a positive correlation between the proposed measure and recognition accuracy in a multi-task scenario constructed from a real dataset. Finally, we discuss methods for evaluating the classification performance of machine learning and deep learning models that take data separability into account.
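The abstract describes RS as being derived from the data coding rate but does not state its exact formula. The snippet below is therefore only a minimal, hypothetical sketch of a coding-rate-based separability score: it computes the standard lossy coding rate of a data matrix and takes the gap between the rate of the whole dataset and the class-weighted rates of its parts. The function names, the choice of distortion eps, and the "whole minus per-class" form are assumptions for illustration, not the authors' definition of RS.

```python
# Hypothetical sketch of a coding-rate-based separability score.
# NOT the RS formula from the paper (the abstract does not give it).
import numpy as np

def coding_rate(Z, eps=0.5):
    """Lossy coding rate of the columns of Z (features x samples):
    (1/2) * logdet(I + d/(n*eps^2) * Z Z^T). Natural log; the base
    only rescales the score."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def rate_gap_separability(X, y, eps=0.5):
    """Illustrative score: coding rate of the whole dataset minus the
    class-weighted coding rates of each class. Larger values suggest the
    classes occupy more distinct regions of the feature space."""
    Z = X.T                      # shape (d, n): one column per sample
    n = Z.shape[1]
    whole = coding_rate(Z, eps)
    per_class = 0.0
    for c in np.unique(y):
        Zc = Z[:, y == c]
        per_class += (Zc.shape[1] / n) * coding_rate(Zc, eps)
    return whole - per_class

# Example: well-separated Gaussian blobs should score higher than overlapping ones.
rng = np.random.default_rng(0)
X_far = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
X_near = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
print(rate_gap_separability(X_far, y), rate_gap_separability(X_near, y))
```

On these two synthetic examples, the well-separated blobs should yield a larger score than the overlapping ones, which is the qualitative behaviour one expects from any separability measure.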
Related papers
- Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z) - Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z) - Metric Learning Improves the Ability of Combinatorial Coverage Metrics
to Anticipate Classification Error [0.0]
Many machine learning methods are sensitive to test or operational data that is dissimilar to training data.
Metric learning is a technique for learning latent spaces where data from different classes is further apart.
In a study of 6 open-source datasets, we find that metric learning increased the difference between set-difference coverage metrics calculated on correctly and incorrectly classified data.
arXiv Detail & Related papers (2023-02-28T14:55:57Z) - Revisiting Long-tailed Image Classification: Survey and Benchmarks with
New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distributions.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z) - Estimating Model Performance under Domain Shifts with Class-Specific
Confidence Scores [25.162667593654206]
We introduce class-wise calibration within the framework of performance estimation for imbalanced datasets.
We conduct experiments on four tasks and find the proposed modifications consistently improve the estimation accuracy for imbalanced datasets.
arXiv Detail & Related papers (2022-07-20T15:04:32Z) - Data-SUITE: Data-centric identification of in-distribution incongruous
examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z) - Data-Centric Machine Learning in the Legal Domain [0.2624902795082451]
This paper explores how changes in a data set influence the measured performance of a model.
Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance.
The observed effects are surprisingly pronounced, especially when the per-class performance is considered.
arXiv Detail & Related papers (2022-01-17T23:05:14Z) - A Novel Intrinsic Measure of Data Separability [0.0]
In machine learning, the performance of a classifier depends on the separability/complexity of datasets.
We create an intrinsic measure, the Distance-based Separability Index (DSI).
We show that the DSI can indicate whether the distributions of datasets are identical for any dimensionality.
arXiv Detail & Related papers (2021-09-11T04:20:08Z) - Doing Great at Estimating CATE? On the Neglected Assumptions in
Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z) - Data Separability for Neural Network Classifiers and the Development of
a Separability Index [17.49709034278995]
We created the Distance-based Separability Index (DSI) to measure the separability of datasets.
We show that the DSI can indicate whether data belonging to different classes have similar distributions.
We also discuss possible applications of the DSI in data science, machine learning, and deep learning (a minimal sketch of a distance-based index follows this list).
arXiv Detail & Related papers (2020-05-27T01:49:19Z)
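Two of the related papers above introduce the Distance-based Separability Index (DSI), but these summaries do not reproduce its exact statistic. The snippet below is a minimal sketch of the general idea they describe: compare within-class and between-class distance distributions and report how distinguishable they are. The function name, the use of a two-sample Kolmogorov-Smirnov statistic, and the averaging over classes are assumptions for illustration, not the published definition.

```python
# Illustrative distance-based separability proxy, in the spirit of the DSI
# papers listed above (exact statistic not reproduced here).
import numpy as np
from scipy.spatial.distance import pdist, cdist
from scipy.stats import ks_2samp

def distance_based_separability(X, y):
    """For each class, compare the distribution of within-class pairwise
    distances to the distribution of distances to all other classes using a
    two-sample KS statistic, then average over classes. Values near 1 suggest
    well-separated classes; values near 0 suggest heavy overlap."""
    scores = []
    for c in np.unique(y):
        Xc, Xo = X[y == c], X[y != c]
        intra = pdist(Xc)                 # within-class distances
        inter = cdist(Xc, Xo).ravel()     # between-class distances
        scores.append(ks_2samp(intra, inter).statistic)
    return float(np.mean(scores))

# Example with synthetic blobs: close to 1 for well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
print(distance_based_separability(X, y))
```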
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.