Meta Pattern Concern Score: A Novel Evaluation Measure with Human Values for Multi-classifiers
- URL: http://arxiv.org/abs/2209.06408v3
- Date: Wed, 13 Mar 2024 03:10:25 GMT
- Title: Meta Pattern Concern Score: A Novel Evaluation Measure with Human Values for Multi-classifiers
- Authors: Yanyun Wang, Dehui Du, Yuanhao Liu
- Abstract summary: We propose a novel evaluation measure named Meta Pattern Concern Score.
We learn from the advantages and disadvantages of two kinds of common metrics, namely confusion matrix-based evaluation measures and loss values.
Our measure can also be used to refine the model training by dynamically adjusting the learning rate.
- Score: 4.983066629141241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While advanced classifiers are increasingly used in real-world
safety-critical applications, how to properly evaluate black-box models
against specific human values remains a concern in the community. Such human
values include punishing error cases of different severity to varying degrees
and compromising general performance to reduce specific dangerous cases. In
this paper, we propose a novel evaluation measure named Meta Pattern Concern
Score, based on an abstract representation of the probabilistic prediction
and an adjustable threshold for the concession in prediction confidence, to
introduce human values into multi-classifiers. Technically, we learn from the
advantages and disadvantages of two kinds of common metrics, namely confusion
matrix-based evaluation measures and loss values, so that our measure is as
effective as they are even on general tasks, and the cross-entropy loss
becomes a special case of our measure in the limit. Besides, our measure can
also be used to refine model training by dynamically adjusting the learning
rate. Experiments on four kinds of models and six datasets confirm the
effectiveness and efficiency of our measure. A case study shows that it can
not only find an ideal model that reduces dangerous cases by 0.53% while
sacrificing only 0.04% of training accuracy, but also refine the learning
rate to train a new model that outperforms the original one on average, with
a 1.62% lower measure value and 0.36% fewer dangerous cases.
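The abstract describes the measure only at a high level. The sketch below is a rough, hypothetical reading of its two named ingredients, an abstract 0/1 pattern over the probabilistic prediction and an adjustable confidence-concession threshold, combined with user-chosen severity weights; the function name, thresholding rule, and penalty aggregation are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def meta_pattern_concern_score(probs, labels, severity, tau=0.1):
    # Illustrative only; NOT the paper's exact formula.
    # probs    : (n, k) predicted class probabilities
    # labels   : (n,)   true class indices
    # severity : (k, k) user-chosen costs; severity[y, c] is the cost of
    #            predicting class c when the true class is y (diagonal 0)
    # tau      : concession threshold; any class whose probability is within
    #            tau of the top class stays in the abstract pattern
    n, k = probs.shape
    top = probs.max(axis=1, keepdims=True)
    pattern = probs >= (top - tau)  # abstract 0/1 representation
    total = 0.0
    for i in range(n):
        y = labels[i]
        for c in range(k):
            if c != y and pattern[i, c]:
                total += severity[y, c]  # wrong class survives the concession
        if not pattern[i, y]:
            total += severity[y].max()   # true class dropped from the pattern
    return total / n

# Example: a 3-class task where mistaking class 2 for class 0 is dangerous.
severity = np.array([[0., 1., 1.],
                     [1., 0., 1.],
                     [5., 1., 0.]])
```

Under such a reading, the learning-rate refinement mentioned above could, for instance, shrink the learning rate whenever the score stops improving between epochs; the paper's actual schedule may differ.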
Related papers
- Calibrated Value-Aware Model Learning with Probabilistic Environment Models [11.633285935344208]
We analyze the family of value-aware model learning losses, which includes the popular MuZero loss. We show that these losses, as normally used, are uncalibrated surrogate losses, which means that they do not always recover the correct model and value function.
arXiv Detail & Related papers (2025-05-28T18:40:49Z)
- Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods [6.902279764206365]
We propose a novel approach to identify at-risk samples using only artifacts available during training.
Our method analyzes individual per-sample loss traces and uses them to identify vulnerable data samples.
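A minimal sketch of the artifact in question, assuming per-sample losses are recorded at each epoch; the flagging rule below is a generic illustration, not the paper's actual scoring of vulnerable samples:

```python
import numpy as np

def collect_loss_traces(per_epoch_losses):
    # Stack per-sample losses recorded at each epoch into one trace per
    # sample: rows are samples, columns are epochs. `per_epoch_losses` is
    # a list of (n_samples,) arrays, one array per training epoch.
    return np.stack(per_epoch_losses, axis=1)

def flag_at_risk(traces, quantile=0.95):
    # Generic illustration: samples whose loss stays unusually high in the
    # second half of training are flagged as potentially vulnerable.
    late_mean = traces[:, traces.shape[1] // 2:].mean(axis=1)
    return late_mean >= np.quantile(late_mean, quantile)
```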
arXiv Detail & Related papers (2024-11-08T18:04:41Z)
- A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data [7.199059106376138]
This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data.
We benchmark 18 models, ranging from classical statistical approaches to many common machine learning methods, on 32 publicly available datasets.
arXiv Detail & Related papers (2024-06-06T14:13:38Z)
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restricted to public datasets on the order of ten thousand samples.
We devise an approach to generate a more difficult benchmark from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- Tightening the Approximation Error of Adversarial Risk with Auto Loss Function Search [12.263913626161155]
A common type of evaluation is to approximate the adversarial risk of a model as a robustness indicator.
We propose AutoLoss-AR, the first method to search for loss functions that tighten the approximation error.
The results demonstrate the effectiveness of the proposed methods.
arXiv Detail & Related papers (2021-11-09T11:47:43Z)
- Learning from Similarity-Confidence Data [94.94650350944377]
We investigate a novel weakly supervised learning problem of learning from similarity-confidence (Sconf) data.
We propose an unbiased estimator of the classification risk that can be calculated from only Sconf data and show that the estimation error bound achieves the optimal convergence rate.
arXiv Detail & Related papers (2021-02-13T07:31:16Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
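For context, the plain importance-sampling estimator is the usual baseline against which such procedures are compared; a minimal sketch (not one of the paper's rate-optimal procedures):

```python
import numpy as np

def importance_sampling_ope(rewards, behavior_probs, target_probs):
    # Estimate the expected reward of a target policy from logged bandit
    # data gathered under a behavior policy.
    # rewards        : (n,) observed rewards
    # behavior_probs : (n,) probability the behavior policy assigned to
    #                  each logged action
    # target_probs   : (n,) probability the target policy assigns to the
    #                  same actions
    weights = target_probs / behavior_probs  # importance weights
    return float(np.mean(weights * rewards))
```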
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Asymptotic Behavior of Adversarial Training in Binary Classification [41.7567932118769]
Adversarial training is considered to be the state-of-the-art method for defense against adversarial attacks.
Despite its practical success, several open problems remain in understanding the performance of adversarial training.
We derive precise theoretical predictions for the minimization of the adversarial training objective in binary classification.
arXiv Detail & Related papers (2020-10-26T01:44:20Z)
- Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods [6.531546527140473]
We identify three common cases that lead to overestimation of adversarial accuracy against bounded first-order attack methods.
We propose compensation methods that address sources of inaccurate gradient computation.
Overall, our work shows that overestimated adversarial accuracy that is not indicative of robustness is prevalent even for conventionally trained deep neural networks.
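A typical bounded first-order evaluation is L-infinity PGD; the sketch below is a generic PyTorch version (the model, inputs, and hyperparameter values are hypothetical) that can serve as a baseline when checking for the overestimation cases identified in the paper:

```python
import torch
import torch.nn.functional as F

def pgd_adversarial_accuracy(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Accuracy under an L-infinity PGD attack. Inputs x are assumed to lie
    # in [0, 1]; y holds integer class labels. Too few steps, or gradients
    # broken by non-differentiable components, inflate this number.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # ascent step
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project to ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)           # stay a valid input
        x_adv = x_adv.detach()
    with torch.no_grad():
        preds = model(x_adv).argmax(dim=1)
    return (preds == y).float().mean().item()
```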
arXiv Detail & Related papers (2020-06-01T22:55:09Z)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
- Uncertainty-Gated Stochastic Sequential Model for EHR Mortality Prediction [6.170898159041278]
We present a novel variational recurrent network that estimates the distribution of missing variables, updates hidden states, and predicts the likelihood of in-hospital mortality.
It is noteworthy that our model can conduct these procedures in a single stream and learn all network parameters jointly in an end-to-end manner.
arXiv Detail & Related papers (2020-03-02T04:41:28Z)
- Precise Tradeoffs in Adversarial Training for Linear Regression [55.764306209771405]
We provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features.
We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach.
Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization etc.) affect the tradeoff between these two competing accuracies.
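For intuition, with an L2-bounded perturbation per input the inner maximization of the adversarial squared loss has a well-known closed form; the sketch below uses that textbook identity, not the paper's exact Gaussian-model formulation:

```python
import numpy as np

def robust_sq_loss(theta, X, y, eps):
    # Closed form of the per-sample inner maximization
    #   max_{||d||_2 <= eps} (y_i - (x_i + d)^T theta)^2
    #     = (|y_i - x_i^T theta| + eps * ||theta||_2)^2,
    # averaged over the dataset. X is (n, d), y is (n,), theta is (d,).
    resid = np.abs(y - X @ theta)
    return float(np.mean((resid + eps * np.linalg.norm(theta)) ** 2))
```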
arXiv Detail & Related papers (2020-02-24T19:01:47Z)
- Orthogonal Statistical Learning [49.55515683387805]
We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk depends on an unknown nuisance parameter.
We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order.
arXiv Detail & Related papers (2019-01-25T02:21:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.