Cost-Sensitive Evaluation for Binary Classifiers
- URL: http://arxiv.org/abs/2510.22016v1
- Date: Fri, 24 Oct 2025 20:34:18 GMT
- Title: Cost-Sensitive Evaluation for Binary Classifiers
- Authors: Pierangelo Lombardo, Antonio Casoli, Cristian Cingolani, Shola Oshodi, Michele Zanatta
- Abstract summary: Weighted Accuracy (WA) is an evaluation metric for binary classifiers with a straightforward interpretation as a weighted version of the well-known accuracy metric. We clarify the conceptual framework for handling class imbalance in cost-sensitive scenarios.
- Score: 0.013048920509133805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selecting an appropriate evaluation metric for classifiers is crucial for model comparison and parameter optimization, yet there is no consensus on a universally accepted metric that serves as a definitive standard. Moreover, there is often a misconception about the perceived need to mitigate imbalance in datasets used to train classification models. Since the final goal in classifier optimization is typically maximizing the return on investment or, equivalently, minimizing the Total Classification Cost (TCC), we define Weighted Accuracy (WA), an evaluation metric for binary classifiers with a straightforward interpretation as a weighted version of the well-known accuracy metric, consistent with the need to minimize TCC. We clarify the conceptual framework for handling class imbalance in cost-sensitive scenarios, providing an alternative to rebalancing techniques. This framework can be applied to any metric that, like WA, can be expressed as a linear combination of example-dependent quantities; it allows results obtained on different datasets to be compared and addresses discrepancies between the development dataset, used to train and validate the model, and the target dataset, where the model will be deployed. It also specifies in which scenarios using UCC-unaware class rebalancing techniques or rebalancing metrics aligns with TCC minimization and when it is instead counterproductive. Finally, we propose a procedure to estimate the WA weight parameter in the absence of fully specified UCCs and demonstrate the robustness of WA by analyzing its correlation with TCC in example-dependent scenarios.
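As a rough illustration of the quantities the abstract discusses: the paper's exact definition of WA is not reproduced here, so the form below, in which a single weight `w` balances correct predictions on the two classes, is an assumption, not the authors' formula. A minimal sketch:

```python
def weighted_accuracy(tp, fp, tn, fn, w):
    """Illustrative weighted accuracy: correct predictions on the positive
    class are weighted by w, on the negative class by 1 - w.
    (An assumed form; the paper's exact definition may differ.)
    With w = 0.5 this reduces to plain accuracy."""
    numerator = w * tp + (1 - w) * tn
    denominator = w * (tp + fn) + (1 - w) * (tn + fp)
    return numerator / denominator


def total_classification_cost(fp, fn, c_fp, c_fn):
    """Total Classification Cost given unit costs for the two error
    types (assuming, for simplicity, zero cost for correct predictions)."""
    return c_fp * fp + c_fn * fn


# Example confusion matrix: 50 TP, 10 FP, 30 TN, 10 FN.
wa = weighted_accuracy(tp=50, fp=10, tn=30, fn=10, w=0.5)   # 0.8, same as accuracy
tcc = total_classification_cost(fp=10, fn=10, c_fp=1.0, c_fn=5.0)  # 60.0
```

The point of the abstract is that when false negatives cost more than false positives (here 5.0 vs 1.0), a metric aligned with TCC must weight the classes accordingly rather than rebalance the data.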
Related papers
- Principled Algorithms for Optimizing Generalized Metrics in Binary Classification [53.604375124674796]
We introduce principled algorithms for optimizing generalized metrics, supported by $H$-consistency and finite-sample generalization bounds. Our approach reformulates metric optimization as a generalized cost-sensitive learning problem. We develop new algorithms, METRO, with strong theoretical performance guarantees.
arXiv Detail & Related papers (2025-12-29T01:33:42Z) - CoLSE: A Lightweight and Robust Hybrid Learned Model for Single-Table Cardinality Estimation using Joint CDF [7.945011337356916]
Cardinality estimation is a critical component of query optimization. We propose CoLSE, a hybrid learned approach for single-table cardinality estimation. Experimental results show that CoLSE achieves a favorable trade-off among accuracy, training time, latency, and model size, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2025-12-14T10:08:20Z) - Symmetric Aggregation of Conformity Scores for Efficient Uncertainty Sets [6.673032375204486]
We propose SACP (Symmetric Aggregated Conformal Prediction), a novel method that aggregates nonconformity scores from multiple predictors. SACP transforms these scores into e-values and combines them using any symmetric aggregation function. We show that SACP consistently improves efficiency and often outperforms state-of-the-art model aggregation baselines.
arXiv Detail & Related papers (2025-12-07T17:54:07Z) - Concept Regions Matter: Benchmarking CLIP with a New Cluster-Importance Approach [20.898059440239603]
Cluster-based Concept Importance (CCI) is a novel interpretability method. CCI sets a new state of the art on faithfulness benchmarks. We present a comprehensive evaluation of eighteen CLIP variants.
arXiv Detail & Related papers (2025-11-17T05:01:24Z) - Explicit modelling of subject dependency in BCI decoding [12.17288254938554]
Brain-Computer Interfaces (BCIs) suffer from high inter-subject variability and limited labeled data. We present an end-to-end approach that explicitly models the subject dependency using lightweight convolutional neural networks (CNNs) conditioned on the subject's identity.
arXiv Detail & Related papers (2025-09-27T10:51:42Z) - UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization [19.673388630963807]
We propose UniCBE, a unified uniformity-driven CBE framework. On the AlpacaEval benchmark, UniCBE saves over 17% of evaluation budgets while achieving a Pearson correlation with ground truth exceeding 0.995. In scenarios where new models are continuously introduced, UniCBE can even save over 50% of evaluation costs.
arXiv Detail & Related papers (2025-02-17T05:28:12Z) - Optimizing Class-Level Probability Reweighting Coefficients for Equitable Prompting Accuracy [12.287692969438169]
LLMs often inherit biases from the statistical regularities of their pre-training data. This leads to persistent, uneven class accuracy in classification and QA. We develop a post-hoc probability reweighting method that directly optimizes non-differentiable, performance-driven metrics.
arXiv Detail & Related papers (2024-05-13T10:30:33Z) - Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - Conceptually Diverse Base Model Selection for Meta-Learners in Concept Drifting Data Streams [3.0938904602244355]
We present a novel approach for estimating the conceptual similarity of base models, which is calculated using the Principal Angles (PAs) between their underlying subspaces.
We evaluate these methods against thresholding using common ensemble pruning metrics, namely predictive performance and Mutual Information (MI), in the context of online Transfer Learning (TL).
Our results show that conceptual similarity thresholding has a reduced computational overhead, and yet yields comparable predictive performance to thresholding using predictive performance and MI.
arXiv Detail & Related papers (2021-11-29T13:18:53Z) - Re-Assessing the "Classify and Count" Quantification Method [88.60021378715636]
"Classify and Count" (CC) is often a biased estimator.
Previous works have failed to use properly optimised versions of CC.
We argue that, while still inferior to some cutting-edge methods, they deliver near-state-of-the-art accuracy.
arXiv Detail & Related papers (2020-11-04T21:47:39Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Improved Design of Quadratic Discriminant Analysis Classifier in Unbalanced Settings [19.763768111774134]
Quadratic discriminant analysis (QDA), or its regularized version (R-QDA), is often not recommended for classification.
We propose an improved R-QDA that is based on the use of two regularization parameters and a modified bias.
arXiv Detail & Related papers (2020-06-11T12:17:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.