Multi-domain performance analysis with scores tailored to user preferences
- URL: http://arxiv.org/abs/2512.08715v1
- Date: Tue, 09 Dec 2025 15:29:53 GMT
- Title: Multi-domain performance analysis with scores tailored to user preferences
- Authors: Sébastien Piérard, Adrien Deliège, Marc Van Droogenbroeck
- Abstract summary: We consider a performance as a probability measure (e.g., a normalized confusion matrix for a classification task). The corresponding weighted mean is known as the summarization, and only some remarkable scores assign to the summarized performance a value equal to a weighted arithmetic mean. We rigorously define four domains, named easiest, most difficult, preponderant, and bottleneck domains, as functions of user preferences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of algorithms, methods, and models tends to depend heavily on the distribution of cases to which they are applied, and this distribution is specific to the applicative domain. After performing an evaluation in several domains, it is highly informative to compute a (weighted) mean performance and, as shown in this paper, to scrutinize what happens during this averaging. To achieve this goal, we adopt a probabilistic framework and consider a performance as a probability measure (e.g., a normalized confusion matrix for a classification task). The corresponding weighted mean is known as the summarization, and only some remarkable scores assign to the summarized performance a value equal to a weighted arithmetic mean of the values assigned to the domain-specific performances. These scores include the family of ranking scores, a continuum parameterized by user preferences; the weights to consider in the arithmetic mean depend on the user preferences. Based on this, we rigorously define four domains, named easiest, most difficult, preponderant, and bottleneck domains, as functions of user preferences. After establishing the theory in a general setting, regardless of the task, we develop new visual tools for two-class classification.
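As a minimal numerical illustration of the summarization idea (a sketch, not code from the paper): averaging normalized confusion matrices across domains and checking that a score linear in the matrix, such as accuracy, commutes with the weighted mean. All matrices and weights below are made-up values.

```python
import numpy as np

# Two domains, each evaluated by a normalized 2x2 confusion matrix
# (rows: true class, cols: predicted class; entries sum to 1).
P1 = np.array([[0.40, 0.10],
               [0.05, 0.45]])   # domain 1 performance
P2 = np.array([[0.25, 0.25],
               [0.20, 0.30]])   # domain 2 performance
w = np.array([0.7, 0.3])        # domain weights (sum to 1)

# Summarization: the weighted mean of the domain-specific measures.
P_summary = w[0] * P1 + w[1] * P2

# Accuracy is linear in the confusion matrix, so the score of the
# summarized performance equals the weighted arithmetic mean of the
# domain-specific scores.
acc = lambda P: np.trace(P)
assert np.isclose(acc(P_summary), w @ np.array([acc(P1), acc(P2)]))
```

For nonlinear scores this identity generally fails, which is why, per the abstract, only some remarkable scores admit such an arithmetic-mean decomposition.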
Related papers
- Outperformance Score: A Universal Standardization Method for Confusion-Matrix-Based Classification Performance Metrics
We introduce the outperformance score function, a universal standardization method for confusion-matrix-based classification performance metrics. The outperformance score represents the percentile rank of the observed classification performance within a reference distribution of possible performances.
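The percentile-rank idea can be sketched as follows; the reference distribution (accuracies of random classifiers on a balanced two-class problem) is a made-up stand-in, not the construction from that paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference distribution: accuracies of many random
# classifiers on a balanced two-class problem with n = 100 samples.
reference = rng.binomial(n=100, p=0.5, size=10_000) / 100.0

def percentile_rank(observed, reference):
    """Fraction of reference performances not exceeding the observed one."""
    return np.mean(reference <= observed)

observed_accuracy = 0.72
score = percentile_rank(observed_accuracy, reference)
# 0.72 is far above chance level, so the score is close to 1.
```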
arXiv Detail & Related papers (2025-05-11T16:07:14Z)
- Foundations of the Theory of Performance-Based Ranking
We establish the foundations of a universal theory for performance-based ranking. A universal parametric family of scores, called ranking scores, can be used to establish rankings satisfying our axioms. We show, in the case of two-class classification, that the family of ranking scores encompasses well-known performance scores.
arXiv Detail & Related papers (2024-12-05T15:05:25Z)
- Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment
We propose a model-agnostic post-processing framework, xOrder, for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z)
- An Upper Bound for the Distribution Overlap Index and Its Applications
This paper proposes an easy-to-compute upper bound for the overlap index between two probability distributions. The proposed bound shows its value in one-class classification and domain shift analysis. Our work shows significant promise toward broadening the applications of overlap-based metrics.
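For context, the overlap index itself (the quantity that paper bounds, not the bound) can be computed directly for discrete distributions; the two distributions below are illustrative.

```python
import numpy as np

# Overlap index between two discrete distributions p and q:
# the shared probability mass, sum_i min(p_i, q_i).
# (The cited paper derives an easy-to-compute *upper bound* on this
# quantity; only the quantity itself is sketched here.)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

overlap = np.minimum(p, q).sum()
# Identical distributions give overlap 1; disjoint supports give 0.
```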
arXiv Detail & Related papers (2022-12-16T20:02:03Z)
- Gradient Matching for Domain Generalization
A critical requirement of machine learning systems is their ability to generalize to unseen domains.
We propose an inter-domain gradient matching objective that targets domain generalization.
We derive a simpler first-order algorithm named Fish that approximates its optimization.
arXiv Detail & Related papers (2021-04-20T12:55:37Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation
We propose an instance affinity based criterion for source to target transfer during adaptation, called ILA-DA.
We first propose a reliable and efficient method to extract similar and dissimilar samples across source and target, and utilize a multi-sample contrastive loss to drive the domain alignment process.
We verify the effectiveness of ILA-DA by observing consistent improvements in accuracy over popular domain adaptation approaches on a variety of benchmark datasets.
arXiv Detail & Related papers (2021-04-03T01:33:14Z)
- Optimizing Black-box Metrics with Iterative Example Weighting
We consider learning to optimize a classification metric defined by a black-box function of the confusion matrix.
Our approach is to adaptively learn example weights on the training dataset such that the resulting weighted objective best approximates the metric on the validation sample.
arXiv Detail & Related papers (2021-02-18T17:19:09Z)
- Stochastic batch size for adaptive regularization in deep network optimization
We propose a first-order optimization algorithm with adaptive regularization, applicable to machine learning problems in deep learning frameworks.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z)
- A General Method for Robust Learning from Batches
We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains.
We derive the first robust, computationally-efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and Gaussian-mixture distribution estimation.
arXiv Detail & Related papers (2020-02-25T18:53:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.