Inequalities for Optimization of Classification Algorithms: A Perspective Motivated by Diagnostic Testing
- URL: http://arxiv.org/abs/2508.01065v1
- Date: Fri, 01 Aug 2025 20:51:32 GMT
- Title: Inequalities for Optimization of Classification Algorithms: A Perspective Motivated by Diagnostic Testing
- Authors: Paul N. Patrone, Anthony J. Kearsley
- Abstract summary: We show how two main tasks in diagnostics can be recast in terms of a variation on the confusion (or error) matrix ${\boldsymbol{\rm P}}$. We show that the largest Gershgorin radius $\boldsymbol{\rho}_m$ of the matrix $\mathbb{I}-{\boldsymbol{\rm P}}$ yields uniform error bounds for both classification and prevalence estimation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by canonical problems in medical diagnostics, we propose and study properties of an objective function that uniformly bounds uncertainties in quantities of interest extracted from classifiers and related data analysis tools. We begin by adopting a set-theoretic perspective to show how two main tasks in diagnostics -- classification and prevalence estimation -- can be recast in terms of a variation on the confusion (or error) matrix ${\boldsymbol {\rm P}}$ typically considered in supervised learning. We then combine arguments from conditional probability with the Gershgorin circle theorem to demonstrate that the largest Gershgorin radius $\boldsymbol \rho_m$ of the matrix $\mathbb I-\boldsymbol {\rm P}$ (where $\mathbb I$ is the identity) yields uniform error bounds for both classification and prevalence estimation. In a two-class setting, $\boldsymbol \rho_m$ is minimized via a measure-theoretic ``water-leveling'' argument that optimizes an appropriately defined partition $U$ generating the matrix ${\boldsymbol {\rm P}}$. We also consider an example that illustrates the difficulty of generalizing the binary solution to a multi-class setting and deduce relevant properties of the confusion matrix.
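As a rough numerical illustration of the quantity at the center of the abstract, the Python sketch below computes the Gershgorin radii of $\mathbb I-\boldsymbol{\rm P}$ for a hypothetical two-class confusion matrix and reports the largest one, $\boldsymbol\rho_m$. The matrix values and the row-stochastic convention (rows indexed by true class) are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

# Hypothetical 2x2 confusion matrix P for a two-class diagnostic test.
# Convention assumed here: P[i, j] is the probability of assigning class j
# to a sample whose true class is i, so each row sums to 1.  The numbers
# are made up for illustration and are not taken from the paper.
P = np.array([[0.92, 0.08],
              [0.05, 0.95]])

A = np.eye(P.shape[0]) - P  # the matrix I - P from the abstract

# Gershgorin radius of row i: sum of the absolute off-diagonal entries of that row.
radii = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
rho_m = radii.max()  # largest Gershgorin radius

print("Gershgorin radii of I - P:", radii)
print("rho_m =", rho_m)
```

Under this row-stochastic convention the radius of row $i$ reduces to $1 - P_{ii}$, i.e. the per-class misclassification rate, so $\boldsymbol\rho_m$ is the worst such rate, which is consistent with its role as a uniform error bound; the exact convention used in the paper (row vs. column normalization, treatment of prevalence) should be checked against the text.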
Related papers
- Classification by Separating Hypersurfaces: An Entropic Approach [0.0]
We consider the classification problem of individuals characterized by a set of attributes represented as a vector in $\mathbb{R}^N$. The goal is to find a hyperplane in $\mathbb{R}^N$ that separates two sets of points corresponding to two distinct classes. We propose a novel approach by searching for a vector of parameters in a bounded $N$-dimensional hypercube.
arXiv Detail & Related papers (2025-07-03T15:43:54Z) - Optimal level set estimation for non-parametric tournament and crowdsourcing problems [49.75262185577198]
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions.
In this paper, we assume that the matrix $M$ containing the probability that expert $i$ answers correctly to question $j$ is bi-isotonic up to a permutation of its rows and columns.
We construct a computationally efficient algorithm that turns out to be minimax optimal for this classification problem.
arXiv Detail & Related papers (2024-08-27T18:28:31Z) - Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry [63.694184882697435]
Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations. This paper provides a comprehensive and unified understanding of the matrix logarithm and power from a Riemannian geometry perspective.
arXiv Detail & Related papers (2024-07-15T07:11:44Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Universality of max-margin classifiers [10.797131009370219]
We study the role of featurization maps and the high-dimensional universality of the misclassification error for non-Gaussian features.
In particular, the overparametrization threshold and generalization error can be computed within a simpler model.
arXiv Detail & Related papers (2023-09-29T22:45:56Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - The Hypervolume Indicator Hessian Matrix: Analytical Expression, Computational Time Complexity, and Sparsity [4.523133864190258]
This paper establishes the analytical expression of the Hessian matrix of the mapping from a (fixed size) collection of $n$ points in the $d$-dimensional decision space to the scalar hypervolume indicator value.
The Hessian matrix plays a crucial role in second-order methods, such as the Newton-Raphson optimization method, and it can be used for the verification of local optimal sets.
arXiv Detail & Related papers (2022-11-08T11:24:18Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Classification of high-dimensional data with spiked covariance matrix structure [0.5156484100374059]
We study the classification problem for high-dimensional data with $n$ observations on $p$ features. We propose an adaptive classifier that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space. We show that the resulting classifier is Bayes optimal whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow 0$.
arXiv Detail & Related papers (2021-10-05T11:26:53Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator in high dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z) - The generalization error of max-margin linear classifiers: Benign
overfitting and high dimensional asymptotics in the overparametrized regime [11.252856459394854]
Modern machine learning classifiers often exhibit vanishing classification error on the training set.
Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data.
arXiv Detail & Related papers (2019-11-05T00:15:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.