GRASP: A Goodness-of-Fit Test for Classification Learning
- URL: http://arxiv.org/abs/2209.02064v2
- Date: Wed, 30 Aug 2023 22:48:02 GMT
- Title: GRASP: A Goodness-of-Fit Test for Classification Learning
- Authors: Adel Javanmard and Mohammad Mehrabi
- Abstract summary: Despite being a standard measure, average accuracy fails to characterize the fit of the model to the underlying conditional law of labels given the feature vector ($Y|X$).
Our framework does not make any parametric assumption on the conditional law $Y|X$, and treats it as a black-box oracle model that can be accessed only through queries.
- Score: 8.122270502556374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance of classifiers is often measured in terms of average accuracy on
test data. Despite being a standard measure, average accuracy fails to
characterize the fit of the model to the underlying conditional law of labels
given the feature vector ($Y|X$), e.g. due to model misspecification,
overfitting, and high dimensionality. In this paper, we consider the fundamental
problem of assessing the goodness-of-fit of a general binary classifier. Our
framework does not make any parametric assumption on the conditional law $Y|X$,
and treats it as a black-box oracle model that can be accessed only through
queries. We formulate the goodness-of-fit assessment problem as a tolerance
hypothesis test of the form \[ H_0: \mathbb{E}\Big[D_f\Big({\sf
Bern}(\eta(X))\,\|\,{\sf Bern}(\hat{\eta}(X))\Big)\Big]\leq \tau\,, \] where $D_f$
represents an $f$-divergence function, and $\eta(x)$, $\hat{\eta}(x)$
respectively denote the true and estimated probabilities that a feature vector
$x$ admits a positive label. We propose a novel test for $H_0$, called GRASP,
which works in finite-sample settings regardless of the feature distribution
(distribution-free). We also propose model-X GRASP, designed for model-X
settings where the joint distribution of the feature vector is known. Model-X
GRASP uses this distributional information to achieve better power. We
evaluate the performance of our tests through extensive numerical experiments.
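As a rough illustration (not the GRASP test itself, which only queries $\eta$ as an oracle), the population quantity in $H_0$ can be approximated by Monte Carlo when the true conditional law is known. The sketch below uses the KL divergence as the $f$-divergence; `eta`, `eta_hat`, and the sampling setup are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def bern_kl(p, q, eps=1e-12):
    """KL divergence D_KL(Bern(p) || Bern(q)), a canonical f-divergence."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def expected_divergence(eta, eta_hat, X):
    """Monte-Carlo estimate of E[ D_f(Bern(eta(X)) || Bern(eta_hat(X))) ]
    over a sample of feature vectors X (rows)."""
    return float(np.mean(bern_kl(eta(X), eta_hat(X))))

# Toy check: a well-specified model drives the divergence toward tau = 0,
# while a misspecified fit yields a strictly positive value.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
eta = lambda X: 1 / (1 + np.exp(-X[:, 0]))             # true conditional law
eta_hat = lambda X: 1 / (1 + np.exp(-0.9 * X[:, 0]))   # slightly misspecified fit
stat = expected_divergence(eta, eta_hat, X)
```

In practice $\eta$ is unknown, which is exactly why the paper designs a test that accesses it only through label queries.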
Related papers
- One-Bit Quantization and Sparsification for Multiclass Linear
Classification via Regularized Regression [20.710343135282116]
We show that the best classification performance is achieved when $f(\cdot) = \|\cdot\|_2^2$ and $\lambda \to \infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_\infty$ in the large-$\lambda$ regime.
arXiv Detail & Related papers (2024-02-16T06:39:40Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles this problem in its most general distribution-independent setting.
This is the first algorithmic result for GLM regression with oblivious noise that can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
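A minimal sketch of what such an additive structure looks like in practice (the variable names, basis choice, and least-squares fit are illustrative assumptions, not the paper's estimator): responses $z$ are regressed on separate transformations of $\mathbf{x}$ and $\mathbf{y}$.

```python
import numpy as np

# Hedged sketch: fit z ~ f(x) + g(y) by joint least squares over a small
# additive basis; here g is allowed a mild nonlinearity via tanh features.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 2))
y = rng.normal(size=(n, 2))
z = x @ np.array([1.0, -0.5]) + np.tanh(y[:, 0]) + 0.1 * rng.normal(size=n)

# Additive design matrix: linear terms in x and y, plus tanh(y) features.
Phi = np.column_stack([x, y, np.tanh(y)])
coef, *_ = np.linalg.lstsq(Phi, z, rcond=None)
resid = z - Phi @ coef
```

Because the true additive components lie in the span of the basis, the residuals shrink to roughly the noise level.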
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - A Spectral Approach to Item Response Theory [6.5268245109828005]
We propose a new item estimation algorithm for the Rasch model.
The core of our algorithm is the computation of the stationary distribution of a Markov chain defined on an item-item graph.
Experiments on synthetic and real-life datasets show that our algorithm is scalable, accurate, and competitive with the most commonly used methods in the literature.
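The stationary-distribution computation at the core of such a spectral approach can be sketched generically (plain power iteration on a toy three-state chain; this is not the paper's algorithm or its item-item graph construction):

```python
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Stationary distribution of a row-stochastic matrix P via power iteration."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)          # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ P                  # one step of the chain
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

# Toy 3-state chain standing in for a small "item-item" graph.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = stationary_distribution(P)       # satisfies pi @ P == pi
```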
arXiv Detail & Related papers (2022-10-09T18:57:08Z) - Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture
Models [12.746888269949407]
We consider a high-dimensional mean estimation problem over a binary hidden Markov model.
We establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $\|\theta_*\|, \delta, d, n$.
arXiv Detail & Related papers (2022-06-06T09:34:04Z) - Robust Testing in High-Dimensional Sparse Models [0.0]
We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models.
We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n = \Omega\left(\min(s\log d,\, 1/\gamma^4)\right)$ samples.
arXiv Detail & Related papers (2022-05-16T07:47:22Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
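A naive, non-adaptive baseline for this oracle model can be sketched as follows (uniform sampling across coordinates, whereas the paper's bandit algorithm allocates samples by coordinate importance; `f`, `mu`, and the noise level are illustrative assumptions):

```python
import numpy as np

def estimate_f(f, mu, n_samples, noise_std=1.0, seed=None):
    """Estimate f(mu) when mu is only visible through a noisy oracle:
    average n_samples noisy draws per coordinate, then plug into f."""
    rng = np.random.default_rng(seed)
    draws = mu + noise_std * rng.normal(size=(n_samples, mu.size))
    mu_hat = draws.mean(axis=0)       # concentrates around mu at rate 1/sqrt(n)
    return f(mu_hat)

mu = np.array([1.0, 2.0, 3.0])
f = lambda v: np.sum(v ** 2)          # a known smooth function; f(mu) = 14
est = estimate_f(f, mu, n_samples=20_000, seed=0)
```

The bandit view improves on this by spending more oracle calls on the coordinates to which $f$ is most sensitive.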
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication, and they need a factor of $b$ fewer bits for storage compared to classical random features.
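A minimal sketch of the multiplication-free idea behind ternary features (the sparsity level and ReLU nonlinearity are illustrative assumptions, not the TRF construction itself): a projection with entries in $\{-1, 0, +1\}$ can be applied with additions and subtractions only.

```python
import numpy as np

def ternary_projection(d_in, d_out, sparsity=0.5, seed=None):
    """Random matrix with i.i.d. entries in {-1, 0, +1}; 'sparsity' is the
    probability of a zero entry (zeros cost nothing to apply or store)."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1, 0, 1],
                      size=(d_in, d_out),
                      p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2])

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
W = ternary_projection(16, 64, seed=0)
features = np.maximum(X @ W, 0.0)     # nonlinearity on top of the projection
```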
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Minimax Optimal Regression over Sobolev Spaces via Laplacian
Regularization on Neighborhood Graphs [25.597646488273558]
We study the statistical properties of Laplacian smoothing, a graph-based approach to nonparametric regression.
We prove that Laplacian smoothing is manifold-adaptive.
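Laplacian smoothing itself can be sketched in a few lines (a toy 1-D example with an illustrative k-NN graph and penalty $\lambda$; the paper's contribution is the statistical analysis, not this estimator per se):

```python
import numpy as np

def laplacian_smoothing(X, y, k=5, lam=1.0):
    """Denoise y on a symmetric k-NN graph by solving (I + lam * L) f = y,
    where L = D - A is the unnormalized graph Laplacian."""
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D2, np.inf)                  # exclude self-neighbors
    A = np.zeros((n, n))
    nbrs = np.argsort(D2, axis=1)[:, :k]
    for i in range(n):
        A[i, nbrs[i]] = 1.0
    A = np.maximum(A, A.T)                        # symmetrize the adjacency
    L = np.diag(A.sum(1)) - A
    return np.linalg.solve(np.eye(n) + lam * L, y)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, size=(200, 1)), axis=0)
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.normal(size=200)
f_hat = laplacian_smoothing(X, y, k=5, lam=2.0)   # smoother than raw y
```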
arXiv Detail & Related papers (2021-06-03T01:20:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.