The Limits of Assumption-free Tests for Algorithm Performance
- URL: http://arxiv.org/abs/2402.07388v2
- Date: Sat, 2 Mar 2024 03:36:09 GMT
- Title: The Limits of Assumption-free Tests for Algorithm Performance
- Authors: Yuetian Luo and Rina Foygel Barber
- Abstract summary: How well does an algorithm perform at a given modeling task, and which algorithm performs best?
We make a distinction between two questions: how good is an algorithm $A$ at the problem of learning from a training set of size $n$, versus, how good is a particular fitted model produced by running $A$ on a particular training data set of size $n$?
- Score: 6.7171902258864655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithm evaluation and comparison are fundamental questions in machine
learning and statistics -- how well does an algorithm perform at a given
modeling task, and which algorithm performs best? Many methods have been
developed to assess algorithm performance, often based around cross-validation
type strategies, retraining the algorithm of interest on different subsets of
the data and assessing its performance on the held-out data points. Despite the
broad use of such procedures, the theoretical properties of these methods are
not yet fully understood. In this work, we explore some fundamental limits for
answering these questions with limited amounts of data. In particular, we make
a distinction between two questions: how good is an algorithm $A$ at the
problem of learning from a training set of size $n$, versus, how good is a
particular fitted model produced by running $A$ on a particular training data
set of size $n$?
Our main results prove that, for any test that treats the algorithm $A$ as a
``black box'' (i.e., we can only study the behavior of $A$ empirically), there
is a fundamental limit on our ability to carry out inference on the performance
of $A$, unless the number of available data points $N$ is many times larger
than the sample size $n$ of interest. (On the other hand, evaluating the
performance of a particular fitted model is easy as long as a holdout data set
is available -- that is, as long as $N-n$ is not too small.) We also ask
whether an assumption of algorithmic stability might be sufficient to
circumvent this hardness result. Surprisingly, we find that this is not the
case: the same hardness result still holds for the problem of evaluating the
performance of $A$, aside from a high-stability regime where fitted models are
essentially nonrandom. Finally, we also establish similar hardness results for
the problem of comparing multiple algorithms.
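To make the distinction concrete, the following is a minimal sketch (not from the paper; the function names, the callable `algorithm` interface, and the squared-error loss are illustrative assumptions) contrasting the two estimands: a cross-validation-type, black-box estimate of how well an algorithm $A$ learns from training sets of size $n$, versus a holdout estimate of the risk of one particular fitted model.

```python
import numpy as np

def holdout_model_error(fitted_model, X_holdout, y_holdout):
    """Risk of one particular fitted model, estimated on a holdout set.
    Per the abstract, this is easy whenever N - n is not too small."""
    preds = fitted_model.predict(X_holdout)
    return float(np.mean((preds - y_holdout) ** 2))

def black_box_algorithm_error(algorithm, X, y, n, n_splits=20, rng=None):
    """Cross-validation-type estimate of how well `algorithm` learns from a
    training set of size n, treating `algorithm` as a black box: we may only
    refit it on subsets of the N available points and score the resulting
    fits on the held-out points."""
    rng = np.random.default_rng(rng)
    N = len(y)
    assert n < N, "need at least one held-out point (N - n must be positive)"
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(N)
        train, test = perm[:n], perm[n:]
        model = algorithm(X[train], y[train])  # refit on a size-n subset
        errors.append(holdout_model_error(model, X[test], y[test]))
    return float(np.mean(errors))
```

Here `algorithm` is any callable mapping a training set to a fitted model with a `predict` method; for instance, with scikit-learn one could pass `lambda X, y: LinearRegression().fit(X, y)`. Computing such a point estimate is straightforward; the paper's hardness results concern valid inference (e.g., confidence statements) on the algorithm-level quantity from black-box access alone, which faces a fundamental limit unless $N$ is many times larger than $n$.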
Related papers
- Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z) - Subset-Based Instance Optimality in Private Estimation [23.173651165908282]
We show how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties.
Our algorithm simultaneously matches or exceeds the performance of existing algorithms under a range of distributional assumptions.
arXiv Detail & Related papers (2023-03-01T18:49:10Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
We provide general tools for designing efficient private estimation algorithms.
We give the first efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Scalable Differentially Private Clustering via Hierarchically Separated
Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Asymptotic Instance-Optimal Algorithms for Interactive Decision Making [35.76184529520015]
In this paper, we design the first instance-optimal algorithm for general interactive decision-making problems with a finite number of decisions under mild conditions.
Our results, instantiated on concrete problems, recover the classical gap-dependent bounds for multi-armed bandits and prior works on linear bandits.
arXiv Detail & Related papers (2022-06-06T02:56:10Z) - Exact Paired-Permutation Testing for Structured Test Statistics [67.71280539312536]
We provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics.
Our exact algorithm was $10\times$ faster than the Monte Carlo approximation with $20{,}000$ samples on a common dataset (a minimal sketch of the Monte Carlo baseline appears after this list).
arXiv Detail & Related papers (2022-05-03T11:00:59Z) - Choosing the Right Algorithm With Hints From Complexity Theory [16.33500498939925]
We show that the Metropolis algorithm is clearly the best of all the algorithms considered for reasonable problem sizes.
An artificial algorithm of this type having an $O(n \log n)$ runtime leads to the result that the significance-based compact genetic algorithm (sig-cGA) can solve the DLB problem in time $O(n \log n)$ with high probability.
arXiv Detail & Related papers (2021-09-14T11:12:32Z) - An Empirical Comparison of Off-policy Prediction Learning Algorithms in
the Four Rooms Environment [9.207173776826403]
We empirically compare 11 off-policy prediction learning algorithms with linear function approximation on two small tasks.
The algorithms' performance is highly affected by the variance induced by the importance sampling ratios.
Emphatic TD($\lambda$) tends to have lower error than other algorithms, but might learn more slowly in some cases.
arXiv Detail & Related papers (2021-09-10T21:15:41Z) - Run2Survive: A Decision-theoretic Approach to Algorithm Selection based
on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z) - Learning Sparse Classifiers: Continuous and Mixed Integer Optimization
Perspectives [10.291482850329892]
Mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems.
We propose two classes of scalable algorithms: an exact algorithm that can handle $p \approx 50{,}000$ features in a few minutes, and approximate algorithms that can address instances with $p \approx 10^6$.
In addition, we present new estimation error bounds for $\ell_0$-regularized estimators.
arXiv Detail & Related papers (2020-01-17T18:47:02Z)
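As referenced in the paired-permutation testing entry above, here is a minimal sketch of the Monte Carlo baseline that the exact algorithm is compared against. This is a hypothetical implementation for a simple mean-difference statistic, not the exact algorithm for structured test statistics from that paper.

```python
import numpy as np

def paired_permutation_pvalue(scores_a, scores_b, n_samples=20000, rng=None):
    """Two-sided Monte Carlo paired-permutation test for the mean score
    difference between two systems evaluated on the same instances: each
    paired score is swapped independently with probability 1/2."""
    rng = np.random.default_rng(rng)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = abs(diffs.mean())
    # random sign flips correspond to swapping the two systems' scores per pair
    signs = rng.choice([-1.0, 1.0], size=(n_samples, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    # add-one smoothing keeps the estimate a valid p-value
    return float((1 + np.sum(permuted >= observed)) / (n_samples + 1))
```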