Approximate Cross-Validation with Low-Rank Data in High Dimensions
- URL: http://arxiv.org/abs/2008.10547v1
- Date: Mon, 24 Aug 2020 16:34:05 GMT
- Title: Approximate Cross-Validation with Low-Rank Data in High Dimensions
- Authors: William T. Stephenson, Madeleine Udell, Tamara Broderick
- Abstract summary: Cross-validation is an important tool for model assessment.
Approximate cross-validation (ACV) methods can lose both speed and accuracy in high dimensions unless sparsity structure is present in the data.
We develop a new algorithm for ACV that is fast and accurate in the presence of approximately low-rank (ALR) data.
- Score: 35.74302895575951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many recent advances in machine learning are driven by a challenging
trifecta: large data size $N$; high dimensions; and expensive algorithms. In
this setting, cross-validation (CV) serves as an important tool for model
assessment. Recent advances in approximate cross validation (ACV) provide
accurate approximations to CV with only a single model fit, avoiding
traditional CV's requirement for repeated runs of expensive algorithms.
Unfortunately, these ACV methods can lose both speed and accuracy in high
dimensions -- unless sparsity structure is present in the data. Fortunately,
there is an alternative type of simplifying structure that is present in most
data: approximate low rank (ALR). Guided by this observation, we develop a new
algorithm for ACV that is fast and accurate in the presence of ALR data. Our
first key insight is that the Hessian matrix -- whose inverse forms the
computational bottleneck of existing ACV methods -- is ALR. We show that,
despite our use of the \emph{inverse} Hessian, a low-rank approximation using
the largest (rather than the smallest) matrix eigenvalues enables fast,
reliable ACV. Our second key insight is that, in the presence of ALR data,
error in existing ACV methods roughly grows with the (approximate, low) rank
rather than with the (full, high) dimension. These insights allow us to prove
theoretical guarantees on the quality of our proposed algorithm -- along with
fast-to-compute upper bounds on its error. We demonstrate the speed and
accuracy of our method, as well as the usefulness of our bounds, on a range of
real and simulated data sets.
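To make the bottleneck concrete, here is a minimal sketch of the Newton-step style of ACV the abstract builds on, applied to L2-regularized logistic regression: the data-dependent part of the Hessian is replaced by its top-k eigenpairs, so inverse-Hessian products become cheap. This illustrates the general idea only; the paper's actual algorithm and error bounds are more refined, and the dense eigendecomposition here would itself be replaced by a scalable low-rank factorization in practice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lam, n_iter=50):
    """Newton's method for L2-regularized logistic regression (y in {0,1})."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        H = X.T @ ((p * (1 - p))[:, None] * X) + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta

def acv_loo_lowrank(X, y, theta, lam, k):
    """Newton-step approximate leave-one-out CV, with the data-dependent part
    of the Hessian replaced by a rank-k eigenapproximation. Note the TOP-k
    (largest) eigenvalues are kept; the regularizer lam*I covers the rest."""
    n, _ = X.shape
    p = sigmoid(X @ theta)
    Hx = X.T @ ((p * (1 - p))[:, None] * X)   # ALR when X is approximately low rank
    evals, evecs = np.linalg.eigh(Hx)
    U, S = evecs[:, -k:], evals[-k:]          # top-k eigenpairs
    # With H ~ U diag(S) U^T + lam*I:  H^{-1} = U diag(1/(S+lam) - 1/lam) U^T + I/lam
    def Hinv(v):
        return U @ ((1.0 / (S + lam) - 1.0 / lam) * (U.T @ v)) + v / lam
    loo_preds = np.empty(n)
    for i in range(n):
        g_i = (p[i] - y[i]) * X[i]            # gradient of the i-th data loss
        theta_i = theta + Hinv(g_i)           # approximate leave-one-out solution
        loo_preds[i] = sigmoid(X[i] @ theta_i)
    return loo_preds

# Usage on synthetic ALR data: features near a rank-5 subspace plus small noise
rng = np.random.default_rng(0)
n, d, r = 500, 200, 5
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
X += 0.01 * rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)
theta_hat = fit_logistic(X, y, lam=1.0)
loo = acv_loo_lowrank(X, y, theta_hat, lam=1.0, k=10)
```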
Related papers
- Robust SVD Made Easy: A fast and reliable algorithm for large-scale data analysis [0.0]
Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers.
This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation.
The proposed algorithm achieves remarkable speed by utilizing only two applications of a standard reduced-rank SVD algorithm.
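The summary above does not spell out the procedure; as a loose illustration of how spherical normalization might be combined with two reduced-rank SVDs (the specifics below are guesses for illustration, not the paper's algorithm):

```python
import numpy as np

def spherically_normalized_svd(A, k, eps=1e-12):
    """Loose illustration only, not the paper's algorithm: (1) project each row
    onto the unit sphere so a few outlying rows cannot dominate the fit,
    (2) run a reduced SVD on the normalized matrix to estimate the right
    singular subspace, (3) project the original data onto that subspace and
    run a second reduced SVD to recover scale."""
    row_norms = np.linalg.norm(A, axis=1, keepdims=True)
    A_sphere = A / np.maximum(row_norms, eps)              # first SVD input
    _, _, Vt = np.linalg.svd(A_sphere, full_matrices=False)
    V_k = Vt[:k].T                                         # estimated subspace
    B = A @ V_k                                            # project original data
    U_b, S, Wt = np.linalg.svd(B, full_matrices=False)     # second reduced SVD
    return U_b, S, (V_k @ Wt.T).T                          # A ~ U_b @ diag(S) @ Vt_hat
```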
arXiv Detail & Related papers (2024-02-15T07:08:11Z)
- Blocked Cross-Validation: A Precise and Efficient Method for Hyperparameter Tuning [0.0]
We introduce a novel approach called blocked cross-validation (BCV), where the repetitions are blocked with respect to both CV partition and the random behavior of the learner.
BCV provides more precise error estimates than repeated cross-validation (RCV), even with a significantly reduced number of runs.
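The summary leaves the blocking construction abstract; a minimal sketch of the idea, assuming scikit-learn and treating paired random seeds as the essence of blocking (the blocked_cv helper below is hypothetical, not the paper's implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def blocked_cv(X, y, param_grid, n_reps=5, n_splits=5):
    """Hypothetical sketch: each repetition fixes one seed that controls BOTH
    the CV partition and the learner's internal randomness, so every candidate
    configuration is evaluated under identical random conditions (paired
    comparisons), rather than re-drawing randomness per configuration."""
    results = {}
    for rep in range(n_reps):
        # Same partition for every configuration within this repetition
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=rep)
        for params in param_grid:
            # Same learner seed for every configuration within this repetition
            model = RandomForestClassifier(random_state=rep, **params)
            score = cross_val_score(model, X, y, cv=cv).mean()
            results.setdefault(tuple(sorted(params.items())), []).append(score)
    # Average over repetitions; differences between configs are now paired
    return {cfg: float(np.mean(s)) for cfg, s in results.items()}
```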
arXiv Detail & Related papers (2023-06-11T04:58:47Z)
- Efficient k-NN Search with Cross-Encoders using Adaptive Multi-Round CUR Decomposition [77.4863142882136]
Cross-encoder models are prohibitively expensive for direct k-nearest neighbor (k-NN) search.
We propose ADACUR, a method that adaptively, iteratively, and efficiently minimizes the approximation error for the practically important top-k neighbors.
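For context, here is the standard CUR building block that ADACUR adapts (this is the plain intersection-pseudoinverse CUR, not ADACUR's adaptive multi-round refinement). It matters here because each entry of the cross-encoder score matrix is expensive, and CUR reconstructs the matrix from a few sampled rows and columns:

```python
import numpy as np

def cur_from_samples(C, R, W):
    """Standard CUR skeleton (not ADACUR itself): given sampled columns
    C = A[:, J], sampled rows R = A[I, :], and their intersection W = A[I, J],
    approximate A ~ C @ pinv(W) @ R using only the sampled entries."""
    return C @ np.linalg.pinv(W) @ R

# Demo on a synthetic low-rank "score matrix"
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 300))  # rank 5
I = rng.choice(200, 20, replace=False)
J = rng.choice(300, 20, replace=False)
A_hat = cur_from_samples(A[:, J], A[I, :], A[np.ix_(I, J)])
print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))  # ~0 for exactly low-rank A
```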
arXiv Detail & Related papers (2023-05-04T17:01:17Z)
- Iterative Approximate Cross-Validation [13.084578404699174]
Cross-validation (CV) is one of the most popular tools for assessing and selecting predictive models.
In this paper, we propose a new paradigm to efficiently approximate CV when the empirical risk minimization (ERM) problem is solved via an iterative first-order algorithm.
Our new method extends existing guarantees for CV approximation to hold along the whole trajectory of the algorithm, including at convergence.
arXiv Detail & Related papers (2023-03-05T17:56:08Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope-smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked-range losses, which likewise lacks efficient solvers.
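For reference, the quantity being optimized, as an empirical estimator only (the paper's contribution is the smoothed, efficiently differentiable surrogate, which is not reproduced here):

```python
import numpy as np

def partial_auc(scores_pos, scores_neg, fpr_range=(0.0, 0.3)):
    """Empirical partial AUC over a false-positive-rate range: only negatives
    whose rank (by descending score) lands inside the FPR range contribute."""
    a, b = fpr_range
    m = len(scores_neg)
    order = np.argsort(scores_neg)[::-1]            # hardest negatives first
    lo, hi = int(np.floor(a * m)), int(np.ceil(b * m))
    hard_neg = scores_neg[order[lo:hi]]
    # Fraction of (positive, selected-negative) pairs that are ranked correctly
    return float((scores_pos[:, None] > hard_neg[None, :]).mean())
```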
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Fast and Informative Model Selection using Learning Curve Cross-Validation [2.28438857884398]
Cross-validation methods can be unnecessarily slow on large datasets.
We present a new approach for validation based on learning curves (LCCV).
LCCV iteratively increases the number of instances used for training.
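A loose sketch of the idea, assuming scikit-learn; the pruning rule below (carry the last slope forward once) is a stand-in for LCCV's actual convexity-based bound, and the lccv_select helper is hypothetical:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def lccv_select(candidates, X, y, anchors=(64, 128, 256, 512), seed=0):
    """Train each candidate on growing subsets ("anchors") and discard it as
    soon as an optimistic extrapolation of its learning curve cannot beat the
    incumbent, so hopeless configurations never see the full training set."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    best_model, best_score = None, -np.inf
    for cand in candidates:
        curve, pruned = [], False
        for size in anchors:
            size = min(size, len(X_tr))
            m = clone(cand).fit(X_tr[:size], y_tr[:size])
            curve.append(accuracy_score(y_val, m.predict(X_val)))
            if len(curve) >= 2:
                optimistic = curve[-1] + max(curve[-1] - curve[-2], 0.0)
                if optimistic < best_score:   # cannot catch up: prune early
                    pruned = True
                    break
        if not pruned and curve[-1] > best_score:
            best_model, best_score = cand, curve[-1]
    return best_model, best_score
```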
arXiv Detail & Related papers (2021-11-27T14:48:52Z)
- Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data be stored in main memory.
We propose StreaMRAK, a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - VSAC: Efficient and Accurate Estimator for H and F [68.65610177368617]
VSAC is a RANSAC-type robust estimator with a number of novelties.
It is significantly faster than all its predecessors, running on average in 1-2 ms on a CPU.
It is two orders of magnitude faster and yet as precise as MAGSAC++, the currently most accurate estimator of two-view geometry.
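For orientation, here is the plain RANSAC loop that VSAC accelerates, instantiated for homography (H) estimation with a 4-point DLT solver; none of VSAC's actual novelties (degeneracy tests, local optimization, early termination) appear in this sketch:

```python
import numpy as np

def homography_dlt(src, dst):
    """Minimal 4-point direct linear transform for a homography."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 3)        # null vector of the constraint matrix

def ransac_homography(src, dst, n_iter=1000, thresh=2.0, seed=0):
    """Plain RANSAC: sample minimal 4-point sets, score by inlier count."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best_H, best_inliers = None, 0
    ones = np.ones((n, 1))
    for _ in range(n_iter):
        idx = rng.choice(n, size=4, replace=False)
        H = homography_dlt(src[idx], dst[idx])
        proj = np.hstack([src, ones]) @ H.T
        proj = proj[:, :2] / proj[:, 2:3]          # back to inhomogeneous coords
        inliers = int((np.linalg.norm(proj - dst, axis=1) < thresh).sum())
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```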
arXiv Detail & Related papers (2021-06-18T17:04:57Z)
- Approximate Cross-Validation for Structured Models [20.79997929155929]
The gold-standard evaluation technique is structured cross-validation (CV), but CV here can be prohibitively slow because it requires re-running already-expensive learning algorithms many times.
Previous work has shown approximate cross-validation (ACV) methods provide a fast and provably accurate alternative.
arXiv Detail & Related papers (2020-06-23T00:06:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.