Estimation of Predictive Performance in High-Dimensional Data Settings
using Learning Curves
- URL: http://arxiv.org/abs/2206.03825v1
- Date: Wed, 8 Jun 2022 11:48:01 GMT
- Authors: Jeroen M. Goedhart, Thomas Klausch, Mark A. van de Wiel
- Abstract summary: Learn2Evaluate is based on learning curves by fitting a smooth monotone curve depicting test performance as a function of the sample size.
The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In high-dimensional prediction settings, it remains challenging to reliably
estimate the test performance. To address this challenge, a novel performance
estimation framework is presented. This framework, called Learn2Evaluate, is
based on learning curves by fitting a smooth monotone curve depicting test
performance as a function of the sample size. Learn2Evaluate has several
advantages compared to commonly applied performance estimation methodologies.
Firstly, a learning curve offers a graphical overview of a learner. This
overview assists in assessing the potential benefit of adding training samples
and it provides a more complete comparison between learners than performance
estimates at a fixed subsample size. Secondly, a learning curve facilitates
estimating the performance at the total sample size rather than at a subsample
size. Thirdly, Learn2Evaluate allows the computation of a theoretically
justified and useful lower confidence bound. Furthermore, this bound may be
tightened by performing a bias correction. The benefits of Learn2Evaluate are
illustrated by a simulation study and applications to omics data.
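The abstract does not specify the exact curve family Learn2Evaluate fits, only that it is smooth and monotone in the sample size. A minimal illustrative sketch, assuming an inverse power-law learning curve (a common choice in the learning-curve literature) and toy performance scores: fit the curve to performance estimates at a few subsample sizes, then extrapolate to the total sample size. All names, parameter values, and data below are illustrative assumptions, not the authors' implementation.

```python
def fit_inverse_power_law(ns, ys):
    """Fit y ~ a - b * n**(-c), a smooth monotone learning curve, by
    grid-searching the exponent c and solving ordinary least squares
    for (a, b) at each candidate c."""
    best = None  # (sse, a, b, c)
    for c in (i / 100 for i in range(5, 301)):  # c in [0.05, 3.00]
        xs = [n ** (-c) for n in ns]
        m = len(xs)
        sx, sy = sum(xs), sum(ys)
        sxx = sum(x * x for x in xs)
        sxy = sum(x * y for x, y in zip(xs, ys))
        denom = m * sxx - sx * sx
        if abs(denom) < 1e-12:
            continue
        slope = (m * sxy - sx * sy) / denom  # slope of y on x; equals -b
        a = (sy - slope * sx) / m
        b = -slope
        sse = sum((y - (a - b * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]

def predict(a, b, c, n):
    """Extrapolate the fitted curve to sample size n."""
    return a - b * n ** (-c)

# Toy data: AUC-like scores at a few subsample sizes, generated here
# from a known curve so the fit can be checked.
sizes = [25, 50, 75, 100]
scores = [0.85 - 1.2 * n ** (-0.5) for n in sizes]
a, b, c = fit_inverse_power_law(sizes, scores)
full_sample_estimate = predict(a, b, c, 200)  # estimate at the total n
```

Note that Learn2Evaluate additionally provides a theoretically justified lower confidence bound and a bias correction; this sketch covers only the curve-fitting and extrapolation idea.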
Related papers
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- Data Pruning via Moving-one-Sample-out [61.45441981346064]
We propose a novel data-pruning approach called moving-one-sample-out (MoSo)
MoSo aims to identify and remove the least informative samples from the training set.
Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios.
arXiv Detail & Related papers (2023-10-23T08:00:03Z)
- Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles [34.32021888691789]
We develop a theory of feature-bagging in noisy least-squares ridge ensembles.
We demonstrate that subsampling shifts the double-descent peak of a linear predictor.
We compare the performance of a feature-subsampling ensemble to a single linear predictor.
arXiv Detail & Related papers (2023-07-06T17:56:06Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Leveraging Angular Information Between Feature and Classifier for Long-tailed Learning: A Prediction Reformulation Approach [90.77858044524544]
We reformulate the recognition probabilities through included angles without re-balancing the classifier weights.
Inspired by the performance improvement of the predictive form reformulation, we explore the different properties of this angular prediction.
Our method is able to obtain the best performance among peer methods without pretraining on CIFAR10/100-LT and ImageNet-LT.
arXiv Detail & Related papers (2022-12-03T07:52:48Z)
- A Survey of Learning Curves with Bad Behavior: or How More Data Need Not Lead to Better Performance [15.236871820889345]
Plotting a learner's generalization performance against a training set size results in a so-called learning curve.
We make the (ideal) learning curve concept precise and briefly discuss the aforementioned usages of such curves.
The larger part of this survey's focus is on learning curves that show that more data does not necessarily lead to better generalization performance.
arXiv Detail & Related papers (2022-11-25T12:36:52Z)
- ProBoost: a Boosting Method for Probabilistic Classifiers [55.970609838687864]
ProBoost is a new boosting algorithm for probabilistic classifiers.
It uses the uncertainty of each training sample to determine the most challenging/uncertain ones.
It produces a sequence that progressively focuses on the samples found to have the highest uncertainty.
arXiv Detail & Related papers (2022-09-04T12:49:20Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)
- Learning Curves for Analysis of Deep Networks [23.968036672913392]
Learning curves can be used to select model parameters and extrapolate performance.
We propose a method to robustly estimate learning curves, abstract their parameters into error and data-reliance, and evaluate the effectiveness of different parameterizations.
arXiv Detail & Related papers (2020-10-21T14:20:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.