Early stopping by correlating online indicators in neural networks
- URL: http://arxiv.org/abs/2402.02513v1
- Date: Sun, 4 Feb 2024 14:57:20 GMT
- Title: Early stopping by correlating online indicators in neural networks
- Authors: Manuel Vilares Ferro, Yerai Doval Mosquera, Francisco J. Ribadas Pena,
Victor M. Darriba Bilbao
- Abstract summary: We propose a novel technique to identify overfitting phenomena when training the learner.
Our proposal exploits the correlation over time in a collection of online indicators.
As opposed to previous approaches focused on a single criterion, we take advantage of subsidiarities between independent assessments.
- Score: 0.24578723416255746
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In order to minimize the generalization error in neural networks, we
formally introduce a novel technique for identifying overfitting during
training. This supports a reliable and trustworthy early stopping condition,
thereby improving the predictive power of this type of model. Our proposal
exploits the correlation over time in a collection of online indicators,
namely characteristic functions indicating whether a set of hypotheses is met,
each associated with an independent stopping condition built from a canary
judgment that evaluates the presence of overfitting. In this way, we provide a
formal basis for deciding when to interrupt the learning process.
As opposed to previous approaches focused on a single criterion, we take
advantage of subsidiarities between independent assessments, seeking both a
wider operating range and greater diagnostic reliability. To illustrate the
effectiveness of the proposed halting condition, we work in the sphere of
natural language processing, an operational continuum increasingly based on
machine learning. As a case study, we focus on parser generation, one of the
most demanding and complex tasks in the domain. The selection of
cross-validation as the canary function enables a direct comparison with the
most representative early stopping conditions based on overfitting
identification, pointing to a promising start toward optimal bias and
variance control.
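The abstract's core idea, correlating several independent overfitting indicators over time rather than relying on a single criterion, can be sketched as follows. This is an illustrative approximation, not the paper's exact algorithm: the specific indicator functions (`no_improvement`, `rising_trend`, `generalization_loss`) and the quorum/persistence parameters are assumptions chosen for the example, with only the generalization-loss criterion echoing standard early stopping practice.

```python
def no_improvement(history, patience=5):
    """Indicator: best validation loss has not improved in the last `patience` epochs."""
    if len(history) <= patience:
        return False
    return min(history[-patience:]) > min(history[:-patience])

def rising_trend(history, window=5):
    """Indicator: validation loss strictly increased over the last `window` epochs."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(b > a for a, b in zip(recent, recent[1:]))

def generalization_loss(history, threshold=0.10):
    """Indicator: relative loss above the best value seen exceeds `threshold`."""
    best = min(history)
    return history[-1] / best - 1.0 > threshold

class CorrelatedEarlyStopping:
    """Stop only when at least `quorum` indicators fire for `persistence` epochs in a row,
    i.e. when independent assessments agree over time."""
    def __init__(self, indicators, quorum=2, persistence=3):
        self.indicators = indicators
        self.quorum = quorum
        self.persistence = persistence
        self.history = []
        self.streak = 0

    def update(self, val_loss):
        self.history.append(val_loss)
        fired = sum(ind(self.history) for ind in self.indicators)
        self.streak = self.streak + 1 if fired >= self.quorum else 0
        return self.streak >= self.persistence
```

In a training loop, `update` would be called once per epoch with the validation (canary) loss, and training would break as soon as it returns `True`; requiring agreement among several indicators over consecutive epochs is what gives the wider operating range and diagnostic reliability the abstract claims.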
Related papers
- Adaptive Cascading Network for Continual Test-Time Adaptation [12.718826132518577]
We study the problem of continual test-time adaptation, where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time.
Existing methods on test-time training suffer from several limitations.
arXiv Detail & Related papers (2024-07-17T01:12:57Z)
- Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
- Large Class Separation is not what you need for Relational Reasoning-based OOD Detection [12.578844450586]
Out-Of-Distribution (OOD) detection methods provide a solution by identifying semantic novelty.
Most of these methods leverage a learning stage on the known data, which means training (or fine-tuning) a model to capture the concept of normality.
A viable alternative is that of evaluating similarities in the embedding space produced by large pre-trained models without any further learning effort.
arXiv Detail & Related papers (2023-07-12T14:10:15Z)
- When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples.
A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction.
Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders [16.193776814471768]
We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders.
We provide sample complexity bounds, insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data of treating sepsis.
arXiv Detail & Related papers (2023-02-01T18:40:53Z)
- Deep Learning Methods for Proximal Inference via Maximum Moment Restriction [0.0]
We introduce a flexible and scalable method based on a deep neural network to estimate causal effects in the presence of unmeasured confounding.
Our method achieves state-of-the-art performance on two well-established proximal inference benchmarks.
arXiv Detail & Related papers (2022-05-19T19:51:42Z)
- Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z)
- A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all the samples belonging to the same instance class lie on the same subspace with small dimension.
Empirical evidence shows that the proposed algorithm clearly surpasses state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z)
- Causal Modeling with Stochastic Confounders [11.881081802491183]
This work extends causal inference with confounders.
We propose a new approach to variational estimation for causal inference based on a representer theorem with a random input space.
arXiv Detail & Related papers (2020-04-24T00:34:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.