Muddling Labels for Regularization, a novel approach to generalization
- URL: http://arxiv.org/abs/2102.08769v1
- Date: Wed, 17 Feb 2021 14:02:30 GMT
- Title: Muddling Labels for Regularization, a novel approach to generalization
- Authors: Karim Lounici, Katia Meziani and Benjamin Riu
- Abstract summary: Generalization is a central problem in Machine Learning.
This paper introduces a novel approach to achieve generalization without any data splitting.
It is based on a new risk measure which directly quantifies a model's tendency to overfit.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Generalization is a central problem in Machine Learning. Indeed, most
prediction methods require careful calibration of hyperparameters, usually
carried out on a hold-out validation dataset, to achieve
generalization. The main goal of this paper is to introduce a novel approach to
achieve generalization without any data splitting, based on a new risk
measure that directly quantifies a model's tendency to overfit. To convey
the intuition and advantages of this new approach, we illustrate it
in the simple linear regression model ($Y=X\beta+\xi$), where we develop a new
criterion. We highlight how this criterion is a good proxy for the true
generalization risk. Next, we derive different procedures which tackle several
structures simultaneously (correlation, sparsity, ...). Notably, these
procedures concomitantly train the model and calibrate the
hyperparameters. In addition, these procedures can be implemented via classical
gradient descent methods when the criterion is differentiable w.r.t. the
hyperparameters. Our numerical experiments reveal that our procedures are
computationally feasible and compare favorably to the popular approach (Ridge,
LASSO and Elastic-Net combined with grid-search cross-validation) in terms of
generalization. They also outperform the baseline on two additional tasks:
estimation and support recovery of $\beta$. Moreover, our procedures do not
require any expertise for the calibration of the initial parameters which
remain the same for all the datasets we experimented on.
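
The abstract does not spell out the criterion itself, so the following is only a minimal sketch of one way a label-muddling risk measure could look for Ridge regression in the model $Y=X\beta+\xi$. It compares the training error on the true labels with the average training error on randomly permuted labels, which is an assumption inspired by the paper's title rather than a formula taken from the text above. The function names (`ridge_hat_matrix`, `permuted_label_criterion`, `demo`) and the synthetic data are hypothetical, and a plain scan over candidate values stands in for the gradient descent mentioned in the abstract.

```python
import numpy as np


def ridge_hat_matrix(X, lam):
    """Hat matrix H(lam) = X (X^T X + lam I)^{-1} X^T of Ridge regression."""
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)


def permuted_label_criterion(X, y, lam, perms):
    """Training error on the true labels minus the mean training error on
    permuted labels (illustrative assumption, not the paper's exact formula).

    A strongly negative value means the model fits the real labels much
    better than label noise; a value near zero means it fits both equally.
    """
    n = len(y)
    residual_op = np.eye(n) - ridge_hat_matrix(X, lam)
    fit_true = np.sum((residual_op @ y) ** 2)
    fit_perm = np.mean([np.sum((residual_op @ y[pi]) ** 2) for pi in perms])
    return fit_true - fit_perm


def demo(seed=0, n=60, p=20, n_perms=16):
    """Scan a log-spaced set of Ridge penalties on synthetic sparse data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]  # hypothetical sparse signal
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # The permutations are drawn once and reused for every candidate penalty,
    # so the criterion is a deterministic function of lam.
    perms = [rng.permutation(n) for _ in range(n_perms)]

    lams = np.logspace(-3, 3, 25)
    scores = np.array(
        [permuted_label_criterion(X, y, lam, perms) for lam in lams]
    )
    best_lam = lams[int(np.argmin(scores))]
    return lams, scores, best_lam


if __name__ == "__main__":
    lams, scores, best_lam = demo()
    print("selected penalty:", best_lam)
```

No validation split is used anywhere in this sketch: the whole sample enters both terms of the criterion. Because the hat matrix is a smooth function of the penalty, the scan could in principle be replaced by a gradient step on the penalty, in line with the abstract's remark about differentiable criteria.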
 
      
Related papers
- Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization [52.61737731453222]
 Non-convex Machine Learning problems typically do not adhere to the standard smoothness assumption.
We propose and analyze new methods with local steps, partial participation of clients, and Random Reshuffling.
Our theory is consistent with the known results for standard smooth problems.
 arXiv  Detail & Related papers  (2024-12-03T19:20:56Z)
- A Statistical Theory of Regularization-Based Continual Learning [10.899175512941053]
 We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks.
We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously.
A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization.
 arXiv  Detail & Related papers  (2024-06-10T12:25:13Z)
- Gradient-based bilevel optimization for multi-penalty Ridge regression
  through matrix differential calculus [0.46040036610482665]
 We introduce a gradient-based approach to the problem of linear regression with l2-regularization.
We show that our approach outperforms LASSO, Ridge, and Elastic Net regression.
The analytical computation of the gradient proves to be more efficient in terms of computational time than automatic differentiation.
 arXiv  Detail & Related papers  (2023-11-23T20:03:51Z)
- Toward Theoretical Guidance for Two Common Questions in Practical
  Cross-Validation based Hyperparameter Selection [72.76113104079678]
 We show the first theoretical treatments of two common questions in cross-validation based hyperparameter selection.
We show that these generalizations can, respectively, always perform at least as well as always performing retraining or never performing retraining.
 arXiv  Detail & Related papers  (2023-01-12T16:37:12Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
 Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
 arXiv  Detail & Related papers  (2022-11-02T16:39:42Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model
  Selection [77.86861638371926]
 We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
 arXiv  Detail & Related papers  (2022-06-15T19:10:35Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep
  Learning [78.83598532168256]
 Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
 arXiv  Detail & Related papers  (2021-04-11T09:50:24Z)
- Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data
  to Learn Robust and Invariant Representations [76.85274970052762]
 Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks.
In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identified (squared $\ell_2$ regularized augmentation) outperforms several recent methods, which are each specially designed for one task.
 arXiv  Detail & Related papers  (2020-11-25T22:40:09Z)
- Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
 Ordered $L_1$ (OWL) regularized regression is a new regression analysis for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure.
 arXiv  Detail & Related papers  (2020-06-29T23:35:53Z)
- Optimizing generalization on the train set: a novel gradient-based
  framework to train parameters and hyperparameters simultaneously [0.0]
 Generalization is a central problem in Machine Learning.
We present a novel approach based on a new measure of risk that allows us to develop novel fully automatic procedures for generalization.
 arXiv  Detail & Related papers  (2020-06-11T18:04:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.