Beyond Tikhonov: Faster Learning with Self-Concordant Losses via
Iterative Regularization
- URL: http://arxiv.org/abs/2106.08855v1
- Date: Wed, 16 Jun 2021 15:25:41 GMT
- Title: Beyond Tikhonov: Faster Learning with Self-Concordant Losses via
Iterative Regularization
- Authors: Gaspard Beugnot, Julien Mairal, Alessandro Rudi
- Abstract summary: We extend the theory of Tikhonov regularization to generalized self-concordant loss functions.
We show that fast and optimal rates can be achieved for GSC by using the iterated Tikhonov regularization scheme.
- Score: 120.31448970413298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The theory of spectral filtering is a remarkable tool to understand the
statistical properties of learning with kernels. For least squares, it allows
one to derive various regularization schemes that yield faster convergence rates of
the excess risk than with Tikhonov regularization. This is typically achieved
by leveraging classical assumptions called source and capacity conditions,
which characterize the difficulty of the learning task. In order to understand
estimators derived from other loss functions, Marteau-Ferey et al. have
extended the theory of Tikhonov regularization to generalized self-concordant
loss functions (GSC), which contain, e.g., the logistic loss. In this paper, we
go a step further and show that fast and optimal rates can be achieved for GSC
by using the iterated Tikhonov regularization scheme, which is intrinsically
related to the proximal point method in optimization, and overcomes the
limitation of the classical Tikhonov regularization.
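The iterated scheme described above amounts to repeated proximal-point (Tikhonov) steps, each penalizing the RKHS distance to the previous iterate. The sketch below applies this idea to kernel logistic regression, a GSC loss; it is a minimal illustration under assumptions of our own (the function names, the Newton inner solver, and all hyperparameter values are placeholders), not the authors' exact algorithm.

```python
# A minimal sketch (not the paper's exact algorithm) of iterated Tikhonov
# regularization for kernel logistic regression, a generalized self-concordant
# (GSC) loss. Each outer iteration is a proximal-point step: minimize the
# empirical loss plus lam * ||f - f_prev||_H^2, solved here with a few Newton
# updates on the dual coefficients. Names and hyperparameters are illustrative.
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * bandwidth**2))

def iterated_tikhonov_logistic(K, y, lam=1.0, n_outer=5, n_newton=10):
    """Iterated Tikhonov (proximal point) for kernel logistic regression.

    K: (n, n) Gram matrix; y: labels in {-1, +1}; lam: strength of each
    proximal step; n_outer: number of iterated Tikhonov steps.
    """
    n = K.shape[0]
    alpha = np.zeros(n)        # dual coefficients of the current iterate, f = K @ alpha
    alpha_prev = np.zeros(n)   # anchor of the current proximal step
    for _ in range(n_outer):
        for _ in range(n_newton):
            margins = y * (K @ alpha)
            p = 1.0 / (1.0 + np.exp(margins))        # = sigmoid(-margin)
            grad = -K @ (y * p) / n + 2 * lam * K @ (alpha - alpha_prev)
            curv = p * (1.0 - p)                     # logistic curvature weights
            hess = K @ (curv[:, None] * K) / n + 2 * lam * K
            alpha = alpha - np.linalg.solve(hess + 1e-10 * np.eye(n), grad)
        alpha_prev = alpha.copy()                    # re-anchor: next Tikhonov step
    return alpha
```

For least squares, this construction reduces to a known spectral filter whose qualification grows with the number of proximal steps (versus qualification 1 for a single Tikhonov step), which is the standard explanation of how iteration escapes Tikhonov's saturation; in the sketch above, n_outer and lam are placeholder hyperparameters to be tuned jointly.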
Related papers
- Weakly Convex Regularisers for Inverse Problems: Convergence of Critical Points and Primal-Dual Optimisation [12.455342327482223]
We present a generalised formulation of convergent regularisation in terms of critical points.
We show that this is achieved by a class of weakly convex regularisers.
Applying this theory to learned regularisation, we prove universal approximation for input weakly convex neural networks.
arXiv Detail & Related papers (2024-02-01T22:54:45Z) - Towards Demystifying the Generalization Behaviors When Neural Collapse
Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continuing training can still lead to accuracy improvement on the test set, even after the training accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z) - Hypothesis Transfer Learning with Surrogate Classification Losses:
Generalization Bounds through Algorithmic Stability [3.908842679355255]
Hypothesis transfer learning (HTL) contrasts with domain adaptation by allowing knowledge from a previous task, named the source, to be leveraged in a new one, the target.
This paper studies the learning theory of HTL through algorithmic stability, an attractive theoretical framework for the analysis of machine learning algorithms.
arXiv Detail & Related papers (2023-05-31T09:38:21Z) - Small noise analysis for Tikhonov and RKHS regularizations [0.8133739801185272]
We establish a small noise analysis framework to assess the effects of norms in Tikhonov and RKHS regularizations.
This framework studies the convergence rates of regularized estimators in the small noise limit and reveals the potential instability of the conventional L2-regularizer.
A surprising insight is that over-smoothing via fractional RKHS norms consistently yields optimal convergence rates.
arXiv Detail & Related papers (2023-05-18T15:50:33Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We also analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - From inexact optimization to learning via gradient concentration [22.152317081922437]
In this paper, we investigate implicit (iterative) regularization in the context of linear models with smooth loss functions.
We propose a proof technique combining ideas from inexact optimization and probability theory, specifically gradient concentration.
arXiv Detail & Related papers (2021-06-09T21:23:29Z) - CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z) - On Learning Rates and Schrödinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We find that the learning rate tends to zero for a broad class of non-neural functions.
arXiv Detail & Related papers (2020-04-15T09:52:37Z)