Asymmetry of the Relative Entropy in the Regularization of Empirical Risk Minimization
- URL: http://arxiv.org/abs/2410.02833v2
- Date: Wed, 9 Oct 2024 11:28:41 GMT
- Title: Asymmetry of the Relative Entropy in the Regularization of Empirical Risk Minimization
- Authors: Francisco Daunas, Iñaki Esnaola, Samir M. Perlaza, H. Vincent Poor
- Abstract summary: The effect of relative entropy asymmetry is analyzed in the context of empirical risk minimization.
By comparing the well-understood Type-I ERM-RER with Type-II ERM-RER, the effects of entropy asymmetry are highlighted.
It is shown that Type-II regularization is equivalent to Type-I regularization with an appropriate transformation of the empirical risk function.
- Score: 45.935798913942904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The effect of relative entropy asymmetry is analyzed in the context of empirical risk minimization (ERM) with relative entropy regularization (ERM-RER). Two regularizations are considered: $(a)$ the relative entropy of the measure to be optimized with respect to a reference measure (Type-I ERM-RER); or $(b)$ the relative entropy of the reference measure with respect to the measure to be optimized (Type-II ERM-RER). The main result is the characterization of the solution to the Type-II ERM-RER problem and its key properties. By comparing the well-understood Type-I ERM-RER with Type-II ERM-RER, the effects of entropy asymmetry are highlighted. The analysis shows that in both cases, regularization by relative entropy forces the solution's support to collapse into the support of the reference measure, introducing a strong inductive bias that can overshadow the evidence provided by the training data. Finally, it is shown that Type-II regularization is equivalent to Type-I regularization with an appropriate transformation of the empirical risk function.
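The two regularizers differ only in the order of the arguments of the relative entropy: Type-I penalizes $D(P\|Q)$ and Type-II penalizes $D(Q\|P)$, where $Q$ is the reference measure and $P$ is the measure being optimized. A minimal numerical sketch (all model values and risks below are hypothetical, chosen only for illustration) of the standard Gibbs form of the Type-I solution on a finite model set shows two of the abstract's claims directly: the solution's support collapses into the support of the reference measure, and the relative entropy is asymmetric in its arguments.

```python
import numpy as np

def kl(p, q):
    """Relative entropy D(p || q) for discrete distributions.
    Terms with p[i] == 0 contribute 0; assumes supp(p) is contained in supp(q)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical finite model set; the reference measure Q puts zero
# mass on the third model, which happens to have the lowest risk.
Q = np.array([0.5, 0.5, 0.0])      # reference measure
risk = np.array([1.0, 2.0, 0.1])   # empirical risks L(theta)
lam = 1.0                          # regularization factor

# Type-I ERM-RER solution (Gibbs measure): P* proportional to Q * exp(-risk/lam).
unnorm = Q * np.exp(-risk / lam)
P = unnorm / unnorm.sum()

# Support collapse: P inherits the zeros of Q, so the lowest-risk
# model receives zero posterior mass regardless of the training evidence.
print(P)  # third entry is exactly 0

# Asymmetry: D(P2 || Q) and D(Q || P2) are different numbers in general.
P2 = np.array([0.3, 0.7, 0.0])
print(kl(P2, Q), kl(Q, P2))
```

Type-II ERM-RER swaps the arguments, penalizing $D(Q\|P)$; per the abstract, its solution can be recovered as a Type-I solution after an appropriate transformation of the empirical risk function, the form of which is given in the paper itself and is not reproduced here.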
Related papers
- Ridge interpolators in correlated factor regression models -- exact risk analysis [0.0]
We consider correlated factor regression models (FRM) and analyze the performance of classical ridge interpolators.
We provide excess prediction risk characterizations that clearly show the dependence on all key model parameters.
arXiv Detail & Related papers (2024-06-13T14:46:08Z) - Equivalence of the Empirical Risk Minimization to Regularization on the Family of f-Divergences [45.935798913942904]
The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented.
Examples of the solution for particular choices of the function $f$ are presented.
arXiv Detail & Related papers (2024-02-01T11:12:00Z) - Analysis of the Relative Entropy Asymmetry in the Regularization of Empirical Risk Minimization [70.540936204654]
The effect of the relative entropy asymmetry is analyzed in the empirical risk minimization with relative entropy regularization (ERM-RER) problem.
A novel regularization is introduced, coined Type-II regularization, that allows for solutions to the ERM-RER problem with a support that extends outside the support of the reference measure.
arXiv Detail & Related papers (2023-06-12T13:56:28Z) - Symmetry breaking slows convergence of the ADAPT Variational Quantum Eigensolver [0.0]
We study the impact of symmetry breaking on the performance of ADAPT-VQE using two strongly correlated systems.
We analyze the role that symmetry breaking in the reference states and orbital mappings of the fermionic Hamiltonians have on the compactness and performance of ADAPT-VQE.
arXiv Detail & Related papers (2022-07-07T03:09:54Z) - Empirical Risk Minimization with Relative Entropy Regularization: Optimality and Sensitivity Analysis [7.953455469099826]
The sensitivity of the expected empirical risk to deviations from the solution of the ERM-RER problem is studied.
The expectation of the sensitivity is upper bounded, up to a constant factor, by the square root of the lautum information between the models and the datasets.
arXiv Detail & Related papers (2022-02-09T10:55:14Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.