Analysis of the Relative Entropy Asymmetry in the Regularization of
Empirical Risk Minimization
- URL: http://arxiv.org/abs/2306.07123v1
- Date: Mon, 12 Jun 2023 13:56:28 GMT
- Title: Analysis of the Relative Entropy Asymmetry in the Regularization of
Empirical Risk Minimization
- Authors: Francisco Daunas, Iñaki Esnaola, Samir M. Perlaza, H. Vincent Poor
- Score: 70.540936204654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The effect of the relative entropy asymmetry is analyzed in the empirical
risk minimization with relative entropy regularization (ERM-RER) problem. A
novel regularization is introduced, coined Type-II regularization, that allows
for solutions to the ERM-RER problem with a support that extends outside the
support of the reference measure. The solution to the new ERM-RER Type-II
problem is analytically characterized in terms of the Radon-Nikodym derivative
of the reference measure with respect to the solution. The analysis of the
solution unveils the following properties of relative entropy when it acts as a
regularizer in the ERM-RER problem: i) relative entropy forces the support of
the Type-II solution to collapse into the support of the reference measure,
which introduces a strong inductive bias that dominates the evidence provided
by the training data; ii) Type-II regularization is equivalent to classical
relative entropy regularization with an appropriate transformation of the
empirical risk function. Closed-form expressions of the expected empirical risk
as a function of the regularization parameters are provided.
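The asymmetry discussed above can be sketched as the swap of the two arguments of the relative entropy in the regularizer. The following is a sketch, not the paper's exact notation: $\mathsf{L}$ denotes the empirical risk, $Q$ the reference measure, and $\lambda > 0$ the regularization parameter.

```latex
% Type-I (classical) ERM-RER: relative entropy of the solution P w.r.t. the reference Q
\min_{P} \; \mathbb{E}_{\theta \sim P}\!\left[\mathsf{L}(\theta)\right]
  + \lambda \, D(P \,\|\, Q)

% Type-II ERM-RER: the arguments of the relative entropy are swapped
\min_{P} \; \mathbb{E}_{\theta \sim P}\!\left[\mathsf{L}(\theta)\right]
  + \lambda \, D(Q \,\|\, P)
```

Since $D(P \,\|\, Q)$ is finite only when $P \ll Q$, Type-I solutions are confined to the support of $Q$; $D(Q \,\|\, P)$ requires only $Q \ll P$, which is why Type-II solutions may, in principle, place mass outside the support of the reference measure.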
Related papers
- Asymmetry of the Relative Entropy in the Regularization of Empirical Risk Minimization [45.935798913942904]
The effect of relative entropy asymmetry is analyzed in the context of empirical risk minimization.
By comparing the well-understood Type-I ERM-RER with Type-II ERM-RER, the effects of entropy asymmetry are highlighted.
It is shown that Type-II regularization is equivalent to Type-I regularization with an appropriate transformation of the empirical risk function.
arXiv Detail & Related papers (2024-10-02T09:43:43Z)
- Equivalence of the Empirical Risk Minimization to Regularization on the Family of f-Divergences [45.935798913942904]
The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented.
Examples of the solution for particular choices of the function $f$ are presented.
arXiv Detail & Related papers (2024-02-01T11:12:00Z)
- Empirical Risk Minimization with Relative Entropy Regularization [6.815730801645783]
The empirical risk minimization (ERM) problem with relative entropy regularization (ERM-RER) is investigated.
The solution to this problem, if it exists, is shown to be a unique probability measure, mutually absolutely continuous with the reference measure.
For a fixed dataset and under a specific condition, the empirical risk is shown to be a sub-Gaussian random variable.
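For a finite model set, the unique Type-I ERM-RER solution described above takes the form of a Gibbs measure, $dP/dQ(\theta) \propto \exp(-\mathsf{L}(\theta)/\lambda)$. A minimal numerical sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def gibbs_solution(risks, reference, lam):
    """Gibbs measure minimizing E_P[L] + lam * D(P || Q) over a finite set.

    P(theta) is proportional to Q(theta) * exp(-L(theta) / lam).
    """
    log_w = np.log(reference) - np.asarray(risks, dtype=float) / lam
    log_w -= log_w.max()          # shift for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Uniform reference over three models with empirical risks 0.1, 0.5, 0.9.
risks = [0.1, 0.5, 0.9]
q = np.ones(3) / 3
p = gibbs_solution(risks, q, lam=0.2)
```

Two properties of the solution are visible here: models with lower empirical risk receive more mass, and any model with `q[i] == 0` gets `p[i] == 0`, reflecting mutual absolute continuity with the reference measure. As `lam` grows, the solution collapses back to the reference measure.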
arXiv Detail & Related papers (2022-11-12T09:41:02Z)
- Stability and Generalization for Markov Chain Stochastic Gradient Methods [49.981789906200035]
We provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
arXiv Detail & Related papers (2022-09-16T15:42:51Z)
- Empirical Risk Minimization with Relative Entropy Regularization: Optimality and Sensitivity Analysis [7.953455469099826]
The sensitivity of the expected empirical risk to deviations from the solution of the ERM-RER problem is studied.
The expectation of the sensitivity is upper bounded, up to a constant factor, by the square root of the lautum information between the models and the datasets.
arXiv Detail & Related papers (2022-02-09T10:55:14Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation [30.02577720946978]
We establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation.
We prove that entropy-regularized NPG exhibits linear convergence up to a function approximation error.
arXiv Detail & Related papers (2021-06-08T04:30:39Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.