Non-parametric Models for Non-negative Functions
- URL: http://arxiv.org/abs/2007.03926v1
- Date: Wed, 8 Jul 2020 07:17:28 GMT
- Title: Non-parametric Models for Non-negative Functions
- Authors: Ulysse Marteau-Ferey (PSL, DI-ENS, SIERRA), Francis Bach (PSL, DI-ENS,
SIERRA), Alessandro Rudi (PSL, DI-ENS, SIERRA)
- Abstract summary: We provide the first model for non-negative functions that benefits from the same good properties as linear models.
We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
- Score: 48.7576911714538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear models have shown great effectiveness and flexibility in many fields
such as machine learning, signal processing and statistics. They can represent
rich spaces of functions while preserving the convexity of the optimization
problems where they are used, and are simple to evaluate, differentiate and
integrate. However, for modeling non-negative functions, which are crucial for
unsupervised learning, density estimation, or non-parametric Bayesian methods,
linear models are not applicable directly. Moreover, current state-of-the-art
models like generalized linear models either lead to non-convex optimization
problems, or cannot be easily integrated. In this paper we provide the first
model for non-negative functions which benefits from the same good properties
of linear models. In particular, we prove that it admits a representer theorem
and provide an efficient dual formulation for convex problems. We study its
representation power, showing that the resulting space of functions is strictly
richer than that of generalized linear models. Finally we extend the model and
the theoretical results to functions with outputs in convex cones. The paper is
complemented by an experimental evaluation of the model showing its
effectiveness in terms of formulation, algorithmic derivation and practical
results on the problems of density estimation, regression with heteroscedastic
errors, and multiple quantile regression.
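The abstract leaves the parameterization implicit, but the PSD models cited in the related work below take the form f(x) = phi(x)^T A phi(x) with A positive semi-definite, which is non-negative by construction and linear in A. The following is a minimal sketch under stated assumptions: a Gaussian-kernel feature map, synthetic 1-D data, and a simple gradient fit of a low-rank factor B with A = B B^T. This factored surrogate is non-convex in B; the paper itself keeps the problem convex in A and solves it through an efficient dual formulation.

```python
import numpy as np

def gaussian_features(X, anchors, sigma=1.0):
    """Feature map phi(x)_j = exp(-||x - anchor_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# A factorization A = B B^T guarantees A is PSD, so
# f(x) = phi(x)^T A phi(x) = ||B^T phi(x)||^2 >= 0 everywhere.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.exp(-X[:, 0] ** 2)                      # a non-negative target
anchors = X[::10]                              # 20 anchor points (assumption)
B = 0.1 * rng.standard_normal((len(anchors), 5))

# Illustrative fit: gradient descent on the squared loss over the factor B.
Phi = gaussian_features(X, anchors)
for _ in range(2000):
    Z = Phi @ B                                # (n, r)
    resid = (Z ** 2).sum(axis=1) - y           # f(x_i) - y_i
    grad = 4.0 * Phi.T @ (resid[:, None] * Z) / len(X)
    B -= 0.05 * grad

pred = ((Phi @ B) ** 2).sum(axis=1)
print("RMSE:", np.sqrt(np.mean((pred - y) ** 2)), "min f:", pred.min())
```

Note that the prediction stays non-negative at every point regardless of how the fit is carried out, which is the property linear and generalized linear models cannot guarantee.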
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Optimal Nonlinearities Improve Generalization Performance of Random
Features [0.9790236766474201]
A random feature model with a nonlinear activation function has been shown to be equivalent to a Gaussian model in terms of training and generalization errors.
We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities.
Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU.
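As a concrete (and hedged) picture of the random-features setting this summary refers to, the sketch below runs ridge regression on random features z(x) = sigma(W^T x) for a few standard nonlinearities and reports test error. The synthetic task, widths, and ridge parameter are illustrative assumptions; the paper's optimized nonlinearities, derived from the equivalent Gaussian model, are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_feat, n_train, n_test = 10, 300, 400, 400

# Synthetic regression task: y = <x, beta> + noise (an assumption).
beta = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n_train + n_test, d))
y = X @ beta + 0.1 * rng.standard_normal(len(X))
Xtr, Xte = X[:n_train], X[n_train:]
ytr, yte = y[:n_train], y[n_train:]

W = rng.standard_normal((d, n_feat)) / np.sqrt(d)  # fixed random weights

def test_mse(sigma, lam=1e-2):
    """Ridge regression on random features z = sigma(x W)."""
    Ztr, Zte = sigma(Xtr @ W), sigma(Xte @ W)
    a = np.linalg.solve(Ztr.T @ Ztr + lam * np.eye(n_feat), Ztr.T @ ytr)
    return np.mean((Zte @ a - yte) ** 2)

for name, sigma in [("relu", lambda u: np.maximum(u, 0.0)),
                    ("tanh", np.tanh),
                    ("linear", lambda u: u)]:
    print(f"{name:>6}: test MSE = {test_mse(sigma):.4f}")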
arXiv Detail & Related papers (2023-09-28T20:55:21Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Functional Nonlinear Learning [0.0]
We propose a functional nonlinear learning (FunNoL) method to represent multivariate functional data in a lower-dimensional feature space.
We show that FunNoL provides satisfactory curve classification and reconstruction regardless of data sparsity.
arXiv Detail & Related papers (2022-06-22T23:47:45Z) - Bias-variance decomposition of overparameterized regression with random
linear features [0.0]
"Over parameterized models" avoid overfitting even when the number of fit parameters is large enough to perfectly fit the training data.
We show how each transition arises due to small nonzero eigenvalues in the Hessian matrix.
We compare and contrast the phase diagram of the random linear features model to the random nonlinear features model and ordinary regression.
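A minimal numerical companion to this summary, under assumed settings (Gaussian data, a linear teacher, minimum-norm least squares): sweeping the number of random linear features z = W^T x across the interpolation threshold n_feat = n_train, where the smallest nonzero eigenvalues of the Hessian Z^T Z shrink and the test error spikes.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_train, n_test, noise = 200, 80, 1000, 0.5
beta = rng.standard_normal(d) / np.sqrt(d)
Xtr = rng.standard_normal((n_train, d))
Xte = rng.standard_normal((n_test, d))
ytr = Xtr @ beta + noise * rng.standard_normal(n_train)
yte = Xte @ beta + noise * rng.standard_normal(n_test)

# Interpolation threshold at n_feat = n_train: there the smallest nonzero
# eigenvalues of Z^T Z (the Hessian of the squared loss) approach zero
# and the minimum-norm solution becomes unstable.
for n_feat in (20, 40, 79, 80, 81, 160, 400):
    W = rng.standard_normal((d, n_feat)) / np.sqrt(d)  # random linear features
    Ztr, Zte = Xtr @ W, Xte @ W
    a = np.linalg.pinv(Ztr) @ ytr                      # ridgeless fit
    print(f"n_feat={n_feat:4d}  test MSE={np.mean((Zte @ a - yte) ** 2):8.3f}")
```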
arXiv Detail & Related papers (2022-03-10T16:09:21Z) - PSD Representations for Effective Probability Models [117.35298398434628]
We show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end.
We characterize both approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees.
Our results open the way to applications of PSD models to density estimation, decision theory and inference.
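To illustrate why PSD models suit probability modeling, here is a hedged 1-D sketch: a PSD model f(x) = ||B^T phi(x)||^2 is non-negative everywhere, so it is a valid unnormalized density. The kernel, anchors, and random factor below are placeholders, and the paper's closed-form integration results are not reproduced (normalization is done with a crude Riemann sum).

```python
import numpy as np

def kernel_features(x, centers, sigma=0.5):
    """Gaussian-kernel features phi(x)_j = k(x, c_j)."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(2)
centers = np.linspace(-3, 3, 15)                 # anchor points (assumption)
B = 0.3 * rng.standard_normal((len(centers), 4)) # random PSD factor, A = B B^T

grid = np.linspace(-4, 4, 801)
Phi = kernel_features(grid, centers)
f = ((Phi @ B) ** 2).sum(axis=1)                 # f(x) >= 0 on the whole grid
dx = grid[1] - grid[0]
density = f / (f.sum() * dx)                     # crude numerical normalization
print("min density:", density.min(), "integral ~", density.sum() * dx)
```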
arXiv Detail & Related papers (2021-06-30T15:13:39Z) - Hessian Eigenspectra of More Realistic Nonlinear Models [73.31363313577941]
We give a precise characterization of the Hessian eigenspectra for a broad family of nonlinear models.
Our analysis takes a step forward to identify the origin of many striking features observed in more complex machine learning models.
arXiv Detail & Related papers (2021-03-02T06:59:52Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Dimension Independent Generalization Error by Stochastic Gradient
Descent [12.474236773219067]
We present a theory on the generalization error of stochastic gradient descent (SGD) solutions for both convex and locally convex loss functions.
We show that the generalization error does not depend on the dimension $p$, or depends on $p$ only through a logarithmic factor.
arXiv Detail & Related papers (2020-03-25T03:08:41Z)