Fitting very flexible models: Linear regression with large numbers of parameters
- URL: http://arxiv.org/abs/2101.07256v1
- Date: Fri, 15 Jan 2021 21:08:34 GMT
- Title: Fitting very flexible models: Linear regression with large numbers of parameters
- Authors: David W. Hogg (NYU) and Soledad Villar (JHU)
- Abstract summary: Linear fitting is used to interpolate and denoise data.
We discuss how this basis-function fitting is done, with ordinary least squares and extensions thereof.
It is even possible to take the limit of infinite parameters, at which, if the basis and regularization are chosen correctly, the least-squares fit becomes the mean of a Gaussian process.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are many uses for linear fitting; the context here is interpolation and
denoising of data, as when you have calibration data and you want to fit a
smooth, flexible function to those data. Or you want to fit a flexible function
to de-trend a time series or normalize a spectrum. In these contexts,
investigators often choose a polynomial basis, or a Fourier basis, or wavelets,
or something equally general. They also choose an order, or number of basis
functions to fit, and (often) some kind of regularization. We discuss how this
basis-function fitting is done, with ordinary least squares and extensions
thereof. We emphasize that it is often valuable to choose far more parameters
than data points, despite folk rules to the contrary: Suitably regularized
models with enormous numbers of parameters generalize well and make good
predictions for held-out data; over-fitting is not (mainly) a problem of having
too many parameters. It is even possible to take the limit of infinite
parameters, at which, if the basis and regularization are chosen correctly, the
least-squares fit becomes the mean of a Gaussian process. We recommend
cross-validation as a good empirical method for model selection (for example,
setting the number of parameters and the form of the regularization), and
jackknife resampling as a good empirical method for estimating the
uncertainties of the predictions made by the model. We also give advice for
building stable computational implementations.
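As a concrete illustration of the abstract's recipe, here is a minimal, hypothetical sketch (not the authors' code): it fits noisy one-dimensional data with a Fourier basis using far more parameters than data points, uses ridge (L2) regularization to keep the fit well behaved, sets the regularization strength by leave-one-out cross-validation, and estimates prediction uncertainties by jackknife resampling. The synthetic data, the choice of basis and period, the regularization grid, and the helper names `design_matrix` and `fit_predict` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): over-parameterized basis-function
# fitting with ridge regularization, cross-validation, and jackknife errors.

rng = np.random.default_rng(17)

# Synthetic "calibration" data standing in for the paper's use cases.
n = 23                                   # number of data points
x = np.sort(rng.uniform(0.0, 1.0, n))    # sample locations
sigma = 0.1                              # assumed known noise level
y = np.sin(7.0 * x) + 0.5 * x + sigma * rng.normal(size=n)

p = 301                                  # number of basis functions, p >> n
T = 3.0                                  # fundamental period, wider than the data range

def design_matrix(xs, p, T):
    """Columns: a constant plus cosine/sine pairs at increasing frequencies."""
    X = np.ones((len(xs), p))
    for j in range(1, (p - 1) // 2 + 1):
        X[:, 2 * j - 1] = np.cos(2.0 * np.pi * j * xs / T)
        X[:, 2 * j] = np.sin(2.0 * np.pi * j * xs / T)
    return X

def fit_predict(x_train, y_train, x_test, lam):
    """Ridge-regularized least squares in the n x n 'dual' form,
    beta = X^T (X X^T + lam I)^{-1} y, which never forms a p x p matrix."""
    X = design_matrix(x_train, p, T)
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(len(x_train)), y_train)
    return design_matrix(x_test, p, T) @ (X.T @ alpha)

# Leave-one-out cross-validation to choose the regularization strength.
lams = np.logspace(-6, 1, 15)
cv_mse = []
for lam in lams:
    resid = [y[i] - fit_predict(np.delete(x, i), np.delete(y, i), x[i:i + 1], lam)[0]
             for i in range(n)]
    cv_mse.append(np.mean(np.square(resid)))
lam_best = lams[int(np.argmin(cv_mse))]

# Predictions on a fine grid of held-out locations.
x_new = np.linspace(0.0, 1.0, 400)
y_new = fit_predict(x, y, x_new, lam_best)

# Jackknife (leave-one-out) resampling for prediction uncertainties.
loo_preds = np.array([fit_predict(np.delete(x, i), np.delete(y, i), x_new, lam_best)
                      for i in range(n)])
y_jack_mean = loo_preds.mean(axis=0)
y_jack_err = np.sqrt((n - 1) * loo_preds.var(axis=0))  # jackknife standard error
```

Solving the regularized normal equations in their n x n dual form, rather than forming a p x p matrix, is one simple way to keep the implementation stable and cheap when p is much larger than n. The infinite-parameter limit mentioned in the abstract can likewise be sketched as the mean of a Gaussian process; the squared-exponential kernel below is an assumed choice for illustration (the exact correspondence requires a basis and regularization matched to the kernel), and it reuses `x`, `y`, `sigma`, `n`, and `x_new` from the sketch above.

```python
def sqexp_kernel(xa, xb, amp=1.0, ell=0.15):
    """Squared-exponential kernel; an assumed choice, not specified by the paper."""
    return amp**2 * np.exp(-0.5 * (xa[:, None] - xb[None, :])**2 / ell**2)

# Gaussian-process posterior mean evaluated at the held-out locations.
K = sqexp_kernel(x, x) + sigma**2 * np.eye(n)
y_gp_mean = sqexp_kernel(x_new, x) @ np.linalg.solve(K, y)
```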
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z) - Optimal sampling for least-squares approximation [0.8702432681310399]
We introduce the Christoffel function as a key quantity in the analysis of (weighted) least-squares approximation from random samples.
We show how it can be used to construct sampling strategies that possess near-optimal sample complexity.
arXiv Detail & Related papers (2024-09-04T00:06:23Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Should We Learn Most Likely Functions or Parameters? [51.133793272222874]
We investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data.
We find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
arXiv Detail & Related papers (2023-11-27T16:39:55Z) - Conjugate priors for count and rounded data regression [0.0]
We introduce conjugate priors that enable closed-form posterior inference.
Key posterior and predictive functionals are computable analytically or via direct Monte Carlo simulation.
These tools are broadly useful for linear regression, nonlinear models via basis expansions, and model and variable selection.
arXiv Detail & Related papers (2021-10-23T23:26:01Z) - Spectral goodness-of-fit tests for complete and partial network data [1.7188280334580197]
We use recent results in random matrix theory to derive a general goodness-of-fit test for dyadic data.
We show that our method, when applied to a specific model of interest, provides a straightforward, computationally fast way of selecting parameters.
Our method leads to improved community detection algorithms.
arXiv Detail & Related papers (2021-06-17T17:56:30Z) - A Universal Law of Robustness via Isoperimetry [1.484852576248587]
We show that smooth interpolation requires $d$ times more parameters than mere interpolation, where $d$ is the ambient data dimension.
We prove this universal law of robustness for any smoothly parametrized function class with polynomial size weights.
arXiv Detail & Related papers (2021-05-26T19:49:47Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Flexible Bayesian Nonlinear Model Configuration [10.865434331546126]
Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response.
We introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models.
A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference.
arXiv Detail & Related papers (2020-03-05T21:20:55Z) - Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.