Fitting very flexible models: Linear regression with large numbers of parameters
- URL: http://arxiv.org/abs/2101.07256v1
- Date: Fri, 15 Jan 2021 21:08:34 GMT
- Title: Fitting very flexible models: Linear regression with large numbers of parameters
- Authors: David W. Hogg (NYU) and Soledad Villar (JHU)
- Abstract summary: Linear fitting is used to interpolate and denoise data.
We discuss how this basis-function fitting is done, with ordinary least squares and extensions thereof.
It is even possible to take the limit of infinite parameters, at which, if the basis and regularization are chosen correctly, the least-squares fit becomes the mean of a Gaussian process.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are many uses for linear fitting; the context here is interpolation and
denoising of data, as when you have calibration data and you want to fit a
smooth, flexible function to those data. Or you want to fit a flexible function
to de-trend a time series or normalize a spectrum. In these contexts,
investigators often choose a polynomial basis, or a Fourier basis, or wavelets,
or something equally general. They also choose an order, or number of basis
functions to fit, and (often) some kind of regularization. We discuss how this
basis-function fitting is done, with ordinary least squares and extensions
thereof. We emphasize that it is often valuable to choose far more parameters
than data points, despite folk rules to the contrary: Suitably regularized
models with enormous numbers of parameters generalize well and make good
predictions for held-out data; over-fitting is not (mainly) a problem of having
too many parameters. It is even possible to take the limit of infinite
parameters, at which, if the basis and regularization are chosen correctly, the
least-squares fit becomes the mean of a Gaussian process. We recommend
cross-validation as a good empirical method for model selection (for example,
setting the number of parameters and the form of the regularization), and
jackknife resampling as a good empirical method for estimating the
uncertainties of the predictions made by the model. We also give advice for
building stable computational implementations.
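As a concrete illustration of the abstract's recipe, here is a minimal, hypothetical sketch (not the authors' code): it fits noisy one-dimensional data with a Fourier basis using far more parameters than data points, uses ridge (L2) regularization to keep the fit well behaved, sets the regularization strength by leave-one-out cross-validation, and estimates prediction uncertainties by jackknife resampling. The synthetic data, the choice of basis and period, the regularization grid, and the helper names `design_matrix` and `fit_predict` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): over-parameterized basis-function
# fitting with ridge regularization, cross-validation, and jackknife errors.

rng = np.random.default_rng(17)

# Synthetic "calibration" data standing in for the paper's use cases.
n = 23                                   # number of data points
x = np.sort(rng.uniform(0.0, 1.0, n))    # sample locations
sigma = 0.1                              # assumed known noise level
y = np.sin(7.0 * x) + 0.5 * x + sigma * rng.normal(size=n)

p = 301                                  # number of basis functions, p >> n
T = 3.0                                  # fundamental period, wider than the data range

def design_matrix(xs, p, T):
    """Columns: a constant plus cosine/sine pairs at increasing frequencies."""
    X = np.ones((len(xs), p))
    for j in range(1, (p - 1) // 2 + 1):
        X[:, 2 * j - 1] = np.cos(2.0 * np.pi * j * xs / T)
        X[:, 2 * j] = np.sin(2.0 * np.pi * j * xs / T)
    return X

def fit_predict(x_train, y_train, x_test, lam):
    """Ridge-regularized least squares in the n x n 'dual' form,
    beta = X^T (X X^T + lam I)^{-1} y, which never forms a p x p matrix."""
    X = design_matrix(x_train, p, T)
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(len(x_train)), y_train)
    return design_matrix(x_test, p, T) @ (X.T @ alpha)

# Leave-one-out cross-validation to choose the regularization strength.
lams = np.logspace(-6, 1, 15)
cv_mse = []
for lam in lams:
    resid = [y[i] - fit_predict(np.delete(x, i), np.delete(y, i), x[i:i + 1], lam)[0]
             for i in range(n)]
    cv_mse.append(np.mean(np.square(resid)))
lam_best = lams[int(np.argmin(cv_mse))]

# Predictions on a fine grid of held-out locations.
x_new = np.linspace(0.0, 1.0, 400)
y_new = fit_predict(x, y, x_new, lam_best)

# Jackknife (leave-one-out) resampling for prediction uncertainties.
loo_preds = np.array([fit_predict(np.delete(x, i), np.delete(y, i), x_new, lam_best)
                      for i in range(n)])
y_jack_mean = loo_preds.mean(axis=0)
y_jack_err = np.sqrt((n - 1) * loo_preds.var(axis=0))  # jackknife standard error
```

Solving the regularized normal equations in their n x n dual form, rather than forming a p x p matrix, is one simple way to keep the implementation stable and cheap when p is much larger than n. The infinite-parameter limit mentioned in the abstract can likewise be sketched as the mean of a Gaussian process; the squared-exponential kernel below is an assumed choice for illustration (the exact correspondence requires a basis and regularization matched to the kernel), and it reuses `x`, `y`, `sigma`, `n`, and `x_new` from the sketch above.

```python
def sqexp_kernel(xa, xb, amp=1.0, ell=0.15):
    """Squared-exponential kernel; an assumed choice, not specified by the paper."""
    return amp**2 * np.exp(-0.5 * (xa[:, None] - xb[None, :])**2 / ell**2)

# Gaussian-process posterior mean evaluated at the held-out locations.
K = sqexp_kernel(x, x) + sigma**2 * np.eye(n)
y_gp_mean = sqexp_kernel(x_new, x) @ np.linalg.solve(K, y)
```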
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z) - Optimal sampling for least-squares approximation [0.8702432681310399]
We introduce the Christoffel function as a key quantity in the analysis of (weighted) least-squares approximation from random samples.
We show how it can be used to construct sampling strategies that possess near-optimal sample complexity.
arXiv Detail & Related papers (2024-09-04T00:06:23Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Should We Learn Most Likely Functions or Parameters? [51.133793272222874]
We investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data.
We find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
arXiv Detail & Related papers (2023-11-27T16:39:55Z) - Conjugate priors for count and rounded data regression [0.0]
We introduce conjugate priors that enable closed-form posterior inference.
Key posterior and predictive functionals are computable analytically or via direct Monte Carlo simulation.
These tools are broadly useful for linear regression, nonlinear models via basis expansions, and model and variable selection.
arXiv Detail & Related papers (2021-10-23T23:26:01Z) - Spectral goodness-of-fit tests for complete and partial network data [1.7188280334580197]
We use recent results in random matrix theory to derive a general goodness-of-fit test for dyadic data.
We show that our method, when applied to a specific model of interest, provides a straightforward, computationally fast way of selecting parameters.
Our method leads to improved community detection algorithms.
arXiv Detail & Related papers (2021-06-17T17:56:30Z) - A Universal Law of Robustness via Isoperimetry [1.484852576248587]
We show that smooth interpolation requires $d$ times more parameters than mere interpolation, where $d$ is the ambient data dimension.
We prove this universal law of robustness for any smoothly parametrized function class with polynomial size weights.
arXiv Detail & Related papers (2021-05-26T19:49:47Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Flexible Bayesian Nonlinear Model Configuration [10.865434331546126]
Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response.
We introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models.
A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference.
arXiv Detail & Related papers (2020-03-05T21:20:55Z) - Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.