Fitting Elephants
- URL: http://arxiv.org/abs/2104.00526v1
- Date: Wed, 31 Mar 2021 05:50:39 GMT
- Title: Fitting Elephants
- Authors: Partha P Mitra
- Abstract summary: Modern machine learning (ML) approaches, cf. deep nets (DNNs), generalize well despite interpolating noisy data.
This may be understood via Statistically Consistent Interpolation (SCI).
SCI shows that the purely empirical approach can successfully predict.
However, data interpolation does not provide theoretical insights, and the training data requirements may be prohibitive.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Textbook wisdom advocates for smooth function fits and implies that
interpolation of noisy data should lead to poor generalization. A related
heuristic is that fitting parameters should be fewer than measurements (Occam's
Razor). Surprisingly, contemporary machine learning (ML) approaches, cf. deep
nets (DNNs), generalize well despite interpolating noisy data. This may be
understood via Statistically Consistent Interpolation (SCI), i.e. data
interpolation techniques that generalize optimally for big data. In this
article we elucidate SCI using the weighted interpolating nearest neighbors
(wiNN) algorithm, which adds singular weight functions to kNN (k-nearest
neighbors). This shows that data interpolation can be a valid ML strategy for
big data. SCI clarifies the relation between two ways of modeling natural
phenomena: the rationalist approach (strong priors) of theoretical physics with
few parameters and the empiricist (weak priors) approach of modern ML with more
parameters than data. SCI shows that the purely empirical approach can
successfully predict. However data interpolation does not provide theoretical
insights, and the training data requirements may be prohibitive. Complex animal
brains are between these extremes, with many parameters, but modest training
data, and with prior structure encoded in species-specific mesoscale circuitry.
Thus, modern ML provides a distinct epistemological approach different both
from physical theories and animal brains.
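As an illustration of the wiNN scheme described in the abstract, the following sketch implements a weighted k-nearest-neighbor regressor whose weights diverge at zero distance, so the fit interpolates the training labels exactly while still averaging over several neighbors elsewhere. This is schematic: the inverse-power weight $w(r) = r^{-\alpha}$ and the parameter choices are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def winn_predict(X_train, y_train, X_query, k=5, alpha=2.0, eps=1e-12):
    """Weighted interpolating nearest-neighbor regression (schematic).

    Each query is predicted as a weighted average of its k nearest training
    labels, with singular weights w(r) = r**(-alpha) that diverge as r -> 0.
    Because the weight blows up at zero distance, the fit interpolates the
    training data exactly, yet remains a local average away from the data.
    """
    preds = np.empty(len(X_query))
    for i, x in enumerate(X_query):
        dists = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(dists)[:k]            # indices of the k nearest neighbors
        d = dists[nn]
        if d[0] < eps:                        # query coincides with a training point:
            preds[i] = y_train[nn[0]]         # return its label exactly (interpolation)
            continue
        w = d ** (-alpha)                     # singular weights, larger for closer points
        preds[i] = np.dot(w, y_train[nn]) / w.sum()
    return preds

# Toy usage: noisy samples of a smooth function are fit exactly,
# yet predictions at new points remain reasonable.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
X_new = rng.uniform(0, 1, size=(20, 1))
print(winn_predict(X, y, X_new, k=10, alpha=2.0)[:5])
print(np.allclose(winn_predict(X, y, X, k=10, alpha=2.0), y))  # True: exact interpolation
```

Because the weight is singular only at the data points themselves, the predictor interpolates the noisy labels yet behaves like a local average away from them, which is the mechanism behind statistically consistent interpolation.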
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z)
- MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data [22.262191225577244]
We explore whether a similar approach can be applied to scientific foundation models (SFMs)
We collect low-cost physics-informed neural network (PINN)-based approximated prior data in the form of solutions to partial differential equations (PDEs) constructed through an arbitrary linear combination of mathematical dictionaries.
We provide experimental evidence on the one-dimensional convection-diffusion-reaction equation, which demonstrates that pre-training remains robust even with approximated prior data.
arXiv Detail & Related papers (2024-10-09T00:52:00Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs). (A toy sketch of this labeling step follows this entry.)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
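As an illustration of the minimally supervised SOM idea summarized above (a toy sketch, not the authors' procedure; the grid size, neighborhood schedule, and nearest-labeled-unit rule are assumptions), the following code trains a small self-organizing map on unlabeled data, attaches the few available labels to their best matching units (BMUs), and classifies new points via the closest labeled unit in map coordinates.

```python
import numpy as np

def train_som(X, grid=(8, 8), iters=3000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a tiny rectangular SOM on unlabeled data (schematic)."""
    rng = np.random.default_rng(seed)
    gx, gy = grid
    units = np.array([(i, j) for i in range(gx) for j in range(gy)], dtype=float)
    W = X[rng.integers(len(X), size=gx * gy)].astype(float)    # codebook init from data
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))             # best matching unit
        frac = t / iters
        lr = lr0 * (1.0 - frac)                                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5                     # decaying neighborhood width
        h = np.exp(-((units - units[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                          # pull BMU neighborhood toward x
    return W, units

def label_bmus(W, X_labeled, y_labeled):
    """Attach each of the few available labels to the BMU of its data point."""
    unit_labels = {}
    for x, y in zip(X_labeled, y_labeled):
        unit_labels[int(np.argmin(((W - x) ** 2).sum(axis=1)))] = y
    return unit_labels

def predict(W, units, unit_labels, X_new):
    """Classify new points via the closest labeled unit in map (grid) coordinates."""
    labeled_idx = np.array(sorted(unit_labels))
    preds = []
    for x in X_new:
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))
        d_grid = ((units[labeled_idx] - units[bmu]) ** 2).sum(axis=1)
        preds.append(unit_labels[int(labeled_idx[np.argmin(d_grid)])])
    return np.array(preds)
```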
- An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws [24.356906682593532]
We study the compute-optimal trade-off between model and training data set sizes for large neural networks.
Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla. (A schematic form of this relation is given after this entry.)
arXiv Detail & Related papers (2022-12-02T18:46:41Z)
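For orientation, the "linear relation" mentioned above has the same shape as the Chinchilla compute-optimal recipe (a hedged paraphrase, not this paper's derivation): writing the training compute budget roughly as $C \approx 6ND$ for a model with $N$ parameters trained on $D$ tokens, compute-optimal training scales data linearly with model size,
$$D^{*} \propto N^{*}, \qquad N^{*} \propto \sqrt{C}, \qquad D^{*} \propto \sqrt{C}.$$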
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Merging Two Cultures: Deep and Statistical Learning [3.15863303008255]
Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data.
We show that prediction, optimisation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model.
arXiv Detail & Related papers (2021-10-22T02:57:21Z)
- A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning [37.01683478234978]
The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field.
One of the most important riddles is the good empirical generalization of overparameterized models.
arXiv Detail & Related papers (2021-09-06T10:48:40Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- A Universal Law of Robustness via Isoperimetry [1.484852576248587]
We show that smooth interpolation requires $d$ times more parameters than mere interpolation, where $d$ is the ambient data dimension.
We prove this universal law of robustness for any smoothly parametrized function class with polynomial-size weights. (The inequality is restated schematically after this entry.)
arXiv Detail & Related papers (2021-05-26T19:49:47Z)
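For orientation, the inequality behind this entry can be stated schematically (a paraphrase of the commonly quoted form; the paper gives the exact hypotheses): for a function $f$ from a smoothly parametrized class with $p$ (polynomially bounded) parameters that fits $n$ noisy samples in ambient dimension $d$ below the noise level,
$$\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}},$$
so a smooth, $O(1)$-Lipschitz interpolant needs $p \gtrsim nd$ parameters, a factor of $d$ more than the $p \approx n$ that suffices for mere interpolation.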
- The data-driven physical-based equations discovery using evolutionary approach [77.34726150561087]
We describe an algorithm for discovering mathematical equations from given observational data.
The algorithm combines genetic programming with sparse regression.
It can be used to discover governing analytical equations as well as partial differential equations (PDEs). (A minimal sketch of the sparse-regression step follows this entry.)
arXiv Detail & Related papers (2020-04-03T17:21:57Z)
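To make the sparse-regression component of the last entry concrete, here is a minimal sketch in the spirit of SINDy-style sequentially thresholded least squares (a generic illustration, not the authors' algorithm; the candidate library and the thresholding rule are assumptions): evaluate a library of candidate terms on the data, then repeatedly zero out small coefficients and refit, so only a few terms survive.

```python
import numpy as np

def sparse_regression(Theta, target, threshold=0.1, iters=10):
    """Sequentially thresholded least squares (schematic sparse-regression step).

    Theta  : (n_samples, n_terms) library of candidate terms evaluated on data
    target : (n_samples,) quantity to be explained (e.g. a time derivative)
    Returns a sparse coefficient vector: most entries are forced to zero.
    """
    xi, *_ = np.linalg.lstsq(Theta, target, rcond=None)
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                       # prune weak terms
        big = ~small
        if big.any():                         # refit only the surviving terms
            xi[big], *_ = np.linalg.lstsq(Theta[:, big], target, rcond=None)
    return xi

# Toy usage: recover dx/dt = -2*x + 0.5*x**3 from noisy samples,
# using a polynomial candidate library [1, x, x^2, x^3].
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=400)
dxdt = -2 * x + 0.5 * x**3 + 0.01 * rng.standard_normal(x.size)
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
print(sparse_regression(Theta, dxdt))   # approx [0, -2, 0, 0.5]
```

In a pipeline like the one described in that entry, the evolutionary (genetic programming) component would presumably propose the candidate terms that populate the library, while a pruning step like the one above keeps only a parsimonious equation.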
This list is automatically generated from the titles and abstracts of the papers on this site.