Merging Two Cultures: Deep and Statistical Learning
- URL: http://arxiv.org/abs/2110.11561v1
- Date: Fri, 22 Oct 2021 02:57:21 GMT
- Title: Merging Two Cultures: Deep and Statistical Learning
- Authors: Anindya Bhadra, Jyotishka Datta, Nick Polson, Vadim Sokolov, Jianeng Xu
- Abstract summary: Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data.
We show that prediction, interpolation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model.
- Score: 3.15863303008255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Merging the two cultures of deep and statistical learning provides insights
into structured high-dimensional data. Traditional statistical modeling is
still a dominant strategy for structured tabular data. Deep learning can be
viewed through the lens of generalized linear models (GLMs) with composite link
functions. Sufficient dimensionality reduction (SDR) and sparsity perform
nonlinear feature engineering. We show that prediction, interpolation and
uncertainty quantification can be achieved using probabilistic methods at the
output layer of the model. Thus a general framework for machine learning arises
that first generates nonlinear features (a.k.a. factors) via sparse
regularisation and stochastic gradient optimisation, and second uses a
stochastic output layer for predictive uncertainty. Rather than using shallow
additive architectures as in many statistical models, deep learning uses layers
of semi-affine input transformations to provide a predictive rule. Applying
these layers of transformations leads to a set of attributes (a.k.a. features)
to which predictive statistical methods can be applied. Thus we achieve the
best of both worlds: scalability and fast predictive rule construction together
with uncertainty quantification. Sparse regularisation with unsupervised or
supervised learning finds the features. We clarify the duality between shallow
and wide models such as PCA, PPR, and RRR, and deep but skinny architectures such as
autoencoders, MLPs, CNNs, and LSTMs. The connection with data transformations is
of practical importance for finding good network architectures. By
incorporating probabilistic components at the output level we allow for
predictive uncertainty. For interpolation we use deep Gaussian processes, and ReLU
trees for classification. We provide applications to regression, classification
and interpolation. Finally, we conclude with directions for future research.
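As a rough illustration of this two-stage recipe (not the authors' implementation), the following Python/numpy sketch stacks semi-affine transformations (an affine map followed by a ReLU) to generate nonlinear features, and then places a probabilistic output layer, here a conjugate Bayesian linear regression, on top of those features to obtain both a point prediction and a predictive variance. The layer sizes, the random untrained weights, and the Gaussian output model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def semi_affine(Z, W, b):
    """One semi-affine transformation: an affine map followed by a ReLU nonlinearity."""
    return np.maximum(Z @ W.T + b, 0.0)

def deep_features(X, layers):
    """Compose semi-affine layers to turn raw inputs into nonlinear features (factors)."""
    Z = X
    for W, b in layers:
        Z = semi_affine(Z, W, b)
    return Z

# Illustrative sizes (assumptions): 5 raw inputs -> 16 hidden units -> 8 features.
sizes = [5, 16, 8]
layers = [(0.5 * rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

# Synthetic data, purely for illustration.
X = rng.standard_normal((200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Stage 1: nonlinear feature engineering via the deep layers (in the paper these
# weights would be learned with sparse regularisation and SGD; here they stay random).
Phi = deep_features(X, layers)

# Stage 2: probabilistic output layer -- conjugate Bayesian linear regression on the
# features, giving a predictive mean and a predictive variance (uncertainty).
sigma2, tau2 = 0.1, 1.0                               # noise and prior variances (assumed)
A = Phi.T @ Phi / sigma2 + np.eye(Phi.shape[1]) / tau2
Sigma_post = np.linalg.inv(A)                         # posterior covariance of output weights
beta_post = Sigma_post @ Phi.T @ y / sigma2           # posterior mean of output weights

phi_star = deep_features(X[:1], layers)               # features at one test input
pred_mean = (phi_star @ beta_post)[0]
pred_var = sigma2 + (phi_star @ Sigma_post @ phi_star.T)[0, 0]
print(f"prediction {pred_mean:.3f} +/- {np.sqrt(pred_var):.3f}")
```

In the framework described above, the feature-layer weights would themselves be estimated, with sparsity encouraged by regularisation; they are left random here only to keep the sketch short.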
Related papers
- Deep Learning: A Tutorial [0.8158530638728498]
We provide a review of deep learning methods which provide insight into structured high-dimensional data.
Deep learning uses layers of semi-affine input transformations to provide a predictive rule.
Applying these layers of transformations leads to a set of attributes (or features) to which probabilistic statistical methods can be applied.
arXiv Detail & Related papers (2023-10-10T01:55:22Z)
- What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We present preliminary evidence that in-context learners share algorithmic features with these predictors; a sketch of the reference predictors follows this entry.
arXiv Detail & Related papers (2022-11-28T18:59:51Z)
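The reference predictors named in this summary are easy to write down explicitly. The sketch below involves no transformer at all: it simply computes, in plain numpy, the exact least-squares and ridge-regression predictions on a small synthetic "in-context" dataset; the sizes, the ridge penalty, and the data-generating model are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny in-context "prompt": n labelled examples (x_i, y_i) from a linear model.
n, d = 20, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)
x_query = rng.standard_normal(d)

# Exact least-squares predictor fitted on the context examples.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge-regression predictor (the penalty lambda is an illustrative choice).
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("least-squares prediction:", x_query @ w_ls)
print("ridge prediction:        ", x_query @ w_ridge)
```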
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV)
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- A Hybrid Framework for Sequential Data Prediction with End-to-End Optimization [0.0]
We investigate nonlinear prediction in an online setting and introduce a hybrid model that mitigates the need for hand-designed features and manual model selection.
We employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression.
We demonstrate the learning behavior of our algorithm on synthetic data and the significant performance improvements over conventional methods on various real-life datasets.
arXiv Detail & Related papers (2022-03-25T17:13:08Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Emulating Spatio-Temporal Realizations of Three-Dimensional Isotropic Turbulence via Deep Sequence Learning Models [24.025975236316842]
We use a data-driven approach to model a three-dimensional turbulent flow using cutting-edge Deep Learning techniques.
The accuracy of the model is assessed using statistical and physics-based metrics.
arXiv Detail & Related papers (2021-12-07T03:33:39Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension (see the sketch after this entry).
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
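To make the Canonical/Polyadic (CP) constraint concrete, here is a minimal numpy sketch of a single hidden layer whose per-unit weight matrices are restricted to rank-R form, written for a 2-mode (matrix) input. The sizes, the rank R, and the random untrained weights are assumptions; the actual Rank-R FNN also covers higher-order inputs and learns its factors from data.

```python
import numpy as np

rng = np.random.default_rng(2)

def rank_r_unit(X, A, B, bias):
    """One hidden unit whose weight matrix is constrained to CP (rank-R) form:
    W = sum_r a_r outer b_r, so that <W, X> = sum_r a_r^T X b_r."""
    score = sum(A[:, r] @ X @ B[:, r] for r in range(A.shape[1]))
    return np.maximum(score + bias, 0.0)   # ReLU activation

# Illustrative sizes (assumptions): a 2-mode input of shape (8, 6), rank R = 3,
# and 4 hidden units.
I, J, R, H = 8, 6, 3, 4
X = rng.standard_normal((I, J))            # input kept as a multilinear array, not vectorized
units = [(rng.standard_normal((I, R)), rng.standard_normal((J, R)), 0.0)
         for _ in range(H)]

hidden = np.array([rank_r_unit(X, A, B, b) for A, B, b in units])
print("hidden activations:", hidden)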
- Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn optimal rule selection strategy via gradient-based optimisation.
We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
- Deep transformation models: Tackling complex regression problems with neural network based transformation models [0.0]
We present a deep transformation model for probabilistic regression.
It estimates the whole conditional probability distribution, which is the most thorough way to capture uncertainty about the outcome.
Our method works for complex input data, which we demonstrate by employing a CNN architecture on image data (a simplified sketch follows this entry).
arXiv Detail & Related papers (2020-04-01T14:23:12Z)
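As a simplified stand-in for the transformation-model idea (not the authors' construction, which uses more flexible monotone transformations), the numpy sketch below maps an input through a tiny random network to the location and scale of a linear transformation h(y | x) = (y - mu(x)) / sigma(x), and recovers the whole conditional density via the change-of-variables formula. All sizes and weights are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def std_normal_pdf(z):
    """Density of the standard normal base distribution."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def cond_params(x, W1, b1, W2, b2):
    """A tiny network mapping an input x to the parameters of a monotone
    location-scale transformation h(y | x) = (y - mu(x)) / sigma(x)."""
    h = np.maximum(W1 @ x + b1, 0.0)
    mu, log_sigma = W2 @ h + b2
    return mu, np.exp(log_sigma)

# Illustrative sizes and (untrained) random weights -- assumptions for the sketch.
d, hdim = 4, 8
W1, b1 = 0.5 * rng.standard_normal((hdim, d)), np.zeros(hdim)
W2, b2 = 0.5 * rng.standard_normal((2, hdim)), np.zeros(2)

x = rng.standard_normal(d)
mu, sigma = cond_params(x, W1, b1, W2, b2)

# The whole conditional density follows by change of variables:
# p(y | x) = phi(h(y | x)) * dh/dy = phi((y - mu) / sigma) / sigma.
y_grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 9)
density = std_normal_pdf((y_grid - mu) / sigma) / sigma
print(np.round(density, 4))
```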