Merging Two Cultures: Deep and Statistical Learning
- URL: http://arxiv.org/abs/2110.11561v1
- Date: Fri, 22 Oct 2021 02:57:21 GMT
- Title: Merging Two Cultures: Deep and Statistical Learning
- Authors: Anindya Bhadra, Jyotishka Datta, Nick Polson, Vadim Sokolov, Jianeng Xu
- Abstract summary: Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data.
We show that prediction, interpolation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model.
- Score: 3.15863303008255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Merging the two cultures of deep and statistical learning provides insights
into structured high-dimensional data. Traditional statistical modeling is
still a dominant strategy for structured tabular data. Deep learning can be
viewed through the lens of generalized linear models (GLMs) with composite link
functions. Sufficient dimensionality reduction (SDR) and sparsity perform
nonlinear feature engineering. We show that prediction, interpolation and
uncertainty quantification can be achieved using probabilistic methods at the
output layer of the model. Thus a general framework for machine learning arises
that first generates nonlinear features (a.k.a. factors) via sparse
regularisation and stochastic gradient optimisation, and second uses a
stochastic output layer for predictive uncertainty. Rather than using shallow
additive architectures as in many statistical models, deep learning uses layers
of semi-affine input transformations to provide a predictive rule. Applying
these layers of transformations leads to a set of attributes (a.k.a. features)
to which predictive statistical methods can be applied. Thus we achieve the
best of both worlds: scalability and fast predictive rule construction together
with uncertainty quantification. Sparse regularisation with unsupervised or
supervised learning finds the features. We clarify the duality between shallow
and wide models such as PCA, PPR, and RRR, and deep but skinny architectures such as
autoencoders, MLPs, CNNs, and LSTMs. The connection with data transformations is
of practical importance for finding good network architectures. By
incorporating probabilistic components at the output level we allow for
predictive uncertainty. For interpolation we use deep Gaussian processes, and ReLU
trees for classification. We provide applications to regression, classification
and interpolation. Finally, we conclude with directions for future research.
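As a rough illustration of this two-stage recipe (not the authors' implementation), the following Python/numpy sketch stacks semi-affine transformations (an affine map followed by a ReLU) to generate nonlinear features, and then places a probabilistic output layer, here a conjugate Bayesian linear regression, on top of those features to obtain both a point prediction and a predictive variance. The layer sizes, the random untrained weights, and the Gaussian output model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def semi_affine(Z, W, b):
    """One semi-affine transformation: an affine map followed by a ReLU nonlinearity."""
    return np.maximum(Z @ W.T + b, 0.0)

def deep_features(X, layers):
    """Compose semi-affine layers to turn raw inputs into nonlinear features (factors)."""
    Z = X
    for W, b in layers:
        Z = semi_affine(Z, W, b)
    return Z

# Illustrative sizes (assumptions): 5 raw inputs -> 16 hidden units -> 8 features.
sizes = [5, 16, 8]
layers = [(0.5 * rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

# Synthetic data, purely for illustration.
X = rng.standard_normal((200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Stage 1: nonlinear feature engineering via the deep layers (in the paper these
# weights would be learned with sparse regularisation and SGD; here they stay random).
Phi = deep_features(X, layers)

# Stage 2: probabilistic output layer -- conjugate Bayesian linear regression on the
# features, giving a predictive mean and a predictive variance (uncertainty).
sigma2, tau2 = 0.1, 1.0                               # noise and prior variances (assumed)
A = Phi.T @ Phi / sigma2 + np.eye(Phi.shape[1]) / tau2
Sigma_post = np.linalg.inv(A)                         # posterior covariance of output weights
beta_post = Sigma_post @ Phi.T @ y / sigma2           # posterior mean of output weights

phi_star = deep_features(X[:1], layers)               # features at one test input
pred_mean = (phi_star @ beta_post)[0]
pred_var = sigma2 + (phi_star @ Sigma_post @ phi_star.T)[0, 0]
print(f"prediction {pred_mean:.3f} +/- {np.sqrt(pred_var):.3f}")
```

In the framework described above, the feature-layer weights would themselves be estimated, with sparsity encouraged by regularisation; they are left random here only to keep the sketch short.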
Related papers
- Deep Learning: A Tutorial [0.8158530638728498]
We provide a review of deep learning methods which provide insight into structured high-dimensional data.
Deep learning uses layers of semi-affine input transformations to provide a predictive rule.
Applying these layers of transformations leads to a set of attributes (or features) to which probabilistic statistical methods can be applied.
arXiv Detail & Related papers (2023-10-10T01:55:22Z)
- What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We present preliminary evidence that in-context learners share algorithmic features with these predictors; a sketch of the reference predictors follows this entry.
arXiv Detail & Related papers (2022-11-28T18:59:51Z)
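The reference predictors named in this summary are easy to write down explicitly. The sketch below involves no transformer at all: it simply computes, in plain numpy, the exact least-squares and ridge-regression predictions on a small synthetic "in-context" dataset; the sizes, the ridge penalty, and the data-generating model are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny in-context "prompt": n labelled examples (x_i, y_i) from a linear model.
n, d = 20, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)
x_query = rng.standard_normal(d)

# Exact least-squares predictor fitted on the context examples.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge-regression predictor (the penalty lambda is an illustrative choice).
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("least-squares prediction:", x_query @ w_ls)
print("ridge prediction:        ", x_query @ w_ridge)
```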
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV)
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- A Hybrid Framework for Sequential Data Prediction with End-to-End Optimization [0.0]
We investigate nonlinear prediction in an online setting and introduce a hybrid model that mitigates the need for hand-designed features and manual model selection.
We employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression.
We demonstrate the learning behavior of our algorithm on synthetic data and the significant performance improvements over conventional methods on various real-life datasets.
arXiv Detail & Related papers (2022-03-25T17:13:08Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Emulating Spatio-Temporal Realizations of Three-Dimensional Isotropic Turbulence via Deep Sequence Learning Models [24.025975236316842]
We use a data-driven approach to model a three-dimensional turbulent flow using cutting-edge Deep Learning techniques.
The accuracy of the model is assessed using statistical and physics-based metrics.
arXiv Detail & Related papers (2021-12-07T03:33:39Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension (see the sketch after this entry).
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
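To make the Canonical/Polyadic (CP) constraint concrete, here is a minimal numpy sketch of a single hidden layer whose per-unit weight matrices are restricted to rank-R form, written for a 2-mode (matrix) input. The sizes, the rank R, and the random untrained weights are assumptions; the actual Rank-R FNN also covers higher-order inputs and learns its factors from data.

```python
import numpy as np

rng = np.random.default_rng(2)

def rank_r_unit(X, A, B, bias):
    """One hidden unit whose weight matrix is constrained to CP (rank-R) form:
    W = sum_r a_r outer b_r, so that <W, X> = sum_r a_r^T X b_r."""
    score = sum(A[:, r] @ X @ B[:, r] for r in range(A.shape[1]))
    return np.maximum(score + bias, 0.0)   # ReLU activation

# Illustrative sizes (assumptions): a 2-mode input of shape (8, 6), rank R = 3,
# and 4 hidden units.
I, J, R, H = 8, 6, 3, 4
X = rng.standard_normal((I, J))            # input kept as a multilinear array, not vectorized
units = [(rng.standard_normal((I, R)), rng.standard_normal((J, R)), 0.0)
         for _ in range(H)]

hidden = np.array([rank_r_unit(X, A, B, b) for A, B, b in units])
print("hidden activations:", hidden)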
- Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn optimal rule selection strategy via gradient-based optimisation.
We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
- Deep transformation models: Tackling complex regression problems with neural network based transformation models [0.0]
We present a deep transformation model for probabilistic regression.
It estimates the whole conditional probability distribution, which is the most thorough way to capture uncertainty about the outcome.
Our method works for complex input data, which we demonstrate by employing a CNN architecture on image data (a simplified sketch follows this entry).
arXiv Detail & Related papers (2020-04-01T14:23:12Z)
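As a simplified stand-in for the transformation-model idea (not the authors' construction, which uses more flexible monotone transformations), the numpy sketch below maps an input through a tiny random network to the location and scale of a linear transformation h(y | x) = (y - mu(x)) / sigma(x), and recovers the whole conditional density via the change-of-variables formula. All sizes and weights are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def std_normal_pdf(z):
    """Density of the standard normal base distribution."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def cond_params(x, W1, b1, W2, b2):
    """A tiny network mapping an input x to the parameters of a monotone
    location-scale transformation h(y | x) = (y - mu(x)) / sigma(x)."""
    h = np.maximum(W1 @ x + b1, 0.0)
    mu, log_sigma = W2 @ h + b2
    return mu, np.exp(log_sigma)

# Illustrative sizes and (untrained) random weights -- assumptions for the sketch.
d, hdim = 4, 8
W1, b1 = 0.5 * rng.standard_normal((hdim, d)), np.zeros(hdim)
W2, b2 = 0.5 * rng.standard_normal((2, hdim)), np.zeros(2)

x = rng.standard_normal(d)
mu, sigma = cond_params(x, W1, b1, W2, b2)

# The whole conditional density follows by change of variables:
# p(y | x) = phi(h(y | x)) * dh/dy = phi((y - mu) / sigma) / sigma.
y_grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 9)
density = std_normal_pdf((y_grid - mu) / sigma) / sigma
print(np.round(density, 4))
```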