Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights
- URL: http://arxiv.org/abs/2002.04033v1
- Date: Mon, 10 Feb 2020 07:19:52 GMT
- Title: Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights
- Authors: Theofanis Karaletsos, Thang D. Bui
- Abstract summary: A desirable class of priors would represent weights compactly, capture correlations between weights, and allow inclusion of prior knowledge.
This paper introduces two innovations: (i) a Gaussian process-based hierarchical model for network weights based on unit embeddings that can flexibly encode correlated weight structures, and (ii) input-dependent versions of these weight priors that can provide convenient ways to regularize the function space.
We show these models provide desirable test-time uncertainty estimates on out-of-distribution data, demonstrate cases of modeling inductive biases for neural networks with kernels, and demonstrate competitive predictive performance on an active learning benchmark.
- Score: 16.538973310830414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probabilistic neural networks are typically modeled with independent weight
priors, which do not capture weight correlations in the prior and do not
provide a parsimonious interface to express properties in function space. A
desirable class of priors would represent weights compactly, capture
correlations between weights, facilitate calibrated reasoning about
uncertainty, and allow inclusion of prior knowledge about the function space
such as periodicity or dependence on contexts such as inputs. To this end, this
paper introduces two innovations: (i) a Gaussian process-based hierarchical
model for network weights based on unit embeddings that can flexibly encode
correlated weight structures, and (ii) input-dependent versions of these weight
priors that can provide convenient ways to regularize the function space
through the use of kernels defined on contextual inputs. We show these models
provide desirable test-time uncertainty estimates on out-of-distribution data,
demonstrate cases of modeling inductive biases for neural networks with kernels
which help both interpolation and extrapolation from training data, and
demonstrate competitive predictive performance on an active learning benchmark.
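To make innovation (i) concrete, here is a minimal NumPy sketch of the core construction: a layer's weights are drawn jointly from a GP whose kernel operates on per-unit embeddings, so weights that share a unit become correlated. The embedding size, the RBF kernel, and the random (rather than learned) embeddings are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out, emb_dim = 4, 3, 2
z_in = rng.normal(size=(n_in, emb_dim))    # one latent code per input unit
z_out = rng.normal(size=(n_out, emb_dim))  # one latent code per output unit

# Each weight w_ij is indexed by the embedding pair (z_in[i], z_out[j]).
pairs = np.array([np.concatenate([z_in[i], z_out[j]])
                  for i in range(n_in) for j in range(n_out)])

def rbf(a, b, lengthscale=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

K = rbf(pairs, pairs) + 1e-6 * np.eye(len(pairs))  # jitter for stability
w = rng.multivariate_normal(np.zeros(len(pairs)), K).reshape(n_in, n_out)
print(w)  # a correlated weight sample; similar embeddings give similar weights
```

Innovation (ii) would replace the kernel inputs above with (or augment them by) contextual inputs x, making the weight prior input-dependent.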
Related papers
- Nonuniform random feature models using derivative information [10.239175197655266]
We propose nonuniform data-driven parameter distributions for neural network initialization based on derivative data of the function to be approximated.
We address the cases of Heaviside and ReLU activation functions, and their smooth approximations (sigmoid and softplus).
We suggest simplifications of these exact densities based on approximate derivative data in the input points that allow for very efficient sampling and lead to performance of random feature models close to optimal networks in several scenarios.
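A hedged NumPy sketch of the derivative-informed idea: instead of sampling ReLU feature parameters uniformly, place the ReLU "kinks" preferentially where finite-difference curvature of the target is large. The exact densities in the paper differ; this only illustrates nonuniform, data-driven sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = np.sin(2 * x) + 0.1 * x**2                      # toy target to approximate

curv = np.abs(np.gradient(np.gradient(y, x), x))    # approximate |f''|
p = curv / curv.sum()                               # data-driven density

n_feat = 50
kinks = rng.choice(x, size=n_feat, p=p)             # nonuniform breakpoints
signs = rng.choice([-1.0, 1.0], size=n_feat)
Phi = np.maximum(signs[None, :] * (x[:, None] - kinks[None, :]), 0.0)
Phi = np.concatenate([Phi, np.ones((len(x), 1)), x[:, None]], axis=1)

c, *_ = np.linalg.lstsq(Phi, y, rcond=None)         # fit the linear readout
print("RMSE:", np.sqrt(np.mean((Phi @ c - y) ** 2)))
```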
arXiv Detail & Related papers (2024-10-03T01:30:13Z) - Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights to localized Gaussians not only accelerates the training process but also enhances the stability of the optimization.
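A minimal sketch of the mechanism, under the assumption of a Gaussian over Euclidean point distances with a single bandwidth: learned query-key attention is replaced by fixed, localized Gaussian weights between 3D points.

```python
import numpy as np

def gaussian_attention(points, values, sigma=0.5):
    """points: (n, 3) coordinates; values: (n, d) features to aggregate."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    logits = -d2 / (2.0 * sigma**2)          # nearby points get high scores
    logits -= logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax
    return attn @ values                     # locally averaged features

rng = np.random.default_rng(0)
pts, feats = rng.normal(size=(8, 3)), rng.normal(size=(8, 4))
print(gaussian_attention(pts, feats).shape)  # (8, 4)
```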
arXiv Detail & Related papers (2024-09-20T07:41:47Z) - Empowering Bayesian Neural Networks with Functional Priors through Anchored Ensembling for Mechanics Surrogate Modeling Applications [0.0]
We present a novel BNN training scheme based on anchored ensembling that can integrate a priori information available in the function space.
The anchoring scheme makes use of low-rank correlations between NN parameters, learnt by pre-training the network to fit realizations of the functional prior.
We also perform a study demonstrating that correlations between NN weights, which are often neglected in existing BNN implementations, are critical for appropriately transferring knowledge between the function-space and parameter-space priors.
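A hedged sketch of anchored ensembling with a correlated, low-rank-plus-diagonal weight prior: each ensemble member is pulled toward a fresh anchor drawn from the prior. The quadratic pull follows the standard anchored-ensembles recipe; the low-rank factor L here is random, whereas the paper learns it from pre-training to functional prior samples.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10
L = rng.normal(size=(dim, 2)) * 0.3          # low-rank correlation factor
cov = L @ L.T + 0.1 * np.eye(dim)            # prior covariance over weights
cov_inv = np.linalg.inv(cov)

anchor = rng.multivariate_normal(np.zeros(dim), cov)  # fresh prior draw

def anchored_loss(w, X, y):
    resid = X @ w - y
    pull = w - anchor                        # regularize toward the anchor
    return resid @ resid + pull @ cov_inv @ pull

X, y = rng.normal(size=(50, dim)), rng.normal(size=50)
# Closed-form minimizer of the anchored objective for this linear model:
w_star = np.linalg.solve(X.T @ X + cov_inv, X.T @ y + cov_inv @ anchor)
print(anchored_loss(w_star, X, y))
```

Repeating this with independent anchors yields an ensemble whose spread approximates the posterior under the correlated prior.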
arXiv Detail & Related papers (2024-09-08T22:27:50Z) - SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity Prediction [3.406882192023597]
Accurate prediction of protein-ligand binding affinity is crucial for drug development.
Traditional methods often fail to accurately model the complex's spatial information.
We propose SPIN, a model that incorporates various inductive biases applicable to this task.
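The abstract gives few architectural details, so the following is only a generic sketch of the SE(3)-invariance ingredient named in the title: scoring a protein-ligand pose from pairwise atom distances, which are unchanged by global rotation and translation. Everything else about SPIN is omitted.

```python
import numpy as np

def invariant_features(protein_xyz, ligand_xyz, cutoff=10.0, n_bins=16):
    """Histogram of protein-ligand atom distances; SE(3)-invariant by design."""
    d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    hist, _ = np.histogram(d[d < cutoff], bins=n_bins, range=(0.0, cutoff))
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(0)
prot, lig = rng.normal(size=(30, 3)), rng.normal(size=(8, 3))

# Applying a random rigid transform leaves the features unchanged.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
shift = rng.normal(size=3)
f1 = invariant_features(prot, lig)
f2 = invariant_features(prot @ q.T + shift, lig @ q.T + shift)
print(np.allclose(f1, f2))  # True
```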
arXiv Detail & Related papers (2024-07-10T08:40:07Z) - Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning [1.4293924404819704]
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
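A hedged sketch of the information-theoretic reading of k-NN: estimate a density from the k-th neighbor distance and report the surprisal -log p(x); points with unusually high surprisal are flagged as anomalies. The paper's exact formulation may differ.

```python
import numpy as np
from math import gamma, pi

def knn_surprisal(train, queries, k=5):
    n, d = train.shape
    # distance from each query to its k-th nearest training point
    dists = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=-1)
    r_k = np.sort(dists, axis=1)[:, k - 1]
    vol_unit = pi ** (d / 2) / gamma(d / 2 + 1)      # unit-ball volume in R^d
    density = k / (n * vol_unit * np.maximum(r_k, 1e-12) ** d)
    return -np.log(density)                          # surprisal in nats

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))
queries = np.array([[0.0, 0.0], [6.0, 6.0]])         # inlier vs. outlier
print(knn_surprisal(train, queries))                 # outlier scores higher
```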
arXiv Detail & Related papers (2023-11-17T00:35:38Z) - Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL), a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments on multi-class classification, we demonstrate the predictive ability of the model and its flexibility in capturing non-stationarity.
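A minimal sketch of the state-space idea: the linear read-out weights drift as a random walk (the predict step inflates covariance), and each labeled example triggers a Kalman update. The features would come from a fixed neural encoder; a Gaussian surrogate likelihood on a +/-1 target stands in for the paper's classification likelihood.

```python
import numpy as np

def kalman_step(mu, P, phi, y, q=1e-3, r=0.1):
    """One predict+update for w_t = w_{t-1} + noise, y = phi @ w + noise."""
    P = P + q * np.eye(len(mu))        # predict: weights may have drifted
    s = phi @ P @ phi + r              # innovation variance (scalar)
    k_gain = P @ phi / s               # Kalman gain
    mu = mu + k_gain * (y - phi @ mu)  # correct toward the new observation
    P = P - np.outer(k_gain, phi @ P)
    return mu, P

rng = np.random.default_rng(0)
d = 5
mu, P = np.zeros(d), np.eye(d)
correct = 0
for t in range(200):
    phi = rng.normal(size=d)                             # encoder features
    y = np.sign(phi @ np.ones(d) + 0.1 * rng.normal())   # stream stub
    correct += np.sign(phi @ mu) == y                    # predict, then train
    mu, P = kalman_step(mu, P, phi, y)
print("online accuracy:", correct / 200)
```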
arXiv Detail & Related papers (2023-06-14T11:41:42Z) - Modeling Implicit Bias with Fuzzy Cognitive Maps [0.0]
This paper presents a Fuzzy Cognitive Map model to quantify implicit bias in structured datasets.
We introduce a new reasoning mechanism equipped with a normalization-like transfer function that prevents neurons from saturating.
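A hedged sketch of one Fuzzy Cognitive Map reasoning step. Standard FCMs squash activations with a sigmoid, which can saturate; following the abstract, the transfer below rescales raw activations into [0, 1] each step instead. This particular normalization is an illustrative guess, not the paper's exact function.

```python
import numpy as np

def fcm_step(a, W):
    raw = a + a @ W                        # propagate influence along weighted edges
    lo, hi = raw.min(), raw.max()
    return (raw - lo) / (hi - lo + 1e-12)  # normalize instead of saturating

rng = np.random.default_rng(0)
n = 6
W = rng.uniform(-1, 1, size=(n, n))        # causal weights between concepts
np.fill_diagonal(W, 0.0)

a = rng.uniform(size=n)                    # initial concept activations
for _ in range(20):                        # iterate toward a fixed point
    a = fcm_step(a, W)
print(np.round(a, 3))
```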
arXiv Detail & Related papers (2021-12-23T17:04:12Z) - Deep Archimedean Copulas [98.96141706464425]
ACNet is a novel differentiable neural network architecture that enforces the structural properties of Archimedean copulas.
We show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.
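A minimal sketch of the Archimedean structure ACNet builds on: C(u, v) = psi(psi^{-1}(u) + psi^{-1}(v)) for a suitable generator psi. The fixed Clayton generator is used here for illustration; ACNet instead parameterizes the generator with a neural network constrained to preserve these structural properties.

```python
import numpy as np

theta = 2.0                                   # Clayton dependence parameter

def psi(t):        # generator: psi(t) = (1 + t)^(-1/theta)
    return (1.0 + t) ** (-1.0 / theta)

def psi_inv(u):    # inverse generator: psi^{-1}(u) = u^(-theta) - 1
    return u ** (-theta) - 1.0

def copula(u, v):
    return psi(psi_inv(u) + psi_inv(v))

u, v = 0.3, 0.7
print(copula(u, v))            # joint CDF of the two uniforms
print(copula(u, 1.0), u)       # margin check: C(u, 1) == u
```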
arXiv Detail & Related papers (2020-12-05T22:58:37Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
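A hedged PyTorch sketch of the diagnostic: estimate the top Hessian eigenvalue (a proxy for the Hessian norm) of the loss with respect to the weights by power iteration on Hessian-vector products. The model, data, and iteration counts are illustrative.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)

v = [torch.randn_like(p) for p in params]
for _ in range(20):                        # power iteration on H v
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    norm = torch.sqrt(sum((h * h).sum() for h in hv))
    v = [h / norm for h in hv]

# Rayleigh quotient v^T H v approximates the dominant eigenvalue.
hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
print(sum((a * b).sum() for a, b in zip(v, hv)).item())
```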
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks, but representing all interactions explicitly requires a number of parameters that grows exponentially with the interaction order.
To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow the feature mapping to produce arbitrarily high-dimensional feature vectors.
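A hedged sketch of the tensor idea: a model over all multiplicative interactions of d feature groups has a weight tensor that is never formed explicitly; a rank-R canonical polyadic (CP) factorization reduces the inner product with it to per-mode dot products. Shapes and data here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, rank = 3, 4, 5          # 3 feature groups, m features each, CP rank 5
factors = [rng.normal(size=(m, rank)) * 0.3 for _ in range(d)]

def cp_predict(x_groups):
    """<W, x1 (x) x2 (x) x3> with W = sum_r a_r (x) b_r (x) c_r, never materialized."""
    prod = np.ones(rank)
    for x, A in zip(x_groups, factors):
        prod *= x @ A          # one m-dim dot product per mode and component
    return prod.sum()

x_groups = [rng.normal(size=m) for _ in range(d)]
print(cp_predict(x_groups))

# Equivalent (exponential-size) check against the full weight tensor:
W = sum(np.einsum('i,j,k->ijk', *(A[:, r] for A in factors)) for r in range(rank))
print(np.einsum('ijk,i,j,k->', W, *x_groups))  # matches cp_predict
```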
arXiv Detail & Related papers (2020-01-27T22:38:40Z) - Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs in which the mapping from the base density to the output space is conditioned on an input x, in order to model conditional densities p(y|x).
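A minimal sketch of a conditional normalizing flow with a single affine layer: the base-to-output mapping is conditioned on x through networks mu(x) and log-sigma(x) (linear stubs here), and log p(y|x) follows from the change-of-variables formula.

```python
import numpy as np

rng = np.random.default_rng(0)
dx, dy = 3, 2
W_mu = rng.normal(size=(dx, dy)) * 0.3     # stub for the mu(x) network
W_ls = rng.normal(size=(dx, dy)) * 0.1     # stub for the log-sigma(x) network

def log_prob(y, x):
    mu, log_sig = x @ W_mu, x @ W_ls       # condition the flow on x
    z = (y - mu) / np.exp(log_sig)         # invert y = mu + sigma * z
    log_base = -0.5 * (z**2 + np.log(2 * np.pi)).sum()
    return log_base - log_sig.sum()        # minus log|det Jacobian|

def sample(x):
    mu, log_sig = x @ W_mu, x @ W_ls
    return mu + np.exp(log_sig) * rng.normal(size=dy)

x = rng.normal(size=dx)
y = sample(x)
print(log_prob(y, x))
```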
arXiv Detail & Related papers (2019-11-29T19:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.