An Empirical Analysis of the Advantages of Finite- v.s. Infinite-Width
Bayesian Neural Networks
- URL: http://arxiv.org/abs/2211.09184v1
- Date: Wed, 16 Nov 2022 20:07:55 GMT
- Title: An Empirical Analysis of the Advantages of Finite- v.s. Infinite-Width
Bayesian Neural Networks
- Authors: Jiayu Yao, Yaniv Yacoby, Beau Coker, Weiwei Pan, Finale Doshi-Velez
- Abstract summary: We empirically compare finite- and infinite-width BNNs, and provide quantitative and qualitative explanations for their performance difference.
We find that when the model is mis-specified, increasing width can hurt BNN performance.
In these cases, we provide evidence that finite-width BNNs generalize better, partly due to properties of their frequency spectrum that allow them to adapt under model mismatch.
- Score: 25.135652514472238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comparing Bayesian neural networks (BNNs) with different widths is
challenging because, as the width increases, multiple model properties change
simultaneously, and inference in the finite-width case is intractable. In this
work, we empirically compare finite- and infinite-width BNNs, and provide
quantitative and qualitative explanations for their performance difference. We
find that when the model is mis-specified, increasing width can hurt BNN
performance. In these cases, we provide evidence that finite-width BNNs
generalize better, partly due to properties of their frequency spectrum that
allow them to adapt under model mismatch.
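As a rough illustration of the frequency-spectrum comparison, the minimal sketch below samples functions from a one-hidden-layer BNN prior at several widths and inspects their empirical spectra. It assumes standard NNGP-style $1/\sqrt{\text{width}}$ output scaling and a tanh activation; it is not the authors' experimental code.
```python
# Sketch (assumptions mine): sample functions from a 1-hidden-layer BNN prior
# at several widths and compare their empirical frequency spectra.
import numpy as np

def sample_prior_functions(x, width, n_samples=200, rng=None):
    """Draw function samples f(x) from a 1-hidden-layer tanh BNN prior."""
    rng = rng or np.random.default_rng(0)
    d = 1  # input dimension
    fs = []
    for _ in range(n_samples):
        W1 = rng.normal(0, 1.0, size=(d, width))                    # input-to-hidden weights
        b1 = rng.normal(0, 1.0, size=width)                         # hidden biases
        W2 = rng.normal(0, 1.0 / np.sqrt(width), size=(width, 1))   # 1/sqrt(width) output scaling
        fs.append((np.tanh(x[:, None] @ W1 + b1) @ W2).ravel())
    return np.stack(fs)  # shape (n_samples, len(x))

x = np.linspace(-3, 3, 256)
for width in [2, 16, 1024]:  # narrow vs. near-GP-limit widths
    f = sample_prior_functions(x, width)
    spectrum = np.abs(np.fft.rfft(f, axis=1)).mean(axis=0)  # average magnitude spectrum
    print(width, spectrum[:5].round(3))
```
As the width grows, the sampled functions approach the corresponding GP prior and the average spectrum stabilizes; narrow networks produce more variable spectra, which is the kind of flexibility the paper connects to adaptation under mismatch.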
Related papers
- Generalization of Graph Neural Networks is Robust to Model Mismatch [84.01980526069075]
Graph neural networks (GNNs) have demonstrated their effectiveness in various tasks, supported by their generalization capabilities.
In this paper, we examine GNNs that operate on geometric graphs generated from manifold models.
Our analysis reveals the robustness of the GNN generalization in the presence of such model mismatch.
arXiv Detail & Related papers (2024-08-25T16:00:44Z)
- Disentangling, Amplifying, and Debiasing: Learning Disentangled Representations for Fair Graph Neural Networks [22.5976413484192]
We propose DAB-GNN, a novel GNN framework that Disentangles, Amplifies, and deBiases attribute, structure, and potential biases in the GNN mechanism.
DAB-GNN significantly outperforms ten state-of-the-art competitors in achieving an optimal balance between accuracy and fairness.
arXiv Detail & Related papers (2024-08-23T07:14:56Z)
- Robust Learning in Bayesian Parallel Branching Graph Neural Networks: The Narrow Width Limit [4.373803477995854]
We investigate the narrow-width limit of the Bayesian Parallel Branching Graph Neural Network (BPB-GNN).
We show that when the width of a BPB-GNN is significantly smaller than the number of training examples, each branch exhibits more robust learning.
Our results characterize a newly defined narrow-width regime for parallel branching networks in general.
arXiv Detail & Related papers (2024-07-26T15:14:22Z)
- Feature-Learning Networks Are Consistent Across Widths At Realistic Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
Early in training, wide neural networks trained on online data not only have identical loss curves but also agree in their pointwise test predictions throughout training.
We observe, however, that ensembles of narrower networks perform worse than a single wide network.
arXiv Detail & Related papers (2023-05-28T17:09:32Z)
- Posterior Regularized Bayesian Neural Network Incorporating Soft and Hard Knowledge Constraints [12.050265348673078]
We propose a novel Posterior-Regularized Bayesian Neural Network (PR-BNN) model by incorporating different types of knowledge constraints.
Experiments in simulation and two case studies, on aviation landing prediction and solar energy output prediction, demonstrate the benefit of the knowledge constraints and the performance improvement of the proposed model.
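As a hedged sketch of the general idea (the function names and penalty form are illustrative assumptions, not the PR-BNN objective itself), a soft knowledge constraint can be folded into a variational BNN loss as a penalty term:
```python
# Sketch (assumptions mine): soft-constraint posterior regularization as an
# ELBO-style loss plus a weighted constraint-violation penalty.
import numpy as np

def constrained_elbo(nll, kl, constraint_violation, lam=10.0):
    """ELBO-style loss with a soft knowledge constraint: larger `lam` enforces
    the constraint more strictly (a hard constraint is the lam -> inf limit)."""
    return nll + kl + lam * np.maximum(constraint_violation, 0.0) ** 2

# toy usage: e.g. a monotonicity constraint violated by 0.3 on average
print(constrained_elbo(nll=1.2, kl=0.4, constraint_violation=0.3))
```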
arXiv Detail & Related papers (2022-10-16T18:58:50Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- EIGNN: Efficient Infinite-Depth Graph Neural Networks [51.97361378423152]
Graph neural networks (GNNs) are widely used for modelling graph-structured data in numerous applications.
Motivated by the limitations of finite-depth architectures, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN).
We show that EIGNN has a better ability to capture long-range dependencies than recent baselines, and consistently achieves state-of-the-art performance.
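As a rough illustration of the infinite-depth idea (a generic fixed-point layer, not EIGNN's exact formulation; a contraction via small weights is assumed):
```python
# Sketch (assumptions mine): an "infinite-depth" graph layer as the fixed point
# Z = tanh(alpha * A_hat @ Z @ W + X), found by iterating to convergence.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
A = rng.integers(0, 2, size=(n, n)); A = np.triu(A, 1); A = A + A.T  # random undirected graph
A_hat = A / np.maximum(A.sum(1, keepdims=True), 1)   # row-normalized adjacency
X = rng.normal(size=(n, d))                          # node features
W = rng.normal(size=(d, d)) * 0.1                    # small weights -> contraction
alpha = 0.5

Z = np.zeros_like(X)
for _ in range(100):                                 # iterate the implicit layer
    Z_new = np.tanh(alpha * A_hat @ Z @ W + X)
    if np.max(np.abs(Z_new - Z)) < 1e-8:             # converged to the fixed point
        Z = Z_new
        break
    Z = Z_new
print(Z.round(3))
```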
arXiv Detail & Related papers (2022-02-22T08:16:58Z)
- Asymptotics of Wide Convolutional Neural Networks [18.198962344790377]
We study scaling laws for wide CNNs and networks with skip connections.
We find that the difference in performance between finite and infinite width models vanishes at a definite rate with respect to model width.
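(The summary does not state the rate; in wide-network analyses of this kind, the leading finite-width correction is typically $O(1/n)$ in the width $n$.)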
arXiv Detail & Related papers (2020-08-19T21:22:19Z)
- Belief Propagation Neural Networks [103.97004780313105]
We introduce belief propagation neural networks (BPNNs).
BPNNs operate on factor graphs and generalize belief propagation (BP).
We show that BPNNs converge 1.7x faster on Ising models while providing tighter bounds.
On challenging model counting problems, BPNNs compute estimates hundreds of times faster than state-of-the-art handcrafted methods.
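For background, the sketch below runs vanilla sum-product BP on a tiny Ising chain, i.e., the classical algorithm that BPNNs generalize (this is standard BP, not a BPNN; the coupling and field values are arbitrary):
```python
# Sketch: sum-product belief propagation on a 3-node Ising chain 0 - 1 - 2.
# BP is exact on trees, so the computed belief is the true marginal here.
import numpy as np

J, h = 0.5, 0.2                              # coupling and field strengths
states = np.array([-1.0, 1.0])
psi = np.exp(J * np.outer(states, states))   # pairwise potential psi(x_i, x_j)
phi = np.exp(h * states)                     # unary potential phi(x_i)

m01 = psi.T @ phi; m01 /= m01.sum()          # message node0 -> node1
m21 = psi.T @ phi; m21 /= m21.sum()          # message node2 -> node1
belief1 = phi * m01 * m21                    # belief at the middle node
belief1 /= belief1.sum()
print("P(x_1):", belief1.round(4))
```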
arXiv Detail & Related papers (2020-07-01T07:39:51Z)
- Exact posterior distributions of wide Bayesian neural networks [51.20413322972014]
We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.
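A toy sketch of that rejection-sampling idea (the one-parameter "network", dataset, and noise level are illustrative assumptions, not the authors' setup): propose weights from the prior and accept with probability likelihood / sup-likelihood, which yields exact posterior samples.
```python
# Sketch (assumptions mine): exact posterior samples for a tiny "BNN" via
# rejection sampling with the prior as the proposal distribution.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([-1.0, 0.0, 1.0]); y = np.array([-0.8, 0.1, 0.9])  # tiny dataset
sigma = 0.3                                   # observation noise

def log_lik(w):                               # 1-parameter "network" f(x) = tanh(w * x)
    return -0.5 * np.sum((y - np.tanh(w * x)) ** 2) / sigma**2

log_lik_max = 0.0                             # valid bound: log-lik <= 0 here
samples = []
while len(samples) < 500:
    w = rng.normal(0, 1)                      # propose from the prior
    if np.log(rng.uniform()) < log_lik(w) - log_lik_max:
        samples.append(w)                     # accept -> exact posterior sample
print("posterior mean of w:", np.mean(samples).round(3))
```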
arXiv Detail & Related papers (2020-06-18T13:57:04Z)
- Compromise-free Bayesian neural networks [0.0]
We numerically sample the full (non-Gaussian and multimodal) network posterior and obtain numerical estimates of the Bayesian evidence.
Networks with ReLU activation functions consistently have higher evidence than those with tanh activations.
arXiv Detail & Related papers (2020-04-25T19:12:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.