Enhancing Diversity in Bayesian Deep Learning via Hyperspherical Energy Minimization of CKA
- URL: http://arxiv.org/abs/2411.00259v1
- Date: Thu, 31 Oct 2024 23:33:23 GMT
- Title: Enhancing Diversity in Bayesian Deep Learning via Hyperspherical Energy Minimization of CKA
- Authors: David Smerkous, Qinxun Bai, Fuxin Li
- Abstract summary: Centered Kernel Alignment (CKA) on feature kernels has been proposed to compare deep networks but has not been used as an optimization objective in Bayesian deep learning.
We propose adopting hyperspherical energy (HE) on top of CKA kernels to address the diminishing gradients of direct CKA optimization and to improve training stability.
- Score: 20.463933005601003
- Abstract: Particle-based Bayesian deep learning often requires a similarity metric to compare two networks. However, naive similarity metrics lack permutation invariance and are inappropriate for comparing networks. Centered Kernel Alignment (CKA) on feature kernels has been proposed to compare deep networks but has not been used as an optimization objective in Bayesian deep learning. In this paper, we explore the use of CKA in Bayesian deep learning to generate diverse ensembles and hypernetworks that output a network posterior. We note that CKA projects kernels onto a unit hypersphere and that directly optimizing the CKA objective yields diminishing gradients when two networks are very similar. To address this drawback and improve training stability, we propose adopting hyperspherical energy (HE) on top of CKA kernels. Additionally, by leveraging CKA-based feature kernels, we derive feature-repulsive terms applied to synthetically generated outlier examples. Experiments on both diverse ensembles and hypernetworks show that our approach significantly outperforms baselines in terms of uncertainty quantification on both synthetic and realistic outlier detection tasks.
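As a rough illustration of the quantities named in the abstract, here is a minimal PyTorch sketch of linear CKA and a Riesz-style hyperspherical energy over an ensemble's pairwise CKA similarities. This is a hedged sketch under stated assumptions (linear rather than RBF kernels, the Riesz exponent `s`, and the helper names `linear_cka` / `cka_hyperspherical_energy` are all illustrative), not the authors' implementation:

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between feature matrices x: (n, d1) and y: (n, d2).

    Equals the cosine similarity between the centered Gram matrices,
    i.e. an inner product between points on the unit hypersphere.
    """
    x = x - x.mean(dim=0, keepdim=True)  # center each feature dimension
    y = y - y.mean(dim=0, keepdim=True)
    num = (x.T @ y).pow(2).sum()  # ||X^T Y||_F^2
    den = torch.sqrt((x.T @ x).pow(2).sum() * (y.T @ y).pow(2).sum())
    return num / den.clamp_min(1e-12)

def cka_hyperspherical_energy(feats, s: float = 1.0) -> torch.Tensor:
    """Riesz s-energy over pairwise CKA similarities of ensemble members.

    Since CKA is an inner product on the unit hypersphere, the chordal
    distance between two members' kernels is sqrt(2 - 2 * CKA).
    Minimizing the summed inverse distances repels members from one
    another and, unlike maximizing (1 - CKA) directly, keeps gradients
    informative when two members are nearly identical.
    """
    energy = feats[0].new_zeros(())
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            sim = linear_cka(feats[i], feats[j]).clamp(max=1.0 - 1e-6)
            energy = energy + (2.0 - 2.0 * sim).sqrt().pow(-s)
    return energy
```

A training step for a diverse ensemble might then minimize `task_loss + gamma * cka_hyperspherical_energy(member_feats)`, and the abstract's feature-repulsive term could reuse the same energy on feature kernels computed from synthetically generated outliers (again, `gamma` and this wiring are assumptions for illustration).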
Related papers
- Learning Sparse Neural Networks with Identity Layers [33.11654855515443]
We investigate the intrinsic link between network sparsity and interlayer feature similarity.
We propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR (a hedged sketch of such a penalty appears after this list).
We find that CKA-SR consistently improves the performance of several State-Of-The-Art sparse training methods.
arXiv Detail & Related papers (2023-07-14T14:58:44Z) - Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis [93.55896765176414]
We propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis.
We employ differentiable search strategies to discover the optimal positions for active site matching in the shifted sparse convolution.
Our BSC-Net achieves significant improvements over our strong baseline and outperforms state-of-the-art network binarization methods.
arXiv Detail & Related papers (2023-03-27T13:47:06Z) - Learning Representation for Bayesian Optimization with Collision-free Regularization [13.476552258272402]
Large-scale, high-dimensional, and non-stationary datasets are common in real-world scenarios.
Recent works attempt to handle such input by applying neural networks ahead of the classical Gaussian process to learn a latent representation.
We show that even with proper network design, such learned representations often lead to collisions in the latent space.
We propose LOCo, an efficient deep Bayesian optimization framework which employs a novel regularizer to reduce the collision in the learned latent space.
arXiv Detail & Related papers (2022-03-16T14:44:16Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is suitable for adaptively pruning efficient networks for various classification subtasks, facilitating the deployment and use of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z) - Semiparametric Bayesian Networks [5.205440005969871]
We introduce semiparametric Bayesian networks that combine parametric and nonparametric conditional probability distributions.
Their aim is to incorporate the bounded complexity of parametric models and the flexibility of nonparametric ones.
arXiv Detail & Related papers (2021-09-07T11:47:32Z) - Consistency of random-walk based network embedding algorithms [13.214230533788932]
We study the node2vec and DeepWalk algorithms through the perspective of matrix factorization.
Our results indicate a subtle interplay between the sparsity of the observed networks, the window sizes of the random walks, and the convergence rates of the node2vec/DeepWalk embedding.
arXiv Detail & Related papers (2021-01-18T22:49:22Z) - Second-Order Provable Defenses against Adversarial Attacks [63.34032156196848]
We show that if the eigenvalues of the Hessian of the network are bounded, we can compute a robustness certificate in the $\ell_2$ norm efficiently using convex optimization.
We achieve certified accuracies of 5.78%, 44.96%, and 43.19%, compared against IBP-based methods.
arXiv Detail & Related papers (2020-06-01T05:55:18Z) - Recent Developments Combining Ensemble Smoother and Deep Generative Networks for Facies History Matching [58.720142291102135]
This research project focuses on the use of autoencoder networks to construct a continuous parameterization for facies models.
We benchmark seven different formulations, including VAE, generative adversarial network (GAN), Wasserstein GAN, variational auto-encoding GAN, principal component analysis (PCA) with cycle GAN, PCA with transfer style network and VAE with style loss.
arXiv Detail & Related papers (2020-05-08T21:32:42Z) - Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds while retaining theoretical convergence guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of our algorithm and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
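The CKA-SR entry above describes a plug-and-play CKA-based sparsity regularizer that exploits the link between network sparsity and interlayer feature similarity. As a hedged sketch only (the function names, the `beta` weight, and the choice of penalizing consecutive layers are assumptions; the paper's exact objective may differ):

```python
import torch

def linear_cka(x, y):
    # Linear CKA between feature matrices x: (n, d1) and y: (n, d2),
    # computed on mean-centered features (same formula as the earlier sketch).
    x = x - x.mean(0, keepdim=True)
    y = y - y.mean(0, keepdim=True)
    num = (x.T @ y).pow(2).sum()
    den = torch.sqrt((x.T @ x).pow(2).sum() * (y.T @ y).pow(2).sum())
    return num / den.clamp_min(1e-12)

def cka_sparsity_regularizer(layer_feats, beta=1e-3):
    # Penalize interlayer feature similarity: summed CKA between
    # consecutive layers' (flattened) feature maps. A CKA-SR-style
    # sketch added to the task loss during sparse training.
    penalty = layer_feats[0].new_zeros(())
    for a, b in zip(layer_feats[:-1], layer_feats[1:]):
        penalty = penalty + linear_cka(a.flatten(1), b.flatten(1))
    return beta * penalty
```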