Related papers: Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient

URL: http://arxiv.org/abs/2410.22065v1
Date: Tue, 29 Oct 2024 14:23:42 GMT
Title: Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient
Authors: Vu C. Dinh, Lam Si Tung Ho, Cuong V. Nguyen
Abstract summary: We show that due to the non-differentiability of activation functions in the ReLU family, leapfrog HMC for networks with these activation functions has a large local error rate. We then verify our theoretical findings through empirical simulations as well as experiments on a real-world dataset.
Score: 3.823356975862005
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We analyze the error rates of the Hamiltonian Monte Carlo algorithm with leapfrog integrator for Bayesian neural network inference. We show that due to the non-differentiability of activation functions in the ReLU family, leapfrog HMC for networks with these activation functions has a large local error rate of $\Omega(\epsilon)$ rather than the classical error rate of $O(\epsilon^3)$. This leads to a higher rejection rate of the proposals, making the method inefficient. We then verify our theoretical findings through empirical simulations as well as experiments on a real-world dataset that highlight the inefficiency of HMC inference on ReLU-based neural networks compared to analytical networks.
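The contrast between the $\Omega(\epsilon)$ and $O(\epsilon^3)$ rates can be illustrated with a toy one-dimensional sketch. This is not the paper's experiment: the potentials (a ReLU-shaped kink versus a softplus), the starting point placed a quarter step before the kink, and every function name below are assumptions chosen for illustration. The sketch takes a single leapfrog step that crosses $q = 0$, measures the change in the Hamiltonian, and estimates the observed order by halving the step size.

```python
import math

def leapfrog_step(grad_u, q, p, eps):
    """One leapfrog step for H(q, p) = U(q) + p^2 / 2."""
    p_half = p - 0.5 * eps * grad_u(q)           # half step in momentum
    q_new = q + eps * p_half                     # full step in position
    p_new = p_half - 0.5 * eps * grad_u(q_new)   # second half step in momentum
    return q_new, p_new

def energy_error(u, grad_u, q0, p0, eps):
    """Absolute change in the Hamiltonian after one leapfrog step."""
    q1, p1 = leapfrog_step(grad_u, q0, p0, eps)
    return abs((u(q1) + 0.5 * p1 * p1) - (u(q0) + 0.5 * p0 * p0))

# Toy potentials (assumptions for illustration, not taken from the paper).
def relu(q):
    return max(q, 0.0)

def relu_grad(q):
    return 1.0 if q > 0 else 0.0   # gradient jumps at q = 0

def softplus(q):
    return math.log1p(math.exp(q))

def softplus_grad(q):
    return 1.0 / (1.0 + math.exp(-q))

def observed_order(u, grad_u, eps=0.1, offset=0.25, p0=1.0):
    """Estimate k in error ~ eps^k by comparing step sizes eps and eps / 2.

    The start point q0 = -offset * eps sits just left of zero, so the step
    crosses q = 0 a fraction `offset` of the way through.
    """
    coarse = energy_error(u, grad_u, -offset * eps, p0, eps)
    fine = energy_error(u, grad_u, -offset * eps / 2, p0, eps / 2)
    return math.log2(coarse / fine)

print("ReLU-shaped potential, observed order:", observed_order(relu, relu_grad))
print("softplus potential, observed order:", observed_order(softplus, softplus_grad))
```

In this toy setting the kinked potential should show an order close to 1 and the smooth one an order close to 3. The offset matters: a step that crosses the kink exactly halfway happens to cancel the first-order term, so the linear error appears only for generic crossing times.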

Related papers

Uncertainty propagation in feed-forward neural network models [3.987067170467799]
We develop new uncertainty propagation methods for feed-forward neural network architectures. We derive analytical expressions for the probability density function (PDF) of the neural network output. A key finding is that an appropriate linearization of the leaky ReLU activation function yields accurate statistical results.
arXiv Detail & Related papers (2025-03-27T00:16:36Z)
Deep Learning without Global Optimization by Random Fourier Neural Networks [0.0]
We introduce a new training algorithm for a variety of deep neural networks that utilize random complex exponential activation functions. Our approach employs a Markov Chain Monte Carlo sampling procedure to iteratively train network layers. It consistently attains the theoretical approximation rate for residual networks with complex exponential activation functions.
arXiv Detail & Related papers (2024-07-16T16:23:40Z)
SGD method for entropy error function with smoothing l0 regularization for neural networks [3.108634881604788]
The entropy error function has been widely used in neural networks. We propose a novel entropy function with smoothing l0 regularization for feed-forward neural networks. Our work is novel as it enables neural networks to learn effectively, producing more accurate predictions.
arXiv Detail & Related papers (2024-05-28T19:54:26Z)
Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data. A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification. Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations. We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
One Simple Trick to Fix Your Bayesian Neural Network [0.7955313479061443]
We show that neural networks with ReLU activation function induce posteriors that are hard to fit with MFVI. We find that using Leaky ReLU activations leads to more Gaussian-like weight posteriors and achieves a lower expected calibration error (ECE) than its ReLU-based counterpart.
arXiv Detail & Related papers (2022-07-26T19:45:36Z)
Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose a new approach for the regularization of neural networks by the local Rademacher complexity called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
Estimation of the Mean Function of Functional Data via Deep Neural Networks [6.230751621285321]
We propose a deep neural network method to perform nonparametric regression for functional data. The proposed method is applied to analyze positron emission tomography images of patients with Alzheimer disease.
arXiv Detail & Related papers (2020-12-08T17:18:16Z)
Measurement error models: from nonparametric methods to deep neural networks [3.1798318618973362]
We propose an efficient neural network design for estimating measurement error models. We use a fully connected feed-forward neural network to approximate the regression function $f(x)$. We conduct an extensive numerical study to compare the neural network approach with classical nonparametric methods.
arXiv Detail & Related papers (2020-07-15T06:05:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.