Input-gradient space particle inference for neural network ensembles
- URL: http://arxiv.org/abs/2306.02775v3
- Date: Tue, 5 Mar 2024 16:44:43 GMT
- Title: Input-gradient space particle inference for neural network ensembles
- Authors: Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski
- Abstract summary: First-order Repulsive Deep Ensemble (FoRDE) is an ensemble learning method based on ParVI.
Experiments on image classification datasets and transfer learning tasks show that FoRDE significantly outperforms the gold-standard DEs.
- Score: 32.64178604645513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Ensembles (DEs) demonstrate improved accuracy, calibration and
robustness to perturbations over single neural networks partly due to their
functional diversity. Particle-based variational inference (ParVI) methods
enhance diversity by formalizing a repulsion term based on a network similarity
kernel. However, weight-space repulsion is inefficient due to
over-parameterization, while direct function-space repulsion has been found to
produce little improvement over DEs. To sidestep these difficulties, we propose
First-order Repulsive Deep Ensemble (FoRDE), an ensemble learning method based
on ParVI, which performs repulsion in the space of first-order input gradients.
As input gradients uniquely characterize a function up to translation and are
much smaller in dimension than the weights, this method guarantees that
ensemble members are functionally different. Intuitively, diversifying the
input gradients encourages each network to learn different features, which is
expected to improve the robustness of an ensemble. Experiments on image
classification datasets and transfer learning tasks show that FoRDE
significantly outperforms the gold-standard DEs and other ensemble methods in
accuracy and calibration under covariate shift due to input perturbations.
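The idea of measuring member similarity in input-gradient space, as the abstract describes, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper's implementation: the toy members f(x) = tanh(w . x), the unit normalization of the gradients, the RBF kernel with a single `lengthscale`, and the helper names `input_gradient`, `normalize`, `gradient_kernel`, and `kernel_matrix`.

```python
import math

def input_gradient(w, x):
    """Analytic gradient of the toy member f(x) = tanh(w . x) w.r.t. the input x."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    scale = 1.0 - math.tanh(s) ** 2  # derivative of tanh evaluated at s
    return [scale * wi for wi in w]

def normalize(v, eps=1e-12):
    """Scale v to (approximately) unit Euclidean norm; eps guards a zero vector."""
    norm = math.sqrt(sum(vi * vi for vi in v)) + eps
    return [vi / norm for vi in v]

def gradient_kernel(w_a, w_b, inputs, lengthscale=1.0):
    """RBF kernel between the input gradients of two members, averaged over inputs."""
    total = 0.0
    for x in inputs:
        g_a = normalize(input_gradient(w_a, x))
        g_b = normalize(input_gradient(w_b, x))
        sq_dist = sum((a - b) ** 2 for a, b in zip(g_a, g_b))
        total += math.exp(-sq_dist / (2.0 * lengthscale ** 2))
    return total / len(inputs)

def kernel_matrix(members, inputs, lengthscale=1.0):
    """Pairwise similarity of all ensemble members in input-gradient space."""
    return [[gradient_kernel(a, b, inputs, lengthscale) for b in members]
            for a in members]

# Hypothetical ensemble: members 0 and 1 attend to the same input feature,
# member 2 attends to a different one.
members = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
inputs = [[0.5, -0.5], [0.1, 0.3]]
K = kernel_matrix(members, inputs)
```

In this toy setup, members 0 and 1 have different weights but point their input gradients in the same direction, so their kernel value is close to 1, while member 2 relies on a different input feature and is less similar to both. A ParVI-style repulsion term would push apart the pairs with large kernel values; how that term enters the training update is not shown here.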
Related papers
- Diffusion-PINN Sampler [6.656265182236135]
We introduce a novel diffusion-based sampling algorithm that estimates the drift term by solving the governing partial differential equation of the log-density of the underlying SDE marginals via physics-informed neural networks (PINNs).
We prove that the error of the log-density approximation can be controlled by the PINN residual loss, enabling us to establish convergence guarantees for the Diffusion-PINN Sampler (DPS).
arXiv Detail & Related papers (2024-10-20T09:02:16Z)
- QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations.
We further refine the training strategy to ensure convergence while simulating quantization errors.
arXiv Detail & Related papers (2024-05-22T17:34:18Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective at solving forward and inverse differential equation problems.
However, PINNs often fail to train when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose employing the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Feature Space Particle Inference for Neural Network Ensembles [13.392254060510666]
Particle-based inference methods offer a promising approach from a Bayesian perspective.
We propose optimizing particles in the feature space where the activation of a specific intermediate layer lies.
Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness.
arXiv Detail & Related papers (2022-06-02T09:16:26Z)
- Learning via nonlinear conjugate gradients and depth-varying neural ODEs [5.565364597145568]
The inverse problem of supervised reconstruction of depth-variable parameters in a neural ordinary differential equation (NODE) is considered.
The proposed parameter reconstruction is done for a general first order differential equation by minimizing a cost functional.
The sensitivity problem can estimate changes in the network output under perturbation of the trained parameters.
arXiv Detail & Related papers (2022-02-11T17:00:48Z)
- Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems in which one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
- Efficient training of physics-informed neural networks via importance sampling [2.9005223064604078]
Physics-Informed Neural Networks (PINNs) are a class of deep neural networks that are trained to compute the response of systems governed by partial differential equations (PDEs).
We show that an importance sampling approach will improve the convergence behavior of PINNs training.
arXiv Detail & Related papers (2021-04-26T02:45:10Z)
- Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations [79.71184760864507]
This article introduces a new physics-based method for rigid point set alignment called Fast Gravitational Approach (FGA).
In FGA, the source and target point sets are interpreted as rigid particle swarms with masses interacting in a globally multiply-linked manner while moving in a simulated gravitational force field.
We show that the new method class has characteristics not found in previous alignment methods.
arXiv Detail & Related papers (2020-09-28T15:05:39Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve for one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to problems in which one variable is subject to a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)