Related papers: Input-gradient space particle inference for neural network ensembles

Input-gradient space particle inference for neural network ensembles

URL: http://arxiv.org/abs/2306.02775v3
Date: Tue, 5 Mar 2024 16:44:43 GMT
Title: Input-gradient space particle inference for neural network ensembles
Authors: Trung Trinh, Markus Heinonen, Luigi Acerbi, Samuel Kaski
Abstract summary: First-order Repulsive Deep Ensemble (FoRDE) is an ensemble learning method based on ParVI. Experiments on image classification datasets and transfer learning tasks show that FoRDE significantly outperforms the gold-standard DEs.
Score: 32.64178604645513
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep Ensembles (DEs) demonstrate improved accuracy, calibration and robustness to perturbations over single neural networks partly due to their functional diversity. Particle-based variational inference (ParVI) methods enhance diversity by formalizing a repulsion term based on a network similarity kernel. However, weight-space repulsion is inefficient due to over-parameterization, while direct function-space repulsion has been found to produce little improvement over DEs. To sidestep these difficulties, we propose First-order Repulsive Deep Ensemble (FoRDE), an ensemble learning method based on ParVI, which performs repulsion in the space of first-order input gradients. As input gradients uniquely characterize a function up to translation and are much smaller in dimension than the weights, this method guarantees that ensemble members are functionally different. Intuitively, diversifying the input gradients encourages each network to learn different features, which is expected to improve the robustness of an ensemble. Experiments on image classification datasets and transfer learning tasks show that FoRDE significantly outperforms the gold-standard DEs and other ensemble methods in accuracy and calibration under covariate shift due to input perturbations.

Related papers

Semi-Implicit Variational Inference via Kernelized Path Gradient Descent [12.300415631357406]
Training with the Kullback-Leibler divergence can be challenging due to high variance and bias in high-dimensional settings.<n>We propose a kernelized KL divergence estimator that stabilizes training through nonparametric smoothing.<n>Our method's bias in function space is benign, leading to more stable and efficient optimization.
arXiv Detail & Related papers (2025-06-05T14:34:37Z)
Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective [61.5509267439999]
We present a unified theoretical framework for methods like GBP, RectGrad, LRP, and DTD. We demonstrate that they achieve input alignment by combining the weights of activated neurons. This alignment improves the visualization quality and reduces sensitivity to weight randomization.
arXiv Detail & Related papers (2025-03-14T07:58:26Z)
Diffusion-PINN Sampler [6.656265182236135]
We introduce a novel diffusion-based sampling algorithm that estimates the drift term by solving the governing partial differential equation of the log-density of the underlying SDE marginals via physics-informed neural networks (PINN) We prove that the error of log-density approximation can be controlled by the PINN residual loss, enabling us to establish convergence guarantees of DPS.
arXiv Detail & Related papers (2024-10-20T09:02:16Z)
QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation. This enables the network to learn from subtle input perturbations. We further refine the training strategy to ensure convergence while simulating quantization errors.
arXiv Detail & Related papers (2024-05-22T17:34:18Z)
Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $mathcalO( ln(T) / T 1 - frac1alpha ).
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems. PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features. In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
Feature Space Particle Inference for Neural Network Ensembles [13.392254060510666]
Particle-based inference methods offer a promising approach from a Bayesian perspective. We propose optimizing particles in the feature space where the activation of a specific intermediate layer lies. Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness.
arXiv Detail & Related papers (2022-06-02T09:16:26Z)
Learning via nonlinear conjugate gradients and depth-varying neural ODEs [5.565364597145568]
The inverse problem of supervised reconstruction of depth-variable parameters in a neural ordinary differential equation (NODE) is considered. The proposed parameter reconstruction is done for a general first order differential equation by minimizing a cost functional. The sensitivity problem can estimate changes in the network output under perturbation of the trained parameters.
arXiv Detail & Related papers (2022-02-11T17:00:48Z)
Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem. CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint. It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs)
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
Efficient training of physics-informed neural networks via importance sampling [2.9005223064604078]
Physics-In Neural Networks (PINNs) are a class of deep neural networks that are trained to compute systems governed by partial differential equations (PDEs) We show that an importance sampling approach will improve the convergence behavior of PINNs training.
arXiv Detail & Related papers (2021-04-26T02:45:10Z)
Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations [79.71184760864507]
This article introduces a new physics-based method for rigid point set alignment called Fast Gravitational Approach (FGA) In FGA, the source and target point sets are interpreted as rigid particle swarms with masses interacting in a globally multiply-linked manner while moving in a simulated gravitational force field. We show that the new method class has characteristics not found in previous alignment methods.
arXiv Detail & Related papers (2020-09-28T15:05:39Z)
Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem. We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent. Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.