Related papers: Unbiased Approximate Vector-Jacobian Products for Efficient Backpropagation

URL: http://arxiv.org/abs/2602.14701v1
Date: Mon, 16 Feb 2026 12:40:59 GMT
Title: Unbiased Approximate Vector-Jacobian Products for Efficient Backpropagation
Authors: Killian Bakong, Laurent Massoulié, Edouard Oyallon, Kevin Scaman,
Abstract summary: We introduce methods to reduce the computational and memory costs of training deep neural networks. Our approach consists of replacing exact vector-Jacobian products by randomized, unbiased approximations thereof during backpropagation.
Score: 21.297933521065076
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work we introduce methods to reduce the computational and memory costs of training deep neural networks. Our approach consists of replacing exact vector-Jacobian products by randomized, unbiased approximations thereof during backpropagation. We provide a theoretical analysis of the trade-off between the number of epochs needed to achieve a target precision and the cost reduction for each epoch. We then identify specific unbiased estimates of vector-Jacobian products for which we establish desirable optimality properties of minimal variance under sparsity constraints. Finally, we provide in-depth experiments on multi-layer perceptron, BagNet, and Visual Transformer architectures. These validate our theoretical results and confirm the potential of our proposed unbiased randomized backpropagation approach for reducing the cost of deep learning.
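
The abstract does not specify which unbiased estimators the authors use, so the following is only a minimal Python sketch of the general idea under our own assumptions: a generic Bernoulli-masking (importance-sampling) estimator, not necessarily the paper's construction. Each coordinate v_i of the backpropagated vector is kept with probability p_i and rescaled by 1/p_i, so the expected result equals the exact product v^T J while dropped rows of J are never read. Its total variance is sum_i a_i^2 (1/p_i - 1) with a_i = |v_i| ||J_i||, and for a fixed expected number of kept rows this is minimized by p_i = min(1, c a_i). The helper names (vjp, sparse_unbiased_vjp, estimator_variance, min_variance_probs) are hypothetical.

```python
import math
import random


def vjp(v, J):
    """Exact vector-Jacobian product v^T J, with J stored as a list of rows."""
    out = [0.0] * len(J[0])
    for vi, row in zip(v, J):
        for j, x in enumerate(row):
            out[j] += vi * x
    return out


def sparse_unbiased_vjp(v, J, probs, rng):
    """Randomized v^T J: keep row i with probability probs[i], rescale by 1/probs[i].

    E[output] == v^T J as long as probs[i] > 0 wherever v[i] * J[i] != 0.
    Rows that are dropped are never read (that is where the savings come from).
    """
    out = [0.0] * len(J[0])
    for vi, pi, row in zip(v, probs, J):
        if pi > 0.0 and rng.random() < pi:
            scale = vi / pi  # inverse-probability weight keeps the estimate unbiased
            for j, x in enumerate(row):
                out[j] += scale * x
    return out


def estimator_variance(weights, probs):
    """Total variance sum_i a_i^2 (1/p_i - 1), where a_i = |v_i| * ||J_i||."""
    return sum(a * a * (1.0 / p - 1.0) for a, p in zip(weights, probs) if a > 0)


def min_variance_probs(weights, budget):
    """p_i = min(1, c * a_i), with c chosen so that sum(p) == budget.

    Water-filling solution of: minimize estimator_variance subject to an
    expected number of kept rows equal to `budget` (assumed > 0).
    """
    probs = [0.0] * len(weights)
    active = [i for i, a in enumerate(weights) if a > 0]
    remaining = float(budget)
    while active:
        total = sum(weights[i] for i in active)
        capped = {i for i in active if remaining * weights[i] / total >= 1.0}
        if not capped:
            for i in active:
                probs[i] = remaining * weights[i] / total
            break
        for i in capped:  # rows that would exceed probability 1 are always kept
            probs[i] = 1.0
        remaining -= len(capped)
        active = [i for i in active if i not in capped]
    return probs


# Toy usage: a 4x2 Jacobian, keeping about 2 of the 4 rows per backward pass.
J = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [0.0, 1.0]]
v = [0.2, -1.0, 0.1, 2.0]
weights = [abs(vi) * math.hypot(*row) for vi, row in zip(v, J)]
probs = min_variance_probs(weights, budget=2)

rng = random.Random(0)
draws = [sparse_unbiased_vjp(v, J, probs, rng) for _ in range(20000)]
mc_mean = [sum(d[j] for d in draws) / len(draws) for j in range(2)]
print(vjp(v, J), mc_mean)  # the Monte Carlo mean approaches the exact VJP
```

Lowering the budget reduces the expected work per backward pass but raises the variance of the gradient estimate, which is the kind of per-epoch cost versus number-of-epochs trade-off the abstract describes.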

Related papers

On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization [57.179679246370114]
A potential limitation of existing methods is the bias inherent in most perturbation estimators unless a stepsize is proposed. We propose a novel family of unbiased gradient scaling estimators that eliminate bias while maintaining favorable construction.
arXiv Detail & Related papers (2025-10-22T18:25:43Z)
Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vector quantile regression (CVQR). CVQR combines neural optimal transport with quantized optimization, and we apply it to multivariate conformal prediction.
arXiv Detail & Related papers (2025-09-29T19:50:19Z)
Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective [16.610925506252716]
The minimax optimal estimation error rate $\Theta(d/n)$ of classical estimation theory requires that the number of samples $n$ scale linearly with the dimensionality $d$ of the feature space. The high dimensionality of the feature space and the high cost of collecting human-annotated data challenge the efficiency of traditional estimation methods. We show that under the sparse random utility model, where the parameter of the reward function is $k$-sparse, the minimax optimal rate can be reduced to $\Theta\left(\frac{k}{n}\log\frac{d}{k}\right)$.
arXiv Detail & Related papers (2025-01-30T11:41:13Z)
Pathwise optimization for bridge-type estimators and its applications [49.1574468325115]
Pathwise methods allow one to efficiently compute the full path of penalized estimators. We apply these algorithms to the penalized estimation of processes observed at discrete times.
arXiv Detail & Related papers (2024-12-05T10:38:29Z)
Verification of Geometric Robustness of Neural Networks via Piecewise Linear Approximation and Lipschitz Optimisation [57.10353686244835]
We address the problem of verifying neural networks against geometric transformations of the input image, including rotation, scaling, shearing, and translation. The proposed method computes provably sound piecewise linear constraints for the pixel values by using sampling and linear approximations in combination with branch-and-bound Lipschitz optimisation. We show that our proposed implementation resolves up to 32% more verification cases than present approaches.
arXiv Detail & Related papers (2024-08-23T15:02:09Z)
Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
Contextual Linear Optimization with Partial Feedback [35.38485630117593]
We propose a class of offline learning algorithms for contextual linear optimization (CLO) with different types of feedback. We provide a novel fast-rate regret bound for IERM that allows for misspecified model classes and flexible choices of estimation methods.
arXiv Detail & Related papers (2024-05-26T13:27:27Z)
Adaptive importance sampling for heavy-tailed distributions via $\alpha$-divergence minimization [2.879807093604632]
We propose an AIS algorithm that approximates the target by Student-t proposal distributions. We adapt location and scale parameters by matching the escort moments of the target and the proposal. These updates minimize the $\alpha$-divergence between the target and the proposal, thereby connecting with variational inference.
arXiv Detail & Related papers (2023-10-25T14:07:08Z)
Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP). Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN. It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z)
Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation in Contaminated Gaussian Models [1.609950046042424]
Consider the problem of simultaneous estimation of location and variance matrix under Huber's contaminated Gaussian model. First, we study minimum $f$-divergence estimation at the population level, corresponding to a generative adversarial method with a nonparametric discriminator. We develop tractable adversarial algorithms with simple spline discriminators, which can be implemented via nested optimization. The proposed methods are shown to achieve minimax optimal rates or near-optimal rates depending on the $f$-divergence and the penalty used.
arXiv Detail & Related papers (2021-12-24T02:46:51Z)
Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models. In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints (bias-constrained estimation, BCE). A second motivation for BCE arises in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
Bayesian Optimization Meets Laplace Approximation for Robotic Introspection [41.117361086267806]
We introduce a scalable Laplace Approximation (LA) technique to make Deep Neural Networks (DNNs) more introspective. In particular, we propose a novel Bayesian Optimization (BO) algorithm to mitigate their tendency to under-fit the true weight posterior. We show that the proposed framework can be scaled up to large datasets and architectures.
arXiv Detail & Related papers (2020-10-30T09:28:10Z)
Optimal Bayesian experimental design for subsurface flow problems [77.34726150561087]
We propose a novel approach for developing a polynomial chaos expansion (PCE) surrogate model for the design utility function. This technique enables the derivation of a reasonable-quality response surface for the targeted objective function with a computational budget comparable to several single-point evaluations.
arXiv Detail & Related papers (2020-08-10T09:42:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.