Multiple importance sampling for stochastic gradient estimation
- URL: http://arxiv.org/abs/2407.15525v1
- Date: Mon, 22 Jul 2024 10:28:56 GMT
- Title: Multiple importance sampling for stochastic gradient estimation
- Authors: Corentin Salaün, Xingchang Huang, Iliyan Georgiev, Niloy J. Mitra, Gurprit Singh
- Abstract summary: We introduce a theoretical and practical framework for efficient importance sampling of mini-batch samples for gradient estimation.
To handle noisy gradients, our framework dynamically evolves the importance distribution during training by utilizing a self-adaptive metric.
- Score: 33.42221341526944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a theoretical and practical framework for efficient importance sampling of mini-batch samples for gradient estimation from single and multiple probability distributions. To handle noisy gradients, our framework dynamically evolves the importance distribution during training by utilizing a self-adaptive metric. Our framework combines multiple, diverse sampling distributions, each tailored to specific parameter gradients. This approach facilitates importance sampling for vector-valued gradient estimation. Rather than naively combining multiple distributions, our framework optimally weights each sample's contribution across the distributions. This adaptive combination of multiple importance distributions yields superior gradient estimates, leading to faster training convergence. We demonstrate the effectiveness of our approach through empirical evaluations across a range of optimization tasks, such as classification and regression, on both image and point-cloud datasets.
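To make the combination idea above concrete, the sketch below shows a generic multiple-importance-sampling (MIS) mini-batch gradient estimator that draws indices from several sampling distributions and combines them with balance-heuristic weights. This is a minimal illustration of the general technique, not the paper's actual algorithm or code; the function names, the `grad_fn` interface, and the specific choice of the balance heuristic are assumptions made here for illustration.

```python
import numpy as np

# Minimal sketch (not the paper's code): multiple importance sampling (MIS)
# for a mini-batch gradient estimate, combining several proposal
# distributions over data indices with balance-heuristic weights.

def mis_gradient_estimate(grad_fn, data, pdfs, samples_per_pdf, rng=None):
    """Estimate the mean per-sample gradient (1/n) * sum_i grad_fn(data[i]).

    grad_fn          : maps one data sample to its gradient (np.ndarray)
    data             : sequence of n training samples
    pdfs             : list of length-n arrays, each a normalized distribution
                       over data indices (every index should have nonzero
                       total density across the pdfs)
    samples_per_pdf  : number of indices drawn from each distribution
    """
    rng = rng or np.random.default_rng()
    n = len(data)
    total = 0.0
    for p in pdfs:
        indices = rng.choice(n, size=samples_per_pdf, p=p)
        for i in indices:
            # Balance heuristic: each drawn sample's gradient is divided by
            # the total expected sampling density over all distributions,
            # which keeps the combined estimator unbiased.
            denom = sum(samples_per_pdf * q[i] for q in pdfs)
            total = total + grad_fn(data[i]) / denom
    return total / n
```

With a single uniform distribution this reduces to the ordinary mini-batch gradient average; the paper's contribution concerns how the distributions and their combination weights are adapted during training.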
Related papers
- Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling [34.50693643119071]
Adaptive or importance sampling reduces noise in gradient estimation.
We present an algorithm that can incorporate existing importance functions into our framework.
We observe improved convergence in classification and regression tasks with minimal computational overhead.
arXiv Detail & Related papers (2023-11-24T13:21:35Z) - Data Pruning via Moving-one-Sample-out [61.45441981346064]
We propose a novel data-pruning approach called moving-one-sample-out (MoSo)
MoSo aims to identify and remove the least informative samples from the training set.
Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios.
arXiv Detail & Related papers (2023-10-23T08:00:03Z) - Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models [28.011868604717726]
We present Adaptive IMLE, the first adaptive gradient estimator for complex discrete distributions.
We show that our estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
arXiv Detail & Related papers (2022-09-11T13:32:39Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Density Ratio Estimation via Infinitesimal Classification [85.08255198145304]
We propose DRE-infty, a divide-and-conquer approach that reduces density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
arXiv Detail & Related papers (2021-11-22T06:26:29Z) - Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity [26.518803984578867]
Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging.
One typically resorts to sampling-based approximations of the true marginal.
We propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.
arXiv Detail & Related papers (2020-07-03T19:36:35Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training of deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.