Exploring Variance Reduction in Importance Sampling for Efficient DNN Training
- URL: http://arxiv.org/abs/2501.13296v1
- Date: Thu, 23 Jan 2025 00:43:34 GMT
- Title: Exploring Variance Reduction in Importance Sampling for Efficient DNN Training
- Authors: Takuro Kutsuna
- Abstract summary: This paper proposes a method for estimating variance reduction during deep neural network (DNN) training using only minibatches sampled under importance sampling.
The paper also introduces an absolute metric to quantify the efficiency of importance sampling, along with an algorithm for real-time estimation of importance scores based on moving gradient statistics.
- Score: 1.7767466724342067
- Abstract: Importance sampling is widely used to improve the efficiency of deep neural network (DNN) training by reducing the variance of gradient estimators. However, efficiently assessing the variance reduction relative to uniform sampling remains challenging because of the computational overhead involved. This paper proposes a method for estimating variance reduction during DNN training using only minibatches sampled under importance sampling. Building on this estimate, the paper also derives an effective minibatch size that enables automatic learning rate adjustment. An absolute metric to quantify the efficiency of importance sampling is also introduced, along with an algorithm for real-time estimation of importance scores based on moving gradient statistics. Theoretical analysis and experiments on benchmark datasets demonstrate that the proposed algorithm consistently reduces variance, improves training efficiency, and enhances model accuracy compared with current importance-sampling approaches while maintaining minimal computational overhead.
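The core recipe behind this line of work: draw minibatch indices from an importance distribution and reweight each sampled gradient by 1/(N p_i), which keeps the estimator unbiased while shrinking its variance. Below is a minimal numpy sketch on a toy least-squares problem; note that the paper's contribution is estimating the variance reduction from the sampled minibatches alone, whereas the sketch computes it exactly for illustration, and the gradient-norm scores are a common proxy rather than the paper's moving-statistics scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem so per-sample gradients are analytic.
N, d, B = 1000, 5, 32
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)
w = np.zeros(d)

g = (X @ w - y)[:, None] * X     # per-sample gradients of 0.5*(x.w - y)^2
g_bar = g.mean(axis=0)           # full-batch gradient

# Importance scores: per-sample gradient norms (a common proxy; the paper
# instead maintains moving gradient statistics to obtain scores cheaply).
p = np.linalg.norm(g, axis=1) + 1e-12
p /= p.sum()

# Unbiased IS minibatch gradient: weight each draw by 1 / (N * p_i).
idx = rng.choice(N, size=B, p=p)
g_hat = (g[idx] / (N * p[idx])[:, None]).sum(axis=0) / B

# Per-draw variance (trace) of the IS estimator vs. uniform sampling.
sq = np.linalg.norm(g, axis=1) ** 2
var_is = sq @ (1.0 / p) / N**2 - np.linalg.norm(g_bar) ** 2
var_unif = sq.mean() - np.linalg.norm(g_bar) ** 2
print(f"variance reduction factor: {var_unif / var_is:.2f}x")
```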
Related papers
- Gradient Descent Efficiency Index [0.0]
This study introduces a new efficiency metric, Ek, designed to quantify the effectiveness of each iteration.
The proposed metric accounts for both the relative change in error and the stability of the loss function across iterations.
Ek has the potential to guide more informed decisions in the selection and tuning of optimization algorithms in machine learning applications.
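The abstract does not give the formula for Ek, so the sketch below uses a hypothetical form that merely combines the two stated ingredients, relative error change and loss stability across iterations; both the formula and the window parameter are assumptions.

```python
import numpy as np

def efficiency_index(losses, window=5):
    """Hypothetical stand-in for the paper's E_k: relative error change
    damped by recent loss volatility (the actual formula is not given)."""
    losses = np.asarray(losses, dtype=float)
    ek = []
    for k in range(1, len(losses)):
        rel_change = (losses[k - 1] - losses[k]) / max(losses[k - 1], 1e-12)
        recent = losses[max(0, k - window): k + 1]
        stability = 1.0 / (1.0 + np.std(recent))   # noisier loss -> lower E_k
        ek.append(rel_change * stability)
    return np.array(ek)

print(efficiency_index([1.0, 0.7, 0.55, 0.52, 0.51]))
```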
arXiv Detail & Related papers (2024-10-25T10:22:22Z)
- STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments [22.32661807469984]
We develop a novel framework that integrates the Student's t-distribution with machine learning tools to fit heavy-tailed metrics.
By adopting a variational EM method to optimize the log-likelihood function, we can infer a robust solution that substantially reduces the negative impact of outliers.
Both simulations on synthetic data and long-term empirical results on Meituan experiment platform demonstrate the effectiveness of our method.
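The heavy-tailed ingredient here is standard: a Student's t-likelihood fitted by EM down-weights outliers automatically. Below is a minimal sketch of that core for a robust location estimate; the STATE framework layers machine-learning regressors and a variational EM on top of it.

```python
import numpy as np

def t_robust_mean(x, nu=3.0, iters=50):
    """Classic EM for the location of a Student-t model; points far from
    the bulk receive small latent-scale weights and barely move the mean."""
    mu, sigma2 = np.median(x), np.var(x)
    for _ in range(iters):
        r2 = (x - mu) ** 2 / sigma2
        w = (nu + 1.0) / (nu + r2)          # E-step: latent scale weights
        mu = np.sum(w * x) / np.sum(w)      # M-step: weighted location
        sigma2 = np.sum(w * (x - mu) ** 2) / len(x)
    return mu

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(40, 1, 5)])  # heavy tail
print(np.mean(x), t_robust_mean(x))   # plain mean is pulled; the t mean is not
```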
arXiv Detail & Related papers (2024-07-23T09:35:59Z)
- Efficient Backpropagation with Variance-Controlled Adaptive Sampling [32.297478086982466]
Sampling-based algorithms, which eliminate "unimportant" computations during forward and/or backward propagation (BP), offer potential solutions to accelerate neural network training.
We introduce a variance-controlled adaptive sampling (VCAS) method designed to accelerate BP.
VCAS can preserve the original training loss trajectory and validation accuracy with up to a 73.87% FLOPs reduction in BP and a 49.58% FLOPs reduction over the whole training process.
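A toy version of the variance-control loop: subsample per-sample gradients unbiasedly, measure the resulting estimator variance, and adapt the keep ratio toward a budget. VCAS itself does this per layer inside backpropagation; the names, thresholds, and update factors below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def subsampled_grad(gs, keep_ratio):
    # Keep each per-sample gradient with probability keep_ratio and
    # rescale by 1/keep_ratio, so the estimate stays unbiased.
    mask = rng.random(len(gs)) < keep_ratio
    if not mask.any():
        return np.zeros_like(gs[0])
    return gs[mask].sum(axis=0) / (keep_ratio * len(gs))

gs = rng.normal(size=(256, 10))           # stand-in per-sample gradients
ratio, target_var = 0.5, 1e-3
ests = np.stack([subsampled_grad(gs, ratio) for _ in range(30)])
var = ests.var(axis=0).mean()             # measured estimator variance
# Control loop: grow the keep ratio when too noisy, shrink when safely below.
ratio = min(1.0, ratio * 1.2) if var > target_var else max(0.05, ratio / 1.2)
print(f"measured variance {var:.2e}, next keep ratio {ratio:.2f}")
```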
arXiv Detail & Related papers (2024-02-27T05:40:36Z)
- Online Importance Sampling for Stochastic Gradient Optimization [33.42221341526944]
We propose a practical algorithm that efficiently computes data importance on-the-fly during training.
We also introduce a novel metric based on the derivative of the loss w.r.t. the network output, designed for mini-batch importance sampling.
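For softmax cross-entropy, the derivative of the loss with respect to the network output is softmax(z) - onehot(y), so an output-gradient importance score costs essentially nothing beyond the forward pass. A sketch of that idea follows; the paper's exact metric may differ in detail.

```python
import numpy as np

def output_grad_scores(logits, labels):
    """Importance scores from the loss derivative w.r.t. the network
    output; for softmax cross-entropy this is softmax(z) - onehot(y)."""
    z = logits - logits.max(axis=1, keepdims=True)   # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0         # dL/dlogits
    return np.linalg.norm(p, axis=1)

rng = np.random.default_rng(3)
logits = rng.normal(size=(8, 4))
labels = rng.integers(0, 4, size=8)
s = output_grad_scores(logits, labels)
print(s / s.sum())   # normalized sampling probabilities for later draws
```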
arXiv Detail & Related papers (2023-11-24T13:21:35Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
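The paper's sketching data structures are more involved, but the one-pass, small-memory flavor can be illustrated with weighted reservoir sampling (Efraimidis-Spirakis A-Res), which draws items with probability proportional to an importance weight in a single pass using O(k) memory; it is a stand-in here, not the paper's construction.

```python
import heapq
import numpy as np

def weighted_reservoir(stream, k, rng):
    """One-pass, O(k)-memory sample of k items with probability
    proportional to weight (Efraimidis-Spirakis A-Res)."""
    heap = []  # min-heap of (key, item); keep the k largest keys
    for item, w in stream:
        key = rng.random() ** (1.0 / max(w, 1e-12))
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

rng = np.random.default_rng(4)
stream = ((i, (i % 10) + 1.0) for i in range(10_000))  # weight = importance
print(weighted_reservoir(stream, 5, rng))
```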
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
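The sampling-free idea is to propagate distribution moments analytically instead of drawing noise samples. A minimal sketch for one linear layer under multiplicative Gaussian activation noise eps ~ N(1, alpha), assuming independent activations; the paper's posterior parameterization is richer than this.

```python
import numpy as np

def linear_moments(a_mean, a_var, W, b):
    """Propagate mean and variance of independent activations through a
    linear layer in closed form -- the 'sampling-free' ingredient."""
    m = W @ a_mean + b
    v = (W ** 2) @ a_var   # Var(sum w_j a_j) = sum w_j^2 Var(a_j)
    return m, v

rng = np.random.default_rng(5)
a = rng.normal(size=16)
alpha = 0.1                       # multiplicative noise eps ~ N(1, alpha)
a_mean, a_var = a, alpha * a**2   # moments of a * eps
W, b = rng.normal(size=(4, 16)) / 4, np.zeros(4)
print(linear_moments(a_mean, a_var, W, b))
```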
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
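The modification amounts to a per-sample weight on the minibatch gradient. Below is a sketch that uses a softmax over minibatch losses as the weighting; the temperature and the mean-one normalization are illustrative choices, not necessarily the paper's exact scheme.

```python
import numpy as np

def absgd_weights(losses, lam=1.0):
    """Per-sample weights from a softmax over minibatch losses: higher
    loss -> higher weight (useful under class imbalance). A negative lam
    would instead down-weight large losses, the label-noise regime."""
    z = np.asarray(losses) / lam
    z -= z.max()                        # numerical stability
    w = np.exp(z)
    return w * len(losses) / w.sum()    # mean weight 1 keeps the step scale

losses = np.array([0.1, 0.2, 2.5, 0.15])   # one hard or noisy-label sample
print(absgd_weights(losses, lam=1.0))
# weighted gradient: g = mean_i w_i * g_i, then the usual momentum update
```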
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of training speed and test error.
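A hypothetical sketch of the rule as described in the snippet: keep a moving history of each parameter's gradient direction and scale that parameter's step up when the current gradient agrees with it. The constants, clipping, and exact update form are assumptions, not the paper's.

```python
import numpy as np

def adarem_step(w, g, m, lr_scale, base_lr=0.01, beta=0.9, eta=0.02):
    """Hypothetical AdaRem-style step: per-parameter learning-rate scales
    grow on sign agreement with the gradient history, shrink on conflict."""
    m = beta * m + (1 - beta) * g        # moving direction history
    agree = np.sign(m) * np.sign(g)      # +1 aligned, -1 opposed, 0 neutral
    lr_scale = np.clip(lr_scale * (1 + eta * agree), 0.1, 10.0)
    w = w - base_lr * lr_scale * g
    return w, m, lr_scale

w, m, s = np.zeros(3), np.zeros(3), np.ones(3)
for g in (np.array([1.0, -1.0, 1.0]), np.array([1.0, 1.0, -1.0])):
    w, m, s = adarem_step(w, g, m, s)
print(s)   # scales moved per parameter according to direction agreement
```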
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolutional Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) in which the message aggregator contains learned rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of such training variants can be covered by a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- A Dynamic Sampling Adaptive-SGD Method for Machine Learning [8.173034693197351]
We propose a method that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such directions.
The proposed method exploits local curvature information and ensures that search directions are descent directions with high probability.
Numerical experiments show that this method is able to choose the best learning rates and compares favorably to fine-tuned SGD for training logistic regression and DNNs.
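The batch-size control can be illustrated with a norm-test-style rule: grow the batch when the sampled gradient's variance is large relative to its squared norm. The paper's test additionally exploits local curvature information; the rule below is a simplified stand-in with assumed constants.

```python
import numpy as np

def next_batch_size(per_sample_grads, B, theta=0.5, B_max=4096):
    """Norm-test flavor of adaptive batching: double the batch when the
    gradient estimate's variance exceeds theta^2 times its squared norm."""
    g = per_sample_grads.mean(axis=0)
    var = per_sample_grads.var(axis=0).sum() / len(per_sample_grads)
    if var > (theta ** 2) * np.dot(g, g):
        B = min(2 * B, B_max)
    return B

rng = np.random.default_rng(6)
noisy = rng.normal(loc=0.05, scale=1.0, size=(64, 10))  # high-noise regime
print(next_batch_size(noisy, B=64))   # batch doubles in the noisy regime
```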
arXiv Detail & Related papers (2019-12-31T15:36:44Z)