On gradient descent training under data augmentation with on-line noisy
copies
- URL: http://arxiv.org/abs/2206.03734v1
- Date: Wed, 8 Jun 2022 08:20:00 GMT
- Title: On gradient descent training under data augmentation with on-line noisy
copies
- Authors: Katsuyuki Hagiwara
- Abstract summary: We consider gradient descent training of linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs.
We show that, in all cases, training for DA with on-line copies is approximately equivalent to ridge regression training.
We also experimentally investigate the training process of neural networks under DA with off-line noisy copies.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, data augmentation (DA) is a technique for
improving generalization performance. In this paper, we mainly considered
gradient descent training of linear regression under DA using noisy copies of
datasets, in which noise is injected into the inputs. We analyzed the
situation where random noisy copies are newly generated and used at each
epoch, i.e., the case of using on-line noisy copies. The analysis can
therefore be viewed as one of a method that injects noise into the training
process in a DA manner, i.e., an on-line version of DA. We derived the
averaged behavior of the training process in three situations: full-batch
training under the sum of squared errors, and full-batch and mini-batch
training under the mean squared error. We showed that, in all cases, training
for DA with on-line copies is approximately equivalent to ridge regression
training whose regularization parameter corresponds to the variance of the
injected noise. On the other hand, we showed that the learning rate is
multiplied by the number of noisy copies plus one in full-batch training
under the sum of squared errors and in mini-batch training under the mean
squared error; i.e., DA with on-line copies yields an apparent acceleration
of training. The apparent acceleration and the regularization effect come
from the original part of the copied data and the injected noise,
respectively. These results were confirmed in a numerical experiment, in
which we also found that our result can be approximately applied to the usual
off-line DA in an under-parameterized scenario but not in an
over-parameterized scenario. Moreover, we experimentally investigated the
training process of neural networks under DA with off-line noisy copies and
found that our analysis of linear regression can be applied to neural
networks.
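As a sanity check of the claimed ridge equivalence, here is a minimal numpy sketch on synthetic data. The regularization constant below follows this sketch's own expectation computation and is an assumption, not necessarily the paper's exact normalization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic under-parameterized linear regression problem.
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

k = 3          # number of on-line noisy copies per epoch
sigma = 0.3    # std of the noise injected into the inputs
eta = 5e-4     # learning rate
epochs = 3000

# Full-batch gradient descent on the sum of squared errors over the
# original data plus k freshly drawn noisy copies (on-line DA).
w_da = np.zeros(d)
for _ in range(epochs):
    copies = [X + sigma * rng.normal(size=X.shape) for _ in range(k)]
    X_aug = np.vstack([X] + copies)
    y_aug = np.tile(y, k + 1)
    w_da -= eta * X_aug.T @ (X_aug @ w_da - y_aug)

# Ridge solution the averaged dynamics should approach.  Taking the
# expectation of the DA gradient over the injected noise gives
# (k+1) * [X^T (X w - y) + lam * w] with lam = k * n * sigma^2 / (k+1):
# the effective learning rate is scaled by (k+1) and the fixed point is
# a ridge estimator (constants are this sketch's own computation).
lam = k * n * sigma**2 / (k + 1)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("||w_DA - w_ridge|| =", np.linalg.norm(w_da - w_ridge))
print("||w_DA - w_OLS||   =", np.linalg.norm(
    w_da - np.linalg.lstsq(X, y, rcond=None)[0]))
```

With a small learning rate the DA iterate should land much closer to the ridge solution than to the ordinary least-squares solution, matching the abstract's claim.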
Related papers
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs.
We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z) - Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data, but generating high-quality outputs requires many discretization steps.
We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z) - On the Relation Between Linear Diffusion and Power Iteration [42.158089783398616]
We study the generation process as a "correlation machine".
We show that low frequencies emerge earlier in the generation process, where the denoising basis vectors are more aligned to the true data with a rate depending on their eigenvalues.
This model allows us to show that the linear diffusion model converges in mean to the leading eigenvector of the underlying data, similarly to the prevalent power iteration method.
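For context, the power iteration referenced above is the classical recursion v <- Cv / ||Cv||, which converges to the leading eigenvector of C. A minimal self-contained numpy illustration on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data covariance with a well-separated leading eigenvalue.
X = rng.normal(size=(500, 8)) * np.array([3., 2., 1., 1., .5, .5, .2, .1])
C = X.T @ X / len(X)

# Power iteration: repeatedly apply C and renormalize.
v = rng.normal(size=8)
for _ in range(100):
    v = C @ v
    v /= np.linalg.norm(v)

# Compare against the leading eigenvector from an exact decomposition.
lead = np.linalg.eigh(C)[1][:, -1]
print("alignment |<v, lead>| =", abs(v @ lead))  # close to 1.0
```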
arXiv Detail & Related papers (2024-10-16T07:33:12Z) - Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection [9.784793380119806]
We introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation.
Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model.
We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset.
arXiv Detail & Related papers (2024-07-04T14:28:52Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - Understanding and Mitigating the Label Noise in Pre-training on
Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Solving Inverse Problems with Score-Based Generative Priors learned from
Noisy Data [1.7969777786551424]
SURE-Score is an approach for learning score-based generative models using training samples corrupted by additive Gaussian noise.
We demonstrate the generality of SURE-Score by learning priors and applying posterior sampling to ill-posed inverse problems in two practical applications.
arXiv Detail & Related papers (2023-05-02T02:51:01Z) - On Calibrating Diffusion Probabilistic Models [78.75538484265292]
Diffusion probabilistic models (DPMs) have achieved promising results in diverse generative tasks.
We propose a simple way for calibrating an arbitrary pretrained DPM, with which the score matching loss can be reduced and the lower bounds of model likelihood can be increased.
Our calibration method is performed only once and the resulting models can be used repeatedly for sampling.
arXiv Detail & Related papers (2023-02-21T14:14:40Z) - Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be
Consistent [97.64313409741614]
We propose to enforce a consistency property, which states that predictions of the model on its own generated data are consistent across time.
We show that our novel training objective yields state-of-the-art results for conditional and unconditional generation on CIFAR-10 and baseline improvements on AFHQ and FFHQ.
arXiv Detail & Related papers (2023-02-17T18:45:04Z) - Subset-of-Data Variational Inference for Deep Gaussian-Processes
Regression [0.0]
Deep Gaussian Processes (DGPs) are multi-layer, flexible extensions of Gaussian processes.
Sparse approximations simplify the training but often require optimization over a large number of inducing inputs and their locations.
In this paper, we simplify the training by setting the locations to a fixed subset of data and sampling the inducing inputs from a variational distribution.
arXiv Detail & Related papers (2021-07-17T15:55:35Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
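A minimal sketch of that idea, assuming a softmax-over-losses weighting inside the mini-batch; the exact weighting form is this sketch's assumption based on the summary, not the paper's formula:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression data for a mini-batch loop.
n, d = 256, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(size=n)

w = np.zeros(d)
m = np.zeros(d)                    # momentum buffer
eta, beta, tau = 0.02, 0.9, 1.0    # lr, momentum, weighting temperature

for step in range(500):
    idx = rng.choice(n, size=32, replace=False)
    Xb, yb = X[idx], y[idx]
    resid = Xb @ w - yb
    losses = 0.5 * resid**2
    # Attention-style importance weights: samples with larger loss get
    # larger weight within the mini-batch (softmax over scaled losses;
    # the max-subtraction is only for numerical stability).
    p = np.exp((losses - losses.max()) / tau)
    p /= p.sum()
    grad = Xb.T @ (p * resid)      # importance-weighted gradient
    m = beta * m + grad            # plain momentum SGD update
    w -= eta * m

print("final mean squared error:", np.mean((X @ w - y) ** 2))
```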
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.