Jitter: Random Jittering Loss Function
- URL: http://arxiv.org/abs/2106.13749v1
- Date: Fri, 25 Jun 2021 16:39:40 GMT
- Title: Jitter: Random Jittering Loss Function
- Authors: Zhicheng Cai, Chenglei Peng and Sidan Du
- Abstract summary: One novel regularization method called flooding makes the training loss fluctuate around the flooding level.
We propose a novel method called Jitter to improve it.
Jitter is a domain-, task-, and model-independent regularization method that trains the model effectively even after the training error has reduced to zero.
- Score: 2.716362160018477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regularization plays a vital role in machine learning optimization. One novel regularization method, called flooding, makes the training loss fluctuate around a fixed flooding level; the intent is that the model continues a random walk until it reaches a flat region of the loss landscape, which enhances generalization. However, the flooding level is a hyper-parameter that is difficult to select properly and uniformly. We propose a novel method called Jitter to improve on flooding. Jitter is essentially a kind of random loss function: before training, we randomly sample Jitter Points from a specific probability distribution, and the fixed flooding level is replaced by a Jitter Point to obtain a new target function on which the model is trained. Because the Jitter Point acts as a random factor, we effectively add randomness to the loss function, which is consistent with the innumerable random behaviors in the learning process of a machine learning model and should make the model more robust. In addition, Jitter performs a random walk that divides the loss curve into small intervals and flips them over, ideally making the loss curve much flatter and enhancing generalization ability. Moreover, Jitter is a domain-, task-, and model-independent regularization method that trains the model effectively even after the training error has reduced to zero. Our experimental results show that Jitter improves model performance more significantly than the previous flooding method and makes the test loss curve descend a second time.
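To make the mechanism concrete, here is a minimal PyTorch sketch. The flooding objective (Ishida et al., 2020) is |L - b| + b for a fixed flood level b; a Jitter-style loss replaces b with a randomly sampled Jitter Point. The Gaussian sampling distribution, its parameters, and the per-step resampling shown here are illustrative assumptions, not the paper's exact configuration.

```python
import torch

def flooding_loss(loss: torch.Tensor, b: float) -> torch.Tensor:
    """Flooding objective |L - b| + b: gradient descent while the
    training loss is above b, gradient ascent once it falls below b."""
    return (loss - b).abs() + b

class JitterLoss:
    """Jitter-style loss: the fixed flood level is replaced by a Jitter
    Point drawn from a probability distribution. The Gaussian here and
    the per-step resampling are assumptions for illustration."""

    def __init__(self, mean: float = 0.05, std: float = 0.02):
        self.mean = mean
        self.std = std

    def __call__(self, loss: torch.Tensor) -> torch.Tensor:
        # Draw a Jitter Point; clamp at zero so the target level is
        # never a negative loss value.
        jitter_point = max(0.0, self.mean + self.std * torch.randn(()).item())
        return (loss - jitter_point).abs() + jitter_point

# Typical use inside a training step (model, data, optimizer assumed):
#   raw = torch.nn.functional.cross_entropy(model(x), y)
#   loss = JitterLoss()(raw)
#   loss.backward(); optimizer.step()
```

When the training loss drops below the sampled Jitter Point, the absolute value flips the sign of the gradient, so the optimizer ascends until the loss rises above the level again; randomizing that level is what the abstract describes as dividing the loss curve into small intervals and flipping them over.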
Related papers
- Task-recency bias strikes back: Adapting covariances in Exemplar-Free Class Incremental Learning [0.3281128493853064]
We tackle the problem of training a model on a sequence of tasks without access to past data.
Existing methods represent classes as Gaussian distributions in the feature extractor's latent space.
We propose AdaGauss -- a novel method that adapts covariance matrices from task to task.
arXiv Detail & Related papers (2024-09-26T20:18:14Z)
- Source-Free Test-Time Adaptation For Online Surface-Defect Detection [29.69030283193086]
We propose a novel test-time adaptation surface-defect detection approach.
It adapts pre-trained models to new domains and classes during inference.
Experiments demonstrate it outperforms state-of-the-art techniques.
arXiv Detail & Related papers (2024-08-18T14:24:05Z)
- The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret [64.04721528586747]
In reinforcement learning, specifying reward functions that capture the intended task can be very challenging.
In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret; however, as the title notes, low training error alone does not, since realistic data distributions can still permit high worst-case regret.
We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF.
arXiv Detail & Related papers (2024-06-22T06:43:51Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes to replay the data of experienced tasks when learning new tasks.
However, storing past data is often infeasible in practice due to memory constraints or data privacy issues.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance [1.6804613362826175]
Diffusion models have emerged as a pivotal advancement in generative models.
In this paper we aim to underscore a discrepancy between conventional training methods and the desired conditional sampling behavior.
We introduce an updated loss function that better aligns training objectives with sampling behaviors.
arXiv Detail & Related papers (2023-11-02T02:03:12Z)
- Task-Robust Pre-Training for Worst-Case Downstream Adaptation [62.05108162160981]
Pre-trained models have achieved remarkable success when transferred to downstream tasks.
This paper considers pre-training a model that guarantees a uniformly good performance over the downstream tasks.
arXiv Detail & Related papers (2023-06-21T07:43:23Z)
- Training Normalizing Flows with the Precision-Recall Divergence [73.92251251511199]
We show that achieving a specified precision-recall trade-off corresponds to minimising f-divergences from a family we call the PR-divergences.
We propose a novel generative model that is able to train a normalizing flow to minimise any f-divergence and, in particular, to achieve a given precision-recall trade-off.
arXiv Detail & Related papers (2023-02-01T17:46:47Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods increase a model's vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- Machine Learning's Dropout Training is Distributionally Robust Optimal [10.937094979510212]
This paper shows that dropout training in Generalized Linear Models provides out-of-sample expected loss guarantees.
It also provides a novel, parallelizable, Unbiased Multi-Level Monte Carlo algorithm to speed up the implementation of dropout training.
arXiv Detail & Related papers (2020-09-13T23:13:28Z)