Implicit regularization via soft ascent-descent
- URL: http://arxiv.org/abs/2310.10006v1
- Date: Mon, 16 Oct 2023 02:02:56 GMT
- Title: Implicit regularization via soft ascent-descent
- Authors: Matthew J. Holland and Kosuke Nakatani
- Abstract summary: We show how to achieve better off-sample generalization with minimal trial-and-error.
We propose a softened, pointwise mechanism called SoftAD that downweights points on the borderline, limits the effects of outliers, and retains the ascent-descent effect.
Our empirical tests range from simple binary classification on the plane to image classification using neural networks with millions of parameters.
- Score: 7.335712499936906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As models grow larger and more complex, achieving better off-sample
generalization with minimal trial-and-error is critical to the reliability and
economy of machine learning workflows. As a proxy for the well-studied
heuristic of seeking "flat" local minima, gradient regularization is a natural
avenue, and first-order approximations such as Flooding and sharpness-aware
minimization (SAM) have received significant attention, but their performance
depends critically on hyperparameters (flood threshold and neighborhood radius,
respectively) that are non-trivial to specify in advance. In order to develop a
procedure which is more resilient to misspecified hyperparameters, with the
hard-threshold "ascent-descent" switching device used in Flooding as
motivation, we propose a softened, pointwise mechanism called SoftAD that
downweights points on the borderline, limits the effects of outliers, and
retains the ascent-descent effect. We contrast formal stationarity guarantees
with those for Flooding, and empirically demonstrate how SoftAD can realize
classification accuracy competitive with SAM and Flooding while maintaining a
much smaller loss generalization gap and model norm. Our empirical tests range
from simple binary classification on the plane to image classification using
neural networks with millions of parameters; the key trends are observed across
all datasets and models studied, and suggest a potential new approach to
implicit regularization.
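For readers unfamiliar with the ascent-descent device mentioned above, the sketch below illustrates the published Flooding objective (|mean loss - b| + b, which flips between gradient descent and ascent around a flood level b) together with a softened, pointwise analogue in the spirit of the abstract's description. The soft variant is an illustrative assumption for exposition only, not the paper's exact SoftAD objective; the threshold theta and scale sigma are placeholder names.

```python
import torch


def flooding_objective(per_example_loss: torch.Tensor, b: float) -> torch.Tensor:
    """Flooding (Ishida et al., 2020): |mean loss - b| + b.

    The gradient equals plus or minus the ordinary gradient depending on
    whether the mean loss sits above or below the flood level b, i.e. a
    hard-threshold ascent-descent switch applied to the whole batch.
    """
    return (per_example_loss.mean() - b).abs() + b


def soft_ascent_descent_objective(per_example_loss: torch.Tensor,
                                  theta: float,
                                  sigma: float = 1.0) -> torch.Tensor:
    """Illustrative softened, pointwise analogue (an assumption, not the
    paper's exact SoftAD update).

    Each example's loss is passed through the smooth absolute-value-like
    function rho(x) = sqrt(1 + x^2) - 1, so that:
      * near the threshold theta the per-point gradient weight rho'(x) is
        close to 0 (borderline points are downweighted);
      * far from theta the weight saturates at +/-1 (outliers have bounded
        influence), while the sign change retains the ascent-descent effect.
    """
    z = (per_example_loss - theta) / sigma
    return (sigma * (torch.sqrt(1.0 + z ** 2) - 1.0)).mean() + theta
```

In a training loop, either objective would be applied to per-example losses (e.g., cross-entropy with reduction="none") before calling backward(), leaving the rest of the optimization step unchanged.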
Related papers
- Adversarial Robustness Overestimation and Instability in TRADES [4.063518154926961]
TRADES sometimes yields disproportionately high PGD validation accuracy compared to the AutoAttack testing accuracy in the multiclass classification task.
This discrepancy highlights a significant overestimation of robustness for these instances, potentially linked to gradient masking.
arXiv Detail & Related papers (2024-10-10T07:32:40Z)
- PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., their margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness and that, unlike existing data pruning strategies, it significantly improves model performance.
arXiv Detail & Related papers (2024-05-10T08:02:20Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- Threshold-Consistent Margin Loss for Open-World Deep Metric Learning [42.03620337000911]
Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures.
This inconsistency often complicates threshold selection when deploying commercial image retrieval systems.
We propose a novel variance-based metric called Operating-Point-Inconsistency-Score (OPIS) that quantifies the variance in the operating characteristics across classes.
arXiv Detail & Related papers (2023-07-08T21:16:41Z)
- Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks [31.67508478764597]
We propose a novel approach to semi-supervised regression, namely Uncertainty-Consistent Variational Model Ensembling (UCVME).
Our consistency loss significantly improves uncertainty estimates and allows higher quality pseudo-labels to be assigned greater importance under heteroscedastic regression.
Experiments show that our method outperforms state-of-the-art alternatives on different tasks and can be competitive with supervised methods that use full labels.
arXiv Detail & Related papers (2023-02-15T10:40:51Z)
- Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification [10.719054378755981]
We present a novel approach for post-hoc correction called Hierarchical Ensembles (HiE).
HiE utilizes label hierarchy to improve the performance of fine-grained classification at test-time using the coarse-grained predictions.
Our approach brings notable gains in top-1 accuracy while significantly decreasing the severity of mistakes as training data for the fine-grained classes decreases.
arXiv Detail & Related papers (2023-02-01T10:55:27Z)
- Adaptive Dimension Reduction and Variational Inference for Transductive Few-Shot Classification [2.922007656878633]
We propose a new clustering method based on Variational Bayesian inference, further improved by Adaptive Dimension Reduction.
Our proposed method significantly improves accuracy in the realistic unbalanced transductive setting on various Few-Shot benchmarks.
arXiv Detail & Related papers (2022-09-18T10:29:02Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training and does not seem to perform as well under more natural types of dataset shift (a minimal sketch of the idea follows this list).
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
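As a concrete illustration of the last entry above, the snippet below sketches one common way to realize prediction-time batch normalization in PyTorch: only the BatchNorm layers are switched to training mode at inference time, so normalization uses the statistics of the current prediction batch rather than the stored running averages. This is a minimal sketch of the general idea under that assumption, not the evaluation protocol of the cited paper.

```python
import torch
import torch.nn as nn


def predict_with_batch_stats(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Prediction-time batch normalization (sketch).

    The model stays in eval mode overall, but BatchNorm layers are put in
    train mode so they normalize with the statistics of the incoming batch
    instead of the running averages accumulated during training.
    Note: train mode also updates the running statistics; save and restore
    the model state afterwards if that side effect matters.
    """
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.train()
    with torch.no_grad():
        return model(x)
```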
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.