Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
- URL: http://arxiv.org/abs/2206.02659v6
- Date: Fri, 22 Dec 2023 20:36:36 GMT
- Title: Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
- Authors: Haotian Ju, Dongyue Li, Hongyang R. Zhang
- Abstract summary: We study the generalization properties of fine-tuning to understand the problem of overfitting.
We present an algorithm and a generalization error guarantee for this algorithm under a class conditional independent noise model.
- Score: 20.2407347618552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider fine-tuning a pretrained deep neural network on a target task. We
study the generalization properties of fine-tuning to understand the problem of
overfitting, which has often been observed (e.g., when the target dataset is
small or when the training labels are noisy). Existing generalization measures
for deep networks depend on notions such as distance from the initialization
(i.e., the pretrained network) of the fine-tuned model and noise stability
properties of deep networks. This paper identifies a Hessian-based distance
measure through PAC-Bayesian analysis, which is shown to correlate well with
observed generalization gaps of fine-tuned models. Theoretically, we prove
Hessian distance-based generalization bounds for fine-tuned models. We also
describe an extended study of fine-tuning against label noise, where
overfitting remains a critical problem. We present an algorithm and a
generalization error guarantee for this algorithm under a class conditional
independent noise model. Empirically, we observe that the Hessian-based
distance measure can match the scale of the observed generalization gap of
fine-tuned models in practice. We also test our algorithm on several image
classification tasks with noisy training labels, showing gains over prior
methods and decreases in the Hessian distance measure of the fine-tuned model.
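
As a concrete illustration of the kind of quantity involved, the sketch below estimates the quadratic form v^T H v, where H is the loss Hessian of the fine-tuned model and v is its displacement from the pretrained initialization. This is a simplified stand-in for the paper's Hessian distance measure, which involves additional layer-wise and PAC-Bayesian terms; the toy setup and all names are illustrative.

```python
import torch
import torch.nn as nn

def hessian_quadratic_form(model, loss_fn, x, y, v):
    """Compute v^T H v, where H is the Hessian of the loss w.r.t. the
    model parameters and v is a flat vector of matching size."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    # Hessian-vector product via double backward: Hv = d(g . v)/d(theta).
    hv = torch.autograd.grad((flat_grad * v).sum(), params)
    flat_hv = torch.cat([h.reshape(-1) for h in hv])
    return (v * flat_hv).sum()

torch.manual_seed(0)
pretrained = nn.Linear(10, 2)            # toy stand-in for the pretrained model
finetuned = nn.Linear(10, 2)
finetuned.load_state_dict(pretrained.state_dict())
with torch.no_grad():                    # pretend fine-tuning moved the weights
    for p in finetuned.parameters():
        p.add_(0.01 * torch.randn_like(p))

# Distance vector: displacement of the fine-tuned model from initialization.
v = torch.cat([(q - p).reshape(-1) for p, q in
               zip(pretrained.parameters(), finetuned.parameters())]).detach()
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
quad = hessian_quadratic_form(finetuned, nn.CrossEntropyLoss(), x, y, v)
print(quad.clamp(min=0).sqrt())          # a Hessian-distance-style quantity
```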
Related papers
- Typicalness-Aware Learning for Failure Detection [26.23185979968123]
Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores.
We propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance.
arXiv Detail & Related papers (2024-11-04T11:09:47Z)
- Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks [19.185059111021854]
We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent and coordinate descent.
We prove that an algorithm-dependent geometric margin starts increasing once the networks reach perfect training accuracy.
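
For orientation, a standard way to formalize this family and the margin it controls is sketched below; the paper's exact normalization may differ, so treat this as a plausible reading rather than the paper's definitions.

```latex
% Steepest descent with respect to a norm \|\cdot\| picks the unit-ball
% direction best aligned with the gradient:
\[
  \Delta_t \in \arg\max_{\|v\| \le 1} \langle \nabla L(w_t), v \rangle,
  \qquad
  w_{t+1} = w_t - \eta_t\, \Delta_t .
\]
% Up to step-size scaling, gradient descent, sign descent, and coordinate
% descent are the cases \|\cdot\|_2, \|\cdot\|_\infty, and \|\cdot\|_1.
% For an L-homogeneous network f, the normalized (geometric) margin is
\[
  \gamma(w) \;=\; \min_i \, \frac{y_i\, f(w; x_i)}{\|w\|^{L}} .
\]
```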
arXiv Detail & Related papers (2024-10-29T14:28:49Z)
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
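
For reference, $(L_0,L_1)$-smoothness (introduced by Zhang et al. in the context of gradient clipping) lets the curvature grow with the gradient norm, which is exactly the regime where normalizing updates pays off:

```latex
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1\, \|\nabla f(x)\|
  \quad \text{for all } x .
\]
% Standard L-smoothness is the special case L_1 = 0. When L_1 > 0, larger
% gradients permit larger curvature, so unnormalized steps must shrink
% while normalized steps can remain large.
```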
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- Generalization error of spectral algorithms [17.93452027304691]
We consider the training of kernels with a family of $\textit{spectral algorithms}$ specified by a profile $h(\lambda)$.
We derive the generalization error as a functional of the learning profile $h(\lambda)$ for two data models.
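
In the usual convention for this family, the estimator applies a filter $h$ to the spectrum of the kernel matrix; a brief sketch follows (the paper's normalization of the profile may differ):

```latex
\[
  K = \sum_{i=1}^{n} \lambda_i\, u_i u_i^\top ,
  \qquad
  \hat f(x) \;=\; \sum_{i=1}^{n} h(\lambda_i)\,
    \bigl(u_i^\top \mathbf{y}\bigr)\, \bigl(u_i^\top k(x)\bigr),
\]
% where K is the kernel matrix on the n training points, k(x) the vector
% of kernel evaluations at a test point, and y the labels. Kernel ridge
% regression corresponds to h(\lambda) = 1/(\lambda + \eta); gradient flow
% run for time t corresponds to h(\lambda) = (1 - e^{-t\lambda})/\lambda.
```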
arXiv Detail & Related papers (2024-03-18T11:52:33Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper presents the first comprehensive analysis of the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
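
A minimal sketch of the recipe the summary describes: keep the (noisily) pretrained encoder frozen and learn a lightweight affine map on its features. NMTune's actual objective includes additional terms not reproduced here, and all names below are illustrative.

```python
import torch
import torch.nn as nn

class AffineTune(nn.Module):
    """Frozen pretrained encoder + learnable affine map on its features.
    A simplified stand-in for the feature-space tuning the summary
    describes; the real NMTune objective is richer."""
    def __init__(self, encoder, feat_dim, num_classes):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad_(False)                  # pretrained weights stay fixed
        self.affine = nn.Linear(feat_dim, feat_dim)  # learned affine map
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        with torch.no_grad():
            z = self.encoder(x)
        return self.head(self.affine(z))

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())  # toy "pretrained" model
model = AffineTune(encoder, feat_dim=16, num_classes=4)
x, y = torch.randn(8, 32), torch.randint(0, 4, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()   # gradients flow only into the affine map and head
```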
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach [18.009376840944284]
We present an algorithm that effectively regularizes the Hessian of the loss, biasing training toward flat regions of the loss surface.
Our approach improves generalization when fine-tuning pretrained CLIP models and on chain-of-thought fine-tuning datasets.
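
A second-order Taylor expansion explains why weight-noise training regularizes the Hessian: for eps ~ N(0, sigma^2 I), E[L(w + eps)] is approximately L(w) + (sigma^2 / 2) * tr(Hessian). The sketch below takes one step on a two-sided perturbed loss in this spirit; it is an assumption-laden simplification, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def noise_stability_step(model, loss_fn, x, y, opt, sigma=0.01):
    """One step on a symmetrically weight-perturbed loss: the accumulated
    gradient is 0.5 * grad L(w + eps) + 0.5 * grad L(w - eps), whose
    expectation approximately adds a (sigma^2 / 2) * tr(Hessian) penalty."""
    params = [p for p in model.parameters() if p.requires_grad]
    eps = [sigma * torch.randn_like(p) for p in params]
    opt.zero_grad()
    with torch.no_grad():                  # move to w + eps
        for p, e in zip(params, eps):
            p.add_(e)
    (0.5 * loss_fn(model(x), y)).backward()
    with torch.no_grad():                  # move to w - eps
        for p, e in zip(params, eps):
            p.sub_(2.0 * e)
    (0.5 * loss_fn(model(x), y)).backward()
    with torch.no_grad():                  # restore w
        for p, e in zip(params, eps):
            p.add_(e)
    opt.step()                             # step with the averaged gradient

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
noise_stability_step(model, nn.CrossEntropyLoss(), x, y, opt)
```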
arXiv Detail & Related papers (2023-06-14T14:58:36Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
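
Reconstruction error is the standard way such a manifold assumption is turned into an anomaly score: the autoencoder is trained on normal samples only, so points off the learned submanifold reconstruct poorly. A minimal sketch, not the paper's exact training regime:

```python
import torch
import torch.nn as nn

ae = nn.Sequential(
    nn.Linear(64, 8), nn.ReLU(),   # encoder: project toward the manifold
    nn.Linear(8, 64),              # decoder: map back to input space
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
normal = torch.randn(256, 64)      # stand-in for normal training data
for _ in range(100):
    opt.zero_grad()
    loss = ((ae(normal) - normal) ** 2).mean()
    loss.backward()
    opt.step()

def anomaly_score(x):
    """Per-sample reconstruction error; higher means more anomalous."""
    with torch.no_grad():
        return ((ae(x) - x) ** 2).mean(dim=1)
```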
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
- Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise explicitly and learn to separate it from the data.
Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets.
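
One concrete instantiation of "model the noise and separate it from the data" is to attach over-parameterized per-sample noise variables to the network output; with the u*u - v*v reparameterization below, gradient descent from a small initialization implicitly biases them toward sparsity. Treat the hyperparameters and toy setup as illustrative:

```python
import torch
import torch.nn as nn

n, d, k = 128, 20, 5
model = nn.Linear(d, k)
# Per-sample noise variables, over-parameterized as u*u - v*v: gradient
# descent from small initialization implicitly biases them to be sparse,
# so they absorb the (sparse, incoherent) label noise instead of the model.
u = nn.Parameter(1e-3 * torch.randn(n, k))
v = nn.Parameter(1e-3 * torch.randn(n, k))
opt = torch.optim.SGD(list(model.parameters()) + [u, v], lr=0.1)

x = torch.randn(n, d)
y = torch.randint(0, k, (n,))           # possibly noisy labels
for _ in range(200):
    opt.zero_grad()
    logits = model(x) + u * u - v * v   # network output + learned noise term
    loss = nn.CrossEntropyLoss()(logits, y)
    loss.backward()
    opt.step()
```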
arXiv Detail & Related papers (2022-02-28T18:50:10Z)
- Meta Adversarial Perturbations [66.43754467275967]
We show the existence of a meta adversarial perturbation (MAP): a perturbation that causes natural images to be misclassified with high probability after being updated through only a single step of gradient ascent.
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
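
A hedged sketch of the meta-training loop such a perturbation suggests: the inner step adapts the perturbation with one gradient-ascent update, and the outer step moves the initialization so the adapted perturbation attacks well. Function names, step sizes, and the toy model are assumptions, not the paper's exact algorithm:

```python
import torch
import torch.nn as nn

def map_meta_step(model, loss_fn, delta, x, y,
                  inner_lr=0.01, meta_lr=0.01, eps=8 / 255):
    """One meta-update of a meta adversarial perturbation (MAP) init."""
    delta = delta.detach().requires_grad_(True)
    inner_loss = loss_fn(model(x + delta), y)
    g, = torch.autograd.grad(inner_loss, delta, create_graph=True)
    adapted = delta + inner_lr * g               # one-step gradient ascent
    meta_loss = loss_fn(model(x + adapted), y)   # attack strength after adaptation
    meta_g, = torch.autograd.grad(meta_loss, delta)
    with torch.no_grad():                        # ascend meta-objective, stay in eps-ball
        delta = (delta + meta_lr * meta_g.sign()).clamp(-eps, eps)
    return delta.detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
delta = torch.zeros(3, 32, 32)                   # a single, image-agnostic perturbation
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
delta = map_meta_step(model, nn.CrossEntropyLoss(), delta, x, y)
```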
arXiv Detail & Related papers (2021-11-19T16:01:45Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets for neural combinatorial solvers capture only a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and adversarial attacks.
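
The key move the summary describes, replacing deterministic unnormalized attention weights with samples from a positive distribution and normalizing afterwards, can be sketched as below. The Weibull reparameterization and the specific wiring are illustrative assumptions; the paper's construction (distribution families, encoder/decoder pairing, training objective) is more involved.

```python
import torch

def stochastic_attention(q, k, v, kw=2.0):
    """Attention with sampled unnormalized weights. Deterministic scores set
    the scale of a Weibull distribution; the reparameterization
    x = scale * (-log(1 - u)) ** (1 / shape) keeps sampled weights positive
    and differentiable, and normalization happens after sampling."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    scale = torch.exp(scores)                      # positive unnormalized weights
    u = torch.rand_like(scale).clamp(1e-6, 1 - 1e-6)
    w = scale * (-torch.log1p(-u)) ** (1.0 / kw)   # Weibull-reparameterized sample
    attn = w / w.sum(dim=-1, keepdim=True)         # normalize after sampling
    return attn @ v

q = k = v = torch.randn(2, 5, 8)
out = stochastic_attention(q, k, v)   # stochastic: resampling changes the output
```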
arXiv Detail & Related papers (2021-06-09T17:46:22Z)