Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
- URL: http://arxiv.org/abs/2206.02659v6
- Date: Fri, 22 Dec 2023 20:36:36 GMT
- Title: Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
- Authors: Haotian Ju, Dongyue Li, Hongyang R. Zhang
- Abstract summary: We study the generalization properties of fine-tuning to understand the problem of overfitting.
We present an algorithm and a generalization error guarantee for this algorithm under a class conditional independent noise model.
- Score: 20.2407347618552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider fine-tuning a pretrained deep neural network on a target task. We
study the generalization properties of fine-tuning to understand the problem of
overfitting, which has often been observed (e.g., when the target dataset is
small or when the training labels are noisy). Existing generalization measures
for deep networks depend on notions such as distance from the initialization
(i.e., the pretrained network) of the fine-tuned model and noise stability
properties of deep networks. This paper identifies a Hessian-based distance
measure through PAC-Bayesian analysis, which is shown to correlate well with
observed generalization gaps of fine-tuned models. Theoretically, we prove
Hessian distance-based generalization bounds for fine-tuned models. We also
describe an extended study of fine-tuning against label noise, where
overfitting remains a critical problem. We present an algorithm and a
generalization error guarantee for this algorithm under a class conditional
independent noise model. Empirically, we observe that the Hessian-based
distance measure can match the scale of the observed generalization gap of
fine-tuned models in practice. We also test our algorithm on several image
classification tasks with noisy training labels, showing gains over prior
methods and decreases in the Hessian distance measure of the fine-tuned model.
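
As a concrete illustration of the kind of quantity involved, the sketch below estimates the quadratic form v^T H v, where H is the loss Hessian of the fine-tuned model and v is its displacement from the pretrained initialization. This is a simplified stand-in for the paper's Hessian distance measure, which involves additional layer-wise and PAC-Bayesian terms; the toy setup and all names are illustrative.

```python
import torch
import torch.nn as nn

def hessian_quadratic_form(model, loss_fn, x, y, v):
    """Compute v^T H v, where H is the Hessian of the loss w.r.t. the
    model parameters and v is a flat vector of matching size."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    # Hessian-vector product via double backward: Hv = d(g . v)/d(theta).
    hv = torch.autograd.grad((flat_grad * v).sum(), params)
    flat_hv = torch.cat([h.reshape(-1) for h in hv])
    return (v * flat_hv).sum()

torch.manual_seed(0)
pretrained = nn.Linear(10, 2)            # toy stand-in for the pretrained model
finetuned = nn.Linear(10, 2)
finetuned.load_state_dict(pretrained.state_dict())
with torch.no_grad():                    # pretend fine-tuning moved the weights
    for p in finetuned.parameters():
        p.add_(0.01 * torch.randn_like(p))

# Distance vector: displacement of the fine-tuned model from initialization.
v = torch.cat([(q - p).reshape(-1) for p, q in
               zip(pretrained.parameters(), finetuned.parameters())]).detach()
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
quad = hessian_quadratic_form(finetuned, nn.CrossEntropyLoss(), x, y, v)
print(quad.clamp(min=0).sqrt())          # a Hessian-distance-style quantity
```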
Related papers
- Typicalness-Aware Learning for Failure Detection [26.23185979968123]
Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores.
We propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance.
arXiv Detail & Related papers (2024-11-04T11:09:47Z)
- Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks [19.185059111021854]
We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent and coordinate descent.
We prove that an algorithm-dependent geometric margin starts increasing once the networks reach perfect training accuracy.
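
For orientation, a standard way to formalize this family and the margin it controls is sketched below; the paper's exact normalization may differ, so treat this as a plausible reading rather than the paper's definitions.

```latex
% Steepest descent with respect to a norm \|\cdot\| picks the unit-ball
% direction best aligned with the gradient:
\[
  \Delta_t \in \arg\max_{\|v\| \le 1} \langle \nabla L(w_t), v \rangle,
  \qquad
  w_{t+1} = w_t - \eta_t\, \Delta_t .
\]
% Up to step-size scaling, gradient descent, sign descent, and coordinate
% descent are the cases \|\cdot\|_2, \|\cdot\|_\infty, and \|\cdot\|_1.
% For an L-homogeneous network f, the normalized (geometric) margin is
\[
  \gamma(w) \;=\; \min_i \, \frac{y_i\, f(w; x_i)}{\|w\|^{L}} .
\]
```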
arXiv Detail & Related papers (2024-10-29T14:28:49Z)
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
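
For reference, $(L_0,L_1)$-smoothness (introduced by Zhang et al. in the context of gradient clipping) lets the curvature grow with the gradient norm, which is exactly the regime where normalizing updates pays off:

```latex
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1\, \|\nabla f(x)\|
  \quad \text{for all } x .
\]
% Standard L-smoothness is the special case L_1 = 0. When L_1 > 0, larger
% gradients permit larger curvature, so unnormalized steps must shrink
% while normalized steps can remain large.
```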
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- Generalization error of spectral algorithms [17.93452027304691]
We consider the training of kernels with a family of $\textit{spectral algorithms}$ specified by a profile $h(\lambda)$.
We derive the generalization error as a functional of the learning profile $h(\lambda)$ for two data models.
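
In the usual convention for this family, the estimator applies a filter $h$ to the spectrum of the kernel matrix; a brief sketch follows (the paper's normalization of the profile may differ):

```latex
\[
  K = \sum_{i=1}^{n} \lambda_i\, u_i u_i^\top ,
  \qquad
  \hat f(x) \;=\; \sum_{i=1}^{n} h(\lambda_i)\,
    \bigl(u_i^\top \mathbf{y}\bigr)\, \bigl(u_i^\top k(x)\bigr),
\]
% where K is the kernel matrix on the n training points, k(x) the vector
% of kernel evaluations at a test point, and y the labels. Kernel ridge
% regression corresponds to h(\lambda) = 1/(\lambda + \eta); gradient flow
% run for time t corresponds to h(\lambda) = (1 - e^{-t\lambda})/\lambda.
```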
arXiv Detail & Related papers (2024-03-18T11:52:33Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper presents the first comprehensive analysis of the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
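
A minimal sketch of the recipe the summary describes: keep the (noisily) pretrained encoder frozen and learn a lightweight affine map on its features. NMTune's actual objective includes additional terms not reproduced here, and all names below are illustrative.

```python
import torch
import torch.nn as nn

class AffineTune(nn.Module):
    """Frozen pretrained encoder + learnable affine map on its features.
    A simplified stand-in for the feature-space tuning the summary
    describes; the real NMTune objective is richer."""
    def __init__(self, encoder, feat_dim, num_classes):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad_(False)                  # pretrained weights stay fixed
        self.affine = nn.Linear(feat_dim, feat_dim)  # learned affine map
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        with torch.no_grad():
            z = self.encoder(x)
        return self.head(self.affine(z))

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())  # toy "pretrained" model
model = AffineTune(encoder, feat_dim=16, num_classes=4)
x, y = torch.randn(8, 32), torch.randint(0, 4, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()   # gradients flow only into the affine map and head
```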
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach [18.009376840944284]
We present an algorithm that effectively regularizes the Hessian of the loss, biasing training toward flat regions of the loss surface.
Our approach improves generalization when fine-tuning pretrained CLIP models and on chain-of-thought fine-tuning datasets.
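
A second-order Taylor expansion explains why weight-noise training regularizes the Hessian: for eps ~ N(0, sigma^2 I), E[L(w + eps)] is approximately L(w) + (sigma^2 / 2) * tr(Hessian). The sketch below takes one step on a two-sided perturbed loss in this spirit; it is an assumption-laden simplification, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def noise_stability_step(model, loss_fn, x, y, opt, sigma=0.01):
    """One step on a symmetrically weight-perturbed loss: the accumulated
    gradient is 0.5 * grad L(w + eps) + 0.5 * grad L(w - eps), whose
    expectation approximately adds a (sigma^2 / 2) * tr(Hessian) penalty."""
    params = [p for p in model.parameters() if p.requires_grad]
    eps = [sigma * torch.randn_like(p) for p in params]
    opt.zero_grad()
    with torch.no_grad():                  # move to w + eps
        for p, e in zip(params, eps):
            p.add_(e)
    (0.5 * loss_fn(model(x), y)).backward()
    with torch.no_grad():                  # move to w - eps
        for p, e in zip(params, eps):
            p.sub_(2.0 * e)
    (0.5 * loss_fn(model(x), y)).backward()
    with torch.no_grad():                  # restore w
        for p, e in zip(params, eps):
            p.add_(e)
    opt.step()                             # step with the averaged gradient

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
noise_stability_step(model, nn.CrossEntropyLoss(), x, y, opt)
```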
arXiv Detail & Related papers (2023-06-14T14:58:36Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
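
Reconstruction error is the standard way such a manifold assumption is turned into an anomaly score: the autoencoder is trained on normal samples only, so points off the learned submanifold reconstruct poorly. A minimal sketch, not the paper's exact training regime:

```python
import torch
import torch.nn as nn

ae = nn.Sequential(
    nn.Linear(64, 8), nn.ReLU(),   # encoder: project toward the manifold
    nn.Linear(8, 64),              # decoder: map back to input space
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
normal = torch.randn(256, 64)      # stand-in for normal training data
for _ in range(100):
    opt.zero_grad()
    loss = ((ae(normal) - normal) ** 2).mean()
    loss.backward()
    opt.step()

def anomaly_score(x):
    """Per-sample reconstruction error; higher means more anomalous."""
    with torch.no_grad():
        return ((ae(x) - x) ** 2).mean(dim=1)
```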
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
- Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise explicitly and learn to separate it from the data.
Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets.
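
One concrete instantiation of "model the noise and separate it from the data" is to attach over-parameterized per-sample noise variables to the network output; with the u*u - v*v reparameterization below, gradient descent from a small initialization implicitly biases them toward sparsity. Treat the hyperparameters and toy setup as illustrative:

```python
import torch
import torch.nn as nn

n, d, k = 128, 20, 5
model = nn.Linear(d, k)
# Per-sample noise variables, over-parameterized as u*u - v*v: gradient
# descent from small initialization implicitly biases them to be sparse,
# so they absorb the (sparse, incoherent) label noise instead of the model.
u = nn.Parameter(1e-3 * torch.randn(n, k))
v = nn.Parameter(1e-3 * torch.randn(n, k))
opt = torch.optim.SGD(list(model.parameters()) + [u, v], lr=0.1)

x = torch.randn(n, d)
y = torch.randint(0, k, (n,))           # possibly noisy labels
for _ in range(200):
    opt.zero_grad()
    logits = model(x) + u * u - v * v   # network output + learned noise term
    loss = nn.CrossEntropyLoss()(logits, y)
    loss.backward()
    opt.step()
```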
arXiv Detail & Related papers (2022-02-28T18:50:10Z)
- Meta Adversarial Perturbations [66.43754467275967]
We show the existence of a meta adversarial perturbation (MAP): a perturbation that causes natural images to be misclassified with high probability after being updated through only a single step of gradient ascent.
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
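
A hedged sketch of the meta-training loop such a perturbation suggests: the inner step adapts the perturbation with one gradient-ascent update, and the outer step moves the initialization so the adapted perturbation attacks well. Function names, step sizes, and the toy model are assumptions, not the paper's exact algorithm:

```python
import torch
import torch.nn as nn

def map_meta_step(model, loss_fn, delta, x, y,
                  inner_lr=0.01, meta_lr=0.01, eps=8 / 255):
    """One meta-update of a meta adversarial perturbation (MAP) init."""
    delta = delta.detach().requires_grad_(True)
    inner_loss = loss_fn(model(x + delta), y)
    g, = torch.autograd.grad(inner_loss, delta, create_graph=True)
    adapted = delta + inner_lr * g               # one-step gradient ascent
    meta_loss = loss_fn(model(x + adapted), y)   # attack strength after adaptation
    meta_g, = torch.autograd.grad(meta_loss, delta)
    with torch.no_grad():                        # ascend meta-objective, stay in eps-ball
        delta = (delta + meta_lr * meta_g.sign()).clamp(-eps, eps)
    return delta.detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
delta = torch.zeros(3, 32, 32)                   # a single, image-agnostic perturbation
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
delta = map_meta_step(model, nn.CrossEntropyLoss(), delta, x, y)
```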
arXiv Detail & Related papers (2021-11-19T16:01:45Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets for neural combinatorial solvers capture only a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and adversarial attacks.
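
The key move the summary describes, replacing deterministic unnormalized attention weights with samples from a positive distribution and normalizing afterwards, can be sketched as below. The Weibull reparameterization and the specific wiring are illustrative assumptions; the paper's construction (distribution families, encoder/decoder pairing, training objective) is more involved.

```python
import torch

def stochastic_attention(q, k, v, kw=2.0):
    """Attention with sampled unnormalized weights. Deterministic scores set
    the scale of a Weibull distribution; the reparameterization
    x = scale * (-log(1 - u)) ** (1 / shape) keeps sampled weights positive
    and differentiable, and normalization happens after sampling."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    scale = torch.exp(scores)                      # positive unnormalized weights
    u = torch.rand_like(scale).clamp(1e-6, 1 - 1e-6)
    w = scale * (-torch.log1p(-u)) ** (1.0 / kw)   # Weibull-reparameterized sample
    attn = w / w.sum(dim=-1, keepdim=True)         # normalize after sampling
    return attn @ v

q = k = v = torch.randn(2, 5, 8)
out = stochastic_attention(q, k, v)   # stochastic: resampling changes the output
```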
arXiv Detail & Related papers (2021-06-09T17:46:22Z)