Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks
- URL: http://arxiv.org/abs/2510.25480v1
- Date: Wed, 29 Oct 2025 13:04:17 GMT
- Authors: Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis
- Abstract summary: We introduce Gradient-Weight Alignment (GWA), quantifying the coherence between per-sample gradients and model weights. We show that effective learning corresponds to coherent alignment, while misalignment indicates deteriorating generalization. Experiments show that GWA accurately predicts optimal early stopping, enables principled model comparisons, and identifies influential training samples.
- Score: 32.61771956544867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust validation metrics remain essential in contemporary deep learning, not only to detect overfitting and poor generalization, but also to monitor training dynamics. In the supervised classification setting, we investigate whether interactions between training data and model weights can yield such a metric that both tracks generalization during training and attributes performance to individual training samples. We introduce Gradient-Weight Alignment (GWA), quantifying the coherence between per-sample gradients and model weights. We show that effective learning corresponds to coherent alignment, while misalignment indicates deteriorating generalization. GWA is efficiently computable during training and reflects both sample-specific contributions and dataset-wide learning dynamics. Extensive experiments show that GWA accurately predicts optimal early stopping, enables principled model comparisons, and identifies influential training samples, providing a validation-set-free approach for model analysis directly from the training data.
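The abstract describes GWA only as the coherence between per-sample gradients and model weights, without stating a formula. As a rough illustration, the sketch below computes one plausible alignment score for a toy linear softmax classifier: the cosine similarity between each sample's descent direction (the negative per-sample gradient) and the flattened weights. The sign convention, the choice of cosine similarity, the toy model, and all function names (`per_sample_grad`, `alignment_scores`, and so on) are assumptions made for illustration and may differ from the paper's actual definition.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def per_sample_grad(W, x, y):
    """Cross-entropy gradient w.r.t. W for one sample of a linear softmax model.

    W is a list of per-class weight rows, x a feature list, y the class index.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p = softmax(logits)
    return [[(p[k] - (1.0 if k == y else 0.0)) * xi for xi in x]
            for k in range(len(W))]

def cosine(a, b):
    """Cosine similarity of two flat vectors; 0.0 if either has zero norm."""
    dot = sum(u * v for u, v in zip(a, b))
    na = math.sqrt(sum(u * u for u in a))
    nb = math.sqrt(sum(v * v for v in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def alignment_scores(W, X, Y):
    """Assumed score: cosine between -grad (descent direction) and the weights."""
    w_flat = [w for row in W for w in row]
    scores = []
    for x, y in zip(X, Y):
        g = per_sample_grad(W, x, y)
        neg_g_flat = [-v for row in g for v in row]
        scores.append(cosine(neg_g_flat, w_flat))
    return scores

# Toy example: two classes, two features; the second sample is mislabeled.
W = [[1.0, 0.0], [-1.0, 0.0]]
X = [[1.0, 0.0], [1.0, 0.0]]
Y = [0, 1]
scores = alignment_scores(W, X, Y)
mean_score = sum(scores) / len(scores)
```

In this toy setup the correctly labeled sample pushes the weights further along their current direction, so its score is close to +1, while the mislabeled sample pushes against them and scores close to -1. Averaging such scores over a training set (here `mean_score`) is one simple way to obtain a dataset-level signal; how the paper aggregates per-sample values is not specified in the abstract.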
Related papers
- Class Confidence Aware Reweighting for Long Tailed Learning [0.8297806372438926]
We present the design of a class and confidence-aware re-weighting scheme for long-tailed learning. We use a (p_t, f_c) function to modulate each sample's contribution to the training task based on the confidence value of the prediction.
arXiv Detail & Related papers (2026-01-22T12:58:05Z) - Supervised learning pays attention [42.97070083645048]
In-context learning with attention enables large neural networks to make context-specific predictions by selectively focusing on relevant examples. We show how to (1) flexibly fit personalized models for each prediction point and (2) retain model simplicity and interpretability. Our method fits a local model for each test observation by weighting the training data according to attention, a supervised similarity measure.
arXiv Detail & Related papers (2025-12-10T18:43:46Z) - Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance. We introduce novel algorithms for dynamic, instance-level data reweighting. Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z) - What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) without the requirement of prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Relating Regularization and Generalization through the Intrinsic
Dimension of Activations [11.00580615194563]
We show that common regularization techniques uniformly decrease the last-layer ID (LLID) of validation set activations for image classification models.
We also examine the LLID over the course of training of models that exhibit grokking.
arXiv Detail & Related papers (2022-11-23T19:00:00Z) - CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep
Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)