Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining
- URL: http://arxiv.org/abs/2506.20025v1
- Date: Tue, 24 Jun 2025 21:48:58 GMT
- Title: Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining
- Authors: Nathan Stromberg, Christos Thrampoulidis, Lalitha Sankar
- Abstract summary: This work explores the regime of last layer retraining (LLR) in which the unseen limited (retraining) data is frequently inseparable and the model proportionately sized. We show, in theory and practice, that loss weighting is still effective in this regime.
- Score: 29.12578724826307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While machine learning models become more capable in discriminative tasks at scale, their ability to overcome biases introduced by training data has come under increasing scrutiny. Previous results suggest that there are two extremes of parameterization with very different behaviors: the population (underparameterized) setting where loss weighting is optimal and the separable overparameterized setting where loss weighting is ineffective at ensuring equal performance across classes. This work explores the regime of last layer retraining (LLR) in which the unseen limited (retraining) data is frequently inseparable and the model proportionately sized, falling between the two aforementioned extremes. We show, in theory and practice, that loss weighting is still effective in this regime, but that these weights \emph{must} take into account the relative overparameterization of the model.
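To make the setup concrete, here is a minimal sketch of class-weighted last layer retraining in PyTorch. Everything in it (the function name, the inverse-frequency placeholder weights, and the hyperparameters) is illustrative rather than taken from the paper; the paper's point is that in this regime the optimal weights must also account for the model's relative overparameterization (roughly, feature dimension versus the number of retraining samples), and its exact weighting rule is not reproduced here.

```python
# Minimal sketch: retrain only a linear head on frozen features with a
# class-weighted cross-entropy loss. The inverse-frequency weights below are
# a placeholder; the paper argues the optimal weights must also depend on the
# relative overparameterization of the retrained layer.
import torch
import torch.nn as nn


def last_layer_retrain(features, labels, num_classes, epochs=200, lr=1e-2):
    """features: (n, d) frozen-backbone embeddings of the retraining set;
    labels: (n,) integer class labels."""
    n, d = features.shape

    # Placeholder per-class weights: inverse class frequency, mean-normalized.
    counts = torch.bincount(labels, minlength=num_classes).float().clamp(min=1)
    weights = n / (num_classes * counts)

    head = nn.Linear(d, num_classes)
    criterion = nn.CrossEntropyLoss(weight=weights)
    optimizer = torch.optim.SGD(head.parameters(), lr=lr)

    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(head(features), labels)
        loss.backward()
        optimizer.step()
    return head
```

The structure (frozen features, a single linear head, a weighted loss) is the part the abstract pins down; the choice of weights is exactly where the paper's correction for overparameterization would enter.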
Related papers
- Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning [66.8042627609456]
Loss reweighting has shown significant benefits for machine unlearning with large language models (LLMs). In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance. We propose SatImp, a simple reweighting method that combines the advantages of both saturation and importance.
arXiv Detail & Related papers (2025-05-17T10:41:22Z) - Optimizing importance weighting in the presence of sub-population shifts [0.0]
A distribution shift between the training and test data can severely harm performance of machine learning models.
We argue that existing approaches to determining the weights are suboptimal, as they neglect the increase in the variance of the estimated model due to the finite sample size of the training data.
We propose a bi-level optimization procedure in which the weights and model parameters are optimized simultaneously.
arXiv Detail & Related papers (2024-10-18T09:21:10Z) - AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs [61.13296177652599]
We show that data mixtures that perform well at smaller scales may not retain their advantage at larger scales. We propose AutoScale, a two-stage, scale-aware data composition framework.
arXiv Detail & Related papers (2024-07-29T17:06:30Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z) - FORML: Learning to Reweight Data for Fairness [2.105564340986074]
We introduce Fairness Optimized Reweighting via Meta-Learning (FORML)
FORML balances fairness constraints and accuracy by jointly optimizing training sample weights and a neural network's parameters.
We show that FORML improves equality of opportunity fairness criteria over existing state-of-the-art reweighting methods by approximately 1% on image classification tasks and by approximately 5% on a face prediction task.
arXiv Detail & Related papers (2022-02-03T17:36:07Z) - FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted data set, where the sample weights are computed via influence functions using a validation set with sensitive attributes.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z) - You Only Need End-to-End Training for Long-Tailed Recognition [8.789819609485225]
Cross-entropy loss tends to produce highly correlated features on imbalanced data.
We propose two novel modules, Block-based Relatively Balanced Batch Sampler (B3RS) and Batch Embedded Training (BET).
Experimental results on the long-tailed classification benchmarks, CIFAR-LT and ImageNet-LT, demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-12-11T11:44:09Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models (a minimal sketch of this reparameterisation appears at the end of this list).
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Balanced Softmax Cross-Entropy for Incremental Learning [6.5423218639215275]
Deep neural networks are prone to catastrophic forgetting when incrementally trained on new classes or new tasks.
Recent methods have proven effective at mitigating catastrophic forgetting.
We propose the use of the Balanced Softmax Cross-Entropy loss and show that it can be combined with existing methods for incremental learning to improve their performance (a minimal sketch of this loss appears at the end of this list).
arXiv Detail & Related papers (2021-03-23T13:30:26Z) - Multi-Loss Weighting with Coefficient of Variations [19.37721431024278]
We propose a weighting scheme based on the coefficient of variation and set the weights based on properties observed while training the model.
The proposed method incorporates a measure of uncertainty to balance the losses, and as a result the loss weights evolve during training without requiring another (learning based) optimisation.
The validity of the approach is shown empirically for depth estimation and semantic segmentation on multiple datasets.
arXiv Detail & Related papers (2020-09-03T14:51:19Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
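As referenced in the Powerpropagation entry above, below is a minimal sketch of the power reparameterisation as it is usually described: the effective weight is phi * |phi|**(alpha - 1), which biases training toward solutions with many near-zero weights. The layer name, initialization, and alpha value are illustrative, and the weight-pruning pipeline from that paper is not shown.

```python
# Minimal sketch of a Powerpropagation-style linear layer: the stored
# parameter phi is mapped to an effective weight phi * |phi|**(alpha - 1),
# so small weights receive small gradients and drift toward zero.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PowerpropLinear(nn.Module):
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha  # alpha > 1 encourages sparsity; illustrative default
        self.phi = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        weight = self.phi * self.phi.abs().pow(self.alpha - 1)
        return F.linear(x, weight, self.bias)
```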
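Similarly, as referenced in the Balanced Softmax Cross-Entropy entry above, here is a minimal sketch of the balanced softmax loss: the logits are shifted by the log of the empirical class priors before the standard cross-entropy, which counteracts class imbalance. The function name and the class_counts argument are illustrative, and the incremental-learning integration from that paper is not shown.

```python
# Minimal sketch of a Balanced Softmax cross-entropy: add log class priors to
# the logits, then apply the usual cross-entropy.
import torch
import torch.nn.functional as F


def balanced_softmax_cross_entropy(logits, labels, class_counts):
    """logits: (batch, C); labels: (batch,); class_counts: (C,) samples per class."""
    log_prior = torch.log(class_counts.float().clamp(min=1))
    return F.cross_entropy(logits + log_prior, labels)
```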