Is Importance Weighting Incompatible with Interpolating Classifiers?
- URL: http://arxiv.org/abs/2112.12986v1
- Date: Fri, 24 Dec 2021 08:06:57 GMT
- Title: Is Importance Weighting Incompatible with Interpolating Classifiers?
- Authors: Ke Alexander Wang, Niladri S. Chatterji, Saminul Haque, Tatsunori
Hashimoto
- Abstract summary: We show that importance weighting fails not because of overparameterization, but because of exponentially-tailed losses like the logistic or cross-entropy loss.
As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting.
Surprisingly, our theory shows that using weights that are obtained by exponentiating the classical unbiased importance weights can improve performance.
- Score: 13.449501940517699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Importance weighting is a classic technique to handle distribution shifts.
However, prior work has presented strong empirical and theoretical evidence
demonstrating that importance weights can have little to no effect on
overparameterized neural networks. Is importance weighting truly incompatible
with the training of overparameterized neural networks? Our paper answers this
in the negative. We show that importance weighting fails not because of the
overparameterization, but instead, as a result of using exponentially-tailed
losses like the logistic or cross-entropy loss. As a remedy, we show that
polynomially-tailed losses restore the effects of importance reweighting in
correcting distribution shift in overparameterized models. We characterize the
behavior of gradient descent on importance weighted polynomially-tailed losses
with overparameterized linear models, and theoretically demonstrate the
advantage of using polynomially-tailed losses in a label shift setting.
Surprisingly, our theory shows that using weights that are obtained by
exponentiating the classical unbiased importance weights can improve
performance. Finally, we demonstrate the practical value of our analysis with
neural network experiments on a subpopulation shift and a label shift dataset.
When reweighted, our loss function can outperform reweighted cross-entropy by
as much as 9% in test accuracy. Our loss function also gives test accuracies
comparable to, or even exceeding, well-tuned state-of-the-art methods for
correcting distribution shifts.
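To make the abstract's idea concrete, below is a minimal sketch of an importance-weighted, polynomially-tailed loss for a binary classifier in PyTorch. The specific tail form and the names `poly_tailed_loss`, `importance_weighted_risk`, `alpha`, `beta`, and `exponent` are illustrative assumptions rather than the paper's exact construction; the point is only that the loss decays polynomially in the margin and that the classical unbiased class weights can optionally be exponentiated.

```python
# Minimal sketch (not the authors' exact formulation): an importance-weighted
# loss with a polynomially decaying tail for a PyTorch-style binary classifier.
# The tail form and the hyperparameters alpha, beta, exponent are illustrative
# assumptions based only on the abstract.
import math
import torch

def poly_tailed_loss(margins: torch.Tensor, alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    """Loss on per-example margins z = y * f(x), with labels y in {-1, +1}.

    For z <= 0 it matches the logistic loss log(1 + exp(-z)); for z > 0 it decays
    polynomially as log(2) / (beta*z + 1)**alpha instead of exponentially, so
    correctly classified points keep a non-negligible gradient and the importance
    weights continue to shape the interpolating solution.
    """
    left = torch.nn.functional.softplus(-margins)            # logistic branch, z <= 0
    pos = torch.clamp(margins, min=0.0)
    right = math.log(2.0) / (beta * pos + 1.0) ** alpha      # polynomial branch, z > 0
    return torch.where(margins <= 0, left, right)            # both branches equal log(2) at z = 0

def importance_weighted_risk(logits: torch.Tensor, labels: torch.Tensor,
                             class_weights: torch.Tensor, exponent: float = 1.0) -> torch.Tensor:
    """Average importance-weighted loss for binary label shift.

    class_weights[k] plays the role of p_test(class k) / p_train(class k);
    exponent > 1 exponentiates these classical unbiased weights, which the
    paper argues can further improve test accuracy.
    """
    margins = labels * logits                                # labels in {-1, +1}
    w = class_weights[(labels > 0).long()] ** exponent       # per-example class weight
    return (w * poly_tailed_loss(margins)).mean()
```

With `exponent = 1` this reduces to ordinary unbiased importance weighting; minimizing such a risk with gradient descent on an overparameterized model is the setting the paper analyzes.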
Related papers
- Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining [29.12578724826307]
This work explores the regime of last layer retraining (LLR) in which the unseen limited (retraining) data is frequently inseparable and the model proportionately sized.
We show, in theory and practice, that loss weighting is still effective in this regime.
arXiv Detail & Related papers (2025-06-24T21:48:58Z)
- Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning [66.8042627609456]
Loss reweighting has shown significant benefits for machine unlearning with large language models (LLMs).
In this paper, we identify two distinct goals of loss reweighting, namely, Saturation and Importance.
We propose SatImp, a simple reweighting method that combines the advantages of both saturation and importance.
arXiv Detail & Related papers (2025-05-17T10:41:22Z)
- Understand the Effect of Importance Weighting in Deep Learning on Dataset Shift [0.0]
We evaluate the effectiveness of importance weighting in deep neural networks under label shift and covariate shift.
We observe that weighting strongly affects decision boundaries early in training but fades with prolonged optimization.
Our results call into question the practical utility of importance weighting for real-world distribution shifts.
arXiv Detail & Related papers (2025-05-06T15:16:38Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel effective method to improve generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Imbalanced Nodes Classification for Graph Neural Networks Based on Valuable Sample Mining [9.156427521259195]
A new loss function FD-Loss is reconstructed based on the traditional algorithm-level approach to the imbalance problem.
Our loss function can effectively solve the sample node imbalance problem and improve the classification accuracy by 4% compared to existing methods in the node classification task.
arXiv Detail & Related papers (2022-09-18T09:22:32Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network [8.79431718760617]
Training with mini-batch SGD and weight decay induces a bias toward rank minimization in weight matrices.
We show that this bias becomes more pronounced with smaller batch sizes, higher learning rates, or stronger weight decay.
We empirically explore the connection between this bias and generalization, finding that it has a marginal effect on the test performance.
arXiv Detail & Related papers (2022-06-12T17:06:35Z)
- Understanding Square Loss in Training Overparametrized Neural Network Classifiers [31.319145959402462]
We contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks.
We consider two cases, according to whether the classes are separable or not.
In the general non-separable case, a fast convergence rate is established for both the misclassification rate and the calibration error.
The resulting margin is proven to be lower bounded away from zero, providing theoretical guarantees for robustness.
arXiv Detail & Related papers (2021-12-07T12:12:30Z)
- On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
- The Golden Ratio of Learning and Momentum [0.5076419064097732]
This paper proposes a new information-theoretical loss function motivated by neural signal processing in a synapse.
All results taken together show that loss, learning rate, and momentum are closely connected.
arXiv Detail & Related papers (2020-06-08T17:08:13Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)