On how to avoid exacerbating spurious correlations when models are
overparameterized
- URL: http://arxiv.org/abs/2206.12739v1
- Date: Sat, 25 Jun 2022 21:53:44 GMT
- Title: On how to avoid exacerbating spurious correlations when models are
overparameterized
- Authors: Tina Behnia, Ke Wang, Christos Thrampoulidis
- Abstract summary: We show that VS-loss learns a model that is fair towards minorities even when spurious features are strong.
Compared to previous works, our bounds hold for more general models, are non-asymptotic, and apply even in scenarios of extreme imbalance.
- Score: 33.315813572333745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparameterized models fail to generalize well in the presence of data
imbalance even when combined with traditional techniques for mitigating
imbalances. This paper focuses on imbalanced classification datasets, in which
a small subset of the population -- a minority -- may contain features that
correlate spuriously with the class label. For a parametric family of
cross-entropy loss modifications and a representative Gaussian mixture model,
we derive non-asymptotic generalization bounds on the worst-group error that
shed light on the role of different hyper-parameters. Specifically, we prove
that, when appropriately tuned, the recently proposed VS-loss learns a model
that is fair towards minorities even when spurious features are strong. On the
other hand, alternative heuristics, such as the weighted CE and the LA-loss,
can fail dramatically. Compared to previous works, our bounds hold for more
general models, they are non-asymptotic, and they apply even in scenarios of
extreme imbalance.
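The "parametric family of cross-entropy loss modifications" mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly cited per-class form of the VS-loss, with a multiplicative logit adjustment `delta`, an additive adjustment `iota`, and a loss weight `omega`; the paper itself may use a group-dependent, binary parameterization, so the function names and argument layout here are illustrative only. Under this assumed form, weighted CE and the LA-loss appear as special cases.

```python
import math

def vs_loss(logits, label, delta, iota, omega):
    """Assumed VS-loss form for a single example (illustrative sketch).

    logits: raw scores, one per class
    label:  index of the true class
    delta:  per-class multiplicative logit adjustments
    iota:   per-class additive logit adjustments
    omega:  per-class loss weights
    """
    adjusted = [d * z + i for d, z, i in zip(delta, logits, iota)]
    # Subtract the max for a numerically stable log-sum-exp.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return omega[label] * (log_norm - adjusted[label])

def weighted_ce(logits, label, omega):
    """Weighted CE: only the loss weights differ from plain CE."""
    k = len(logits)
    return vs_loss(logits, label, [1.0] * k, [0.0] * k, omega)

def la_loss(logits, label, priors, tau=1.0):
    """LA-loss: additive adjustment tau * log(prior), no reweighting."""
    k = len(logits)
    iota = [tau * math.log(p) for p in priors]
    return vs_loss(logits, label, [1.0] * k, iota, [1.0] * k)
```

With all `delta` equal to 1, all `iota` equal to 0, and all `omega` equal to 1, `vs_loss` reduces to the standard cross-entropy loss.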
Related papers
- Aliasing and Label-Independent Decomposition of Risk: Beyond the bias-variance trade-off [0.0]
A central problem in data science is to use potentially noisy samples to predict function values for unseen inputs.
We introduce an alternative paradigm called the generalized aliasing decomposition (GAD).
GAD can be explicitly calculated from the relationship between model class and samples without seeing any data labels.
arXiv Detail & Related papers (2024-08-15T17:49:24Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z) - On the Implicit Geometry of Cross-Entropy Parameterizations for
Label-Imbalanced Data [26.310275682709776]
Various logit-adjusted parameterizations of the cross-entropy (CE) loss have been proposed as alternatives to weighted CE for training large models on label-imbalanced data.
We show that logit-adjusted parameterizations can be appropriately tuned to learn irrespective of the minority imbalance ratio.
arXiv Detail & Related papers (2023-03-14T03:04:37Z) - The Unbearable Weight of Massive Privilege: Revisiting Bias-Variance
Trade-Offs in the Context of Fair Prediction [7.975779552420981]
We propose a conditional-iid (ciid) model that seeks to improve on the trade-offs made by a single model.
We empirically test our setup on the COMPAS and folktables datasets.
Our analysis suggests that there might be principled procedures and concrete real-world use cases under which conditional models are preferred.
arXiv Detail & Related papers (2023-02-17T05:34:35Z) - Bias-inducing geometries: an exactly solvable data model with fairness
implications [13.690313475721094]
We introduce an exactly solvable high-dimensional model of data imbalance.
We analytically unpack the typical properties of learning models trained in this synthetic framework.
We obtain exact predictions for the observables that are commonly employed for fairness assessment.
arXiv Detail & Related papers (2022-05-31T16:27:57Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but perform poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Memorizing without overfitting: Bias, variance, and interpolation in
over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - An Investigation of Why Overparameterization Exacerbates Spurious
Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.