Towards Mitigating Spurious Correlations in the Wild: A Benchmark and a
more Realistic Dataset
- URL: http://arxiv.org/abs/2306.11957v2
- Date: Fri, 29 Sep 2023 06:09:08 GMT
- Title: Towards Mitigating Spurious Correlations in the Wild: A Benchmark and a
more Realistic Dataset
- Authors: Siddharth Joshi, Yu Yang, Yihao Xue, Wenhan Yang and Baharan
Mirzasoleiman
- Abstract summary: Deep neural networks often exploit non-predictive features that are spuriously correlated with class labels.
Despite the growing body of recent works on remedying spurious correlations, the lack of a standardized benchmark hinders reproducible evaluation.
We present SpuCo, a Python package with modular implementations of state-of-the-art solutions enabling easy and reproducible evaluation.
- Score: 46.82577457368719
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks often exploit non-predictive features that are
spuriously correlated with class labels, leading to poor performance on groups
of examples without such features. Despite the growing body of recent works on
remedying spurious correlations, the lack of a standardized benchmark hinders
reproducible evaluation and comparison of the proposed solutions. To address
this, we present SpuCo, a Python package with modular implementations of
state-of-the-art solutions enabling easy and reproducible evaluation of current
methods. Using SpuCo, we demonstrate the limitations of existing datasets and
evaluation schemes in validating the learning of predictive features over
spurious ones. To overcome these limitations, we propose two new vision
datasets: (1) SpuCoMNIST, a synthetic dataset that enables simulating the
effect of real-world data properties, e.g., the difficulty of learning the spurious
feature, as well as noise in the labels and features; (2) SpuCoAnimals, a
large-scale dataset curated from ImageNet that captures spurious correlations
in the wild much more closely than existing datasets. These contributions
highlight the shortcomings of current methods and provide a direction for
future research in tackling spurious correlations. SpuCo, containing the
benchmark and datasets, can be found at https://github.com/BigML-CS-UCLA/SpuCo,
with detailed documentation available at
https://spuco.readthedocs.io/en/latest/.
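The abstract's point about "poor performance on groups of examples without such features" is usually quantified with worst-group accuracy. The following is a minimal sketch of that metric in plain Python, assuming each example carries a class label and a spurious-attribute value. The function names and the toy data are hypothetical illustrations and are not taken from the SpuCo package.

```python
from collections import defaultdict


def group_accuracies(labels, spurious, predictions):
    """Accuracy per (class label, spurious attribute) group.

    Hypothetical helper for illustration; not part of SpuCo.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, s, p in zip(labels, spurious, predictions):
        total[(y, s)] += 1
        correct[(y, s)] += int(p == y)
    return {group: correct[group] / total[group] for group in total}


def average_accuracy(labels, predictions):
    """Plain accuracy over all examples."""
    return sum(int(p == y) for y, p in zip(labels, predictions)) / len(labels)


def worst_group_accuracy(labels, spurious, predictions):
    """Minimum accuracy over all groups."""
    return min(group_accuracies(labels, spurious, predictions).values())


# Toy data (assumed): the spurious attribute agrees with the label
# for 6 of 8 examples, and the "model" simply predicts that attribute.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
spurious = [0, 0, 0, 1, 1, 1, 1, 0]
predictions = list(spurious)

print(average_accuracy(labels, predictions))
print(worst_group_accuracy(labels, spurious, predictions))
```

In this toy setup the average accuracy looks reasonable because most examples contain the spurious feature, while the groups where the feature disagrees with the label are always misclassified. That gap is what a benchmark for spurious correlations is meant to expose.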
Related papers
- Autoencoder based approach for the mitigation of spurious correlations [2.7624021966289605]
Spurious correlations refer to erroneous associations in data that do not reflect true underlying relationships.
These correlations can lead deep neural networks (DNNs) to learn patterns that are not robust across diverse datasets or real-world scenarios.
We propose an autoencoder-based approach to analyze the nature of spurious correlations that exist in the Global Wheat Head Detection (GWHD) 2021 dataset.
arXiv Detail & Related papers (2024-06-27T05:28:44Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- RIFLE: Imputation and Robust Inference from Low Order Marginals [10.082738539201804]
We develop a statistical inference framework for regression and classification in the presence of missing data without imputation.
Our framework, RIFLE, estimates low-order moments of the underlying data distribution with corresponding confidence intervals to learn a distributionally robust model.
Our experiments demonstrate that RIFLE outperforms other benchmark algorithms when the percentage of missing values is high and/or when the number of data points is relatively small.
arXiv Detail & Related papers (2021-09-01T23:17:30Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- The Surprising Performance of Simple Baselines for Misinformation Detection [4.060731229044571]
We examine the performance of a broad set of modern transformer-based language models.
We present our framework as a baseline for creating and evaluating new methods for misinformation detection.
arXiv Detail & Related papers (2021-04-14T16:25:22Z)
- Revisiting Data Complexity Metrics Based on Morphology for Overlap and Imbalance: Snapshot, New Overlap Number of Balls Metrics and Singular Problems Prospect [9.666866159867444]
This research work focuses on revisiting complexity metrics based on data morphology.
The new metrics are based on ball coverage by classes and are therefore named Overlap Number of Balls metrics.
arXiv Detail & Related papers (2020-07-15T18:21:13Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.