Undersampling is a Minimax Optimal Robustness Intervention in
Nonparametric Classification
- URL: http://arxiv.org/abs/2205.13094v4
- Date: Mon, 19 Jun 2023 15:44:18 GMT
- Title: Undersampling is a Minimax Optimal Robustness Intervention in
Nonparametric Classification
- Authors: Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto
- Abstract summary: We show that learning is fundamentally constrained by a lack of minority group samples.
In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal.
- Score: 28.128464387420216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While a broad range of techniques have been proposed to tackle distribution
shift, the simple baseline of training on an $\textit{undersampled}$ balanced
dataset often achieves close to state-of-the-art accuracy across several
popular benchmarks. This is rather surprising, since undersampling algorithms
discard excess majority group data. To understand this phenomenon, we ask if
learning is fundamentally constrained by a lack of minority group samples. We
prove that this is indeed the case in the setting of nonparametric binary
classification. Our results show that in the worst case, an algorithm cannot
outperform undersampling unless there is a high degree of overlap between the
train and test distributions (which is unlikely to be the case in real-world
datasets), or if the algorithm leverages additional structure about the
distribution shift. In particular, in the case of label shift we show that
there is always an undersampling algorithm that is minimax optimal. In the case
of group-covariate shift we show that there is an undersampling algorithm that
is minimax optimal when the overlap between the group distributions is small.
We also perform an experimental case study on a label shift dataset and find
that in line with our theory, the test accuracy of robust neural network
classifiers is constrained by the number of minority samples.
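As a point of reference for the undersampling baseline discussed in the abstract, the sketch below subsamples every class down to the size of the smallest class before training; under label shift the groups coincide with the class labels, so this produces the balanced dataset the abstract refers to. This is a minimal illustration on a placeholder synthetic dataset, not the authors' experimental code.

```python
import numpy as np

def undersample_balanced(X, y, rng=None):
    """Subsample each class down to the size of the smallest class.

    This is the simple 'undersampled balanced dataset' baseline from the
    abstract: excess majority-group samples are discarded so that every
    class contributes the same number of training points.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.concatenate(keep)
    rng.shuffle(keep)
    return X[keep], y[keep]

# Example usage with a placeholder imbalanced dataset (~10% minority class).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.1).astype(int)
X_bal, y_bal = undersample_balanced(X, y, rng=0)
print(np.bincount(y_bal))  # roughly equal class counts
```

Any downstream classifier can then be fit on `(X_bal, y_bal)`; the paper's claim is that, in the worst case without extra structure, no algorithm can do better than this baseline.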
Related papers
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
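The worst-case-risk objective named in this summary can be stated compactly: a hypothesis is scored by its largest average loss over the $k$ distributions, and the learner minimizes that score. The snippet below only evaluates this objective for a fixed hypothesis on hypothetical per-distribution loss arrays; it is not the proposed algorithm, whose sample-efficient construction of a randomized hypothesis is the paper's contribution.

```python
import numpy as np

def worst_case_risk(losses_per_distribution):
    """Worst-case (minimax) risk of a fixed hypothesis over k distributions.

    `losses_per_distribution` is a list of 1-D arrays, one per distribution,
    holding per-sample losses of the hypothesis on samples drawn from that
    distribution. Multi-distribution learning chooses the hypothesis that
    minimizes this quantity.
    """
    return max(float(np.mean(losses)) for losses in losses_per_distribution)

# Hypothetical example: three distributions with per-sample 0/1 losses.
rng = np.random.default_rng(1)
losses = [(rng.random(200) < p).astype(float) for p in (0.05, 0.10, 0.30)]
print(worst_case_risk(losses))  # ~0.30, driven by the hardest distribution
```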
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% over four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z) - Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced Data [15.030895782548576]
The imbalanced classification problem turns out to be one of the important and challenging problems in data mining and machine learning.
The Tomek-Links under-sampling algorithm can effectively reduce class overlap in the data, remove majority instances that are difficult to distinguish, and improve classification accuracy.
However, it only considers boundary instances that are globally each other's nearest neighbors and ignores potentially overlapping local instances.
This paper proposes a multi-granularity relabeled under-sampling algorithm (MGRU) that fully considers the local information of the data set.
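To make the Tomek-Links step concrete, here is an illustrative reimplementation that flags pairs of mutually nearest neighbors with opposite labels (Tomek links) and drops the majority-class member of each pair. It is a sketch of the classical global procedure that MGRU refines, not the MGRU algorithm itself; the `majority_label` argument is a placeholder for whichever class is oversized.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def tomek_links_undersample(X, y, majority_label):
    """Remove majority-class points that participate in Tomek links.

    A Tomek link is a pair of points from opposite classes that are each
    other's nearest neighbor; such pairs sit on the class boundary or in
    overlapping regions, so removing the majority member sharpens the
    boundary.
    """
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    # Column 0 is the query point itself; column 1 is its nearest other point.
    neighbors = nn.kneighbors(X, return_distance=False)[:, 1]
    drop = set()
    for i, j in enumerate(neighbors):
        mutual = neighbors[j] == i      # each other's nearest neighbor
        opposite = y[i] != y[j]         # from different classes
        if mutual and opposite:
            drop.add(i if y[i] == majority_label else j)
    keep = np.array([i for i in range(len(y)) if i not in drop])
    return X[keep], y[keep]
```

MGRU replaces this purely global nearest-neighbor test with a multi-granularity, relabeling-based criterion so that local overlap is also taken into account.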
arXiv Detail & Related papers (2022-01-11T14:07:55Z) - Does Adversarial Oversampling Help us? [10.210871872870737]
We propose a three-player adversarial game-based end-to-end method to handle class imbalance in datasets.
Rather than adversarial minority oversampling, we propose an adversarial oversampling (AO) and a data-space oversampling (DO) approach.
The effectiveness of our proposed method has been validated with high-dimensional, highly imbalanced and large-scale multi-class datasets.
arXiv Detail & Related papers (2021-08-20T05:43:17Z) - A multi-schematic classifier-independent oversampling approach for
imbalanced datasets [0.0]
It is evident from previous studies that different oversampling algorithms have different degrees of efficiency with different classifiers.
Here, we overcome this problem with a multi-schematic and classifier-independent oversampling approach: ProWRAS.
ProWRAS integrates the Localized Random Affine Shadowsampling (LoRAS) algorithm and the Proximity Weighted Synthetic oversampling (ProWSyn) algorithm.
arXiv Detail & Related papers (2021-07-15T14:03:24Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - Synthesising Multi-Modal Minority Samples for Tabular Data [3.7311680121118345]
Adding synthetic minority samples to the dataset before training is a popular technique for addressing class imbalance.
We propose a latent space framework which maps the multi-modal samples to a dense continuous latent space.
We show that our framework generates better synthetic data than the existing methods.
arXiv Detail & Related papers (2021-05-17T23:54:08Z) - Improved Algorithms for Agnostic Pool-based Active Classification [20.12178157010804]
We consider active learning for binary classification in the agnostic pool-based setting.
Our algorithm is superior to state-of-the-art active learning algorithms on image classification datasets.
arXiv Detail & Related papers (2021-05-13T18:24:30Z) - Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \rightarrow \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.