On Statistical Bias In Active Learning: How and When To Fix It
- URL: http://arxiv.org/abs/2101.11665v1
- Date: Wed, 27 Jan 2021 19:52:24 GMT
- Title: On Statistical Bias In Active Learning: How and When To Fix It
- Authors: Sebastian Farquhar, Yarin Gal, Tom Rainforth
- Score: 42.768124675364376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning is a powerful tool when labelling data is expensive, but it
introduces a bias because the training data no longer follows the population
distribution. We formalize this bias and investigate the situations in which it
can be harmful and sometimes even helpful. We further introduce novel
corrective weights to remove bias when doing so is beneficial. Through this,
our work not only provides a useful mechanism that can improve the active
learning approach, but also an explanation of the empirical successes of
various existing approaches which ignore this bias. In particular, we show that
this bias can be actively helpful when training overparameterized models --
like neural networks -- with relatively little data.
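The abstract does not spell out the corrective weights, but the core idea can be sketched with standard importance weighting: if a pool point i is drawn with known acquisition probability q(i) from a pool of N, weighting its loss by 1/(N q(i)) makes the weighted average an unbiased estimator of the population risk. Below is a minimal sketch assuming i.i.d. sampling with replacement from a fixed pool; the paper's own weights target the harder sequential, without-replacement setting, and everything here (the synthetic losses, the acquisition distribution, the variable names) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of N candidate points; `losses` stands in for the
# per-point loss a model would incur (values here are synthetic).
N = 1000
losses = rng.gamma(shape=2.0, scale=1.0, size=N)

# An acquisition distribution that over-samples high-loss points,
# mimicking an active learner with known proposal probabilities q.
q = losses + 0.1
q = q / q.sum()

# Acquire M points i.i.d. with replacement (a simplification).
M = 200
idx = rng.choice(N, size=M, replace=True, p=q)

naive = losses[idx].mean()                  # biased: follows q, not the population
weights = 1.0 / (N * q[idx])                # corrective importance weights
corrected = (weights * losses[idx]).mean()  # unbiased for the population risk

print(f"population risk:   {losses.mean():.3f}")
print(f"naive (biased):    {naive:.3f}")
print(f"weighted estimate: {corrected:.3f}")
```

Note the abstract's caveat before applying such weights unconditionally: with overparameterized models and relatively little data, the uncorrected, actively biased sample can actually help.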
Related papers
- Model Debiasing by Learnable Data Augmentation [19.625915578646758]
This paper proposes a novel two-stage learning pipeline featuring a data augmentation strategy that regularizes training.
Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods.
arXiv Detail & Related papers (2024-08-09T09:19:59Z)
- Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- An Exploration of How Training Set Composition Bias in Machine Learning Affects Identifying Rare Objects [0.0]
It is common to up-weight the examples of the rare class to ensure it isn't ignored.
It is also a frequent practice to train on restricted data where the balance of source types is closer to equal.
Here we show that these practices can bias the model toward over-assigning sources to the rare class; a toy sketch of this effect appears after this list.
arXiv Detail & Related papers (2022-07-07T10:26:55Z)
- Unsupervised Learning of Unbiased Visual Representations [10.871587311621974]
Deep neural networks are known to struggle to learn robust representations when biases exist in the dataset.
We propose a fully unsupervised debiasing framework consisting of three steps, the last of which employs state-of-the-art supervised debiasing techniques to obtain an unbiased model.
arXiv Detail & Related papers (2022-04-26T10:51:50Z)
- Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without exact knowledge of the bias labels.
We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels.
Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
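To make the over-assignment effect from the rare-objects entry concrete, here is a toy illustration (not taken from that paper; the two-Gaussian setup, the 1% prevalence, and all names are assumptions): a classifier fit on a rebalanced or up-weighted training set effectively learns the class posterior under a 50/50 prior, and thresholding that posterior at 0.5 on the true population flags far more rare-class candidates than actually exist.

```python
import numpy as np

def gauss(x, mu):
    # Unit-variance Gaussian density, used as a toy class-conditional model.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def posterior_rare(x, prior_rare):
    # Bayes posterior p(rare | x) for the two-Gaussian toy model below.
    num = prior_rare * gauss(x, 2.0)
    return num / (num + (1.0 - prior_rare) * gauss(x, 0.0))

rng = np.random.default_rng(1)
true_prior = 0.01                # the rare class is 1% of the population
n = 200_000
y = rng.random(n) < true_prior   # True marks a rare-class source
x = rng.normal(np.where(y, 2.0, 0.0), 1.0)

# Training on a 50/50 rebalanced (or up-weighted) set amounts to
# learning the posterior under a 0.5 prior instead of the true 0.01.
flag_balanced = posterior_rare(x, 0.5) > 0.5
flag_true = posterior_rare(x, true_prior) > 0.5

print("rare fraction in population:        ", y.mean())
print("flagged rare by balanced-prior rule:", flag_balanced.mean())
print("flagged rare by true-prior rule:    ", flag_true.mean())
```

The balanced-prior rule flags roughly 16% of sources as rare even though only 1% are, which illustrates the over-assignment the entry describes; rescaling the posterior back to the true prior removes it.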
This list is automatically generated from the titles and abstracts of the papers on this site.