Assessing Data Augmentation-Induced Bias in Training and Testing of Machine Learning Models
- URL: http://arxiv.org/abs/2502.01825v1
- Date: Mon, 03 Feb 2025 21:06:35 GMT
- Title: Assessing Data Augmentation-Induced Bias in Training and Testing of Machine Learning Models
- Authors: Riddhi More, Jeremy S. Bradbury,
- Abstract summary: Data augmentation has become a standard practice in software engineering to address limited or imbalanced data sets.<n>We show how to test for bias and understand the impact that the inclusion of augmented samples in testing sets can have on model evaluation.
- Score: 1.9336815376402723
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data augmentation has become a standard practice in software engineering to address limited or imbalanced data sets, particularly in specialized domains like test classification and bug detection where data can be scarce. Although techniques such as SMOTE and mutation-based augmentation are widely used in software testing and debugging applications, a rigorous understanding of how augmented training data impacts model bias is lacking. It is especially critical to consider bias in scenarios where augmented data sets are used not just in training but also in testing models. Through a comprehensive case study of flaky test classification, we demonstrate how to test for bias and understand the impact that the inclusion of augmented samples in testing sets can have on model evaluation.
Related papers
- A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data [9.57464542357693]
This paper demonstrates that model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering.<n>We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset.<n>After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces.
arXiv Detail & Related papers (2024-07-02T09:54:39Z) - The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes [30.30769701138665]
We introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data.
Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem.
We introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point.
arXiv Detail & Related papers (2024-02-14T03:43:05Z) - A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [117.72709110877939]
Test-time adaptation (TTA) has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.<n>We categorize TTA into several distinct groups based on the form of test data, namely, test-time domain adaptation, test-time batch adaptation, and online test-time adaptation.
arXiv Detail & Related papers (2023-03-27T16:32:21Z) - DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two unfavorable defects are concealed in the prevalent adaptation methodologies like test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN are completely affected by the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
arXiv Detail & Related papers (2023-01-30T15:54:00Z) - Certifying Data-Bias Robustness in Linear Regression [12.00314910031517]
We present a technique for certifying whether linear regression models are pointwise-robust to label bias in a training dataset.
We show how to solve this problem exactly for individual test points, and provide an approximate but more scalable method.
We also unearth gaps in bias-robustness, such as high levels of non-robustness for certain bias assumptions on some datasets.
arXiv Detail & Related papers (2022-06-07T20:47:07Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep
Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - On the Role of Dataset Quality and Heterogeneity in Model Confidence [27.657631193015252]
Safety-critical applications require machine learning models that output accurate and calibrated probabilities.
Uncalibrated deep networks are known to make over-confident predictions.
We study the impact of dataset quality by studying the impact of dataset size and the label noise on the model confidence.
arXiv Detail & Related papers (2020-02-23T05:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.