Feature-Weighted Maximum Representative Subsampling
- URL: http://arxiv.org/abs/2603.01013v1
- Date: Sun, 01 Mar 2026 09:47:14 GMT
- Title: Feature-Weighted Maximum Representative Subsampling
- Authors: Tony Hauptmann, Stefan Kramer
- Abstract summary: We develop an algorithm that uses feature weights to minimize the impact of highly biased features on the computation of sample weights. Our algorithm is based on Maximum Representative Subsampling (MRS), which debiases datasets by aligning a non-representative sample with a representative one. The new algorithm, named feature-weighted MRS, decreases the emphasis on highly biased features, allowing it to retain more instances for downstream tasks.
- Score: 1.5960546024967324
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In the social sciences, it is often necessary to debias studies and surveys before valid conclusions can be drawn. Debiasing algorithms enable the computational removal of bias using sample weights. However, an issue arises when only a subset of features is highly biased, while the rest is already representative. Algorithms need to strongly alter the sample distribution to manage a few highly biased features, which can in turn introduce bias into already representative variables. To address this issue, we developed a method that uses feature weights to minimize the impact of highly biased features on the computation of sample weights. Our algorithm is based on Maximum Representative Subsampling (MRS), which debiases datasets by aligning a non-representative sample with a representative one through iterative removal of elements to create a representative subsample. The new algorithm, named feature-weighted MRS (FW-MRS), decreases the emphasis on highly biased features, allowing it to retain more instances for downstream tasks. The feature weights are derived from the feature importance of a domain classifier trained to differentiate between the representative and non-representative datasets. We validated FW-MRS using eight tabular datasets, each of which we artificially biased. Biased features can be important for downstream tasks, and focusing less on them could lead to a decline in generalization. For this reason, we assessed the generalization performance of FW-MRS on downstream tasks and found no statistically significant differences. Additionally, FW-MRS was applied to a real-world dataset from the social sciences. The source code is available at https://github.com/kramerlab/FeatureWeightDebiasing.
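The feature-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see their repository for that): the domain classifier is replaced here by a one-threshold classifier per feature, and the mapping `1 / (1 + importance)` from importance to weight is an assumption chosen only to show that features which separate the two samples well receive less emphasis. The function names `domain_importance` and `feature_weights` are hypothetical.

```python
from typing import List, Sequence

Row = Sequence[float]


def domain_importance(rep_values: Sequence[float], nonrep_values: Sequence[float]) -> float:
    """Score in [0, 1]: how well one threshold on this feature separates the samples.

    Stand-in for a domain classifier's feature importance (an assumption).
    Both samples are assumed to be non-empty.
    """
    # Label 0 = representative, label 1 = non-representative.
    labelled = sorted([(v, 0) for v in rep_values] + [(v, 1) for v in nonrep_values])
    n = len(labelled)
    total_nonrep = len(nonrep_values)
    baseline = max(total_nonrep, n - total_nonrep) / n  # majority-vote accuracy
    best = baseline
    nonrep_left = 0
    for i in range(n - 1):
        value, label = labelled[i]
        nonrep_left += label
        if value == labelled[i + 1][0]:
            continue  # no threshold fits between equal values
        rep_left = (i + 1) - nonrep_left
        # Rule: predict "representative" left of the threshold, "non-representative" right.
        correct = rep_left + (total_nonrep - nonrep_left)
        best = max(best, correct / n, (n - correct) / n)  # also try the mirrored rule
    return (best - baseline) / (1.0 - baseline)


def feature_weights(rep: Sequence[Row], nonrep: Sequence[Row]) -> List[float]:
    """Normalized feature weights that shrink as a feature becomes more biased."""
    n_features = len(rep[0])
    raw = []
    for j in range(n_features):
        importance = domain_importance([r[j] for r in rep], [r[j] for r in nonrep])
        raw.append(1.0 / (1.0 + importance))  # assumed mapping, not from the paper
    total = sum(raw)
    return [w / total for w in raw]


# Toy data: feature 0 overlaps between the samples, feature 1 is fully separable.
rep = [[1.0, 20.0], [2.0, 30.0], [3.0, 40.0], [4.0, 50.0]]
nonrep = [[1.5, 60.0], [2.5, 70.0], [3.5, 80.0], [2.0, 90.0]]
weights = feature_weights(rep, nonrep)
print(weights)  # the second (highly biased) feature gets the smaller weight
```

In a fuller pipeline, weights like these would scale each feature's contribution when deciding which instances to remove from the non-representative sample; that step is omitted because the abstract does not specify it.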
Related papers
- Sebra: Debiasing Through Self-Guided Bias Ranking [54.09529903433859]
Ranking samples by fine-grained estimates of spuriosity has recently been shown to significantly benefit bias mitigation. We propose a debiasing framework based on our novel Self-Guided Bias Ranking (Sebra). Sebra mitigates biases via an automatic ranking of data points by spuriosity within their respective classes.
arXiv Detail & Related papers (2025-01-30T11:31:38Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Marginal Debiased Network for Fair Visual Recognition [59.05212866862219]
We propose a novel marginal debiased network (MDN) to learn debiased representations.
Our MDN can achieve a remarkable performance on under-represented samples.
arXiv Detail & Related papers (2024-01-04T08:57:09Z) - IBADR: an Iterative Bias-Aware Dataset Refinement Framework for Debiasing NLU models [52.03761198830643]
We propose IBADR, an Iterative Bias-Aware dataset Refinement framework.
We first train a shallow model to quantify the bias degree of samples in the pool.
Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator.
In this way, this generator can effectively learn the correspondence relationship between bias indicators and samples.
arXiv Detail & Related papers (2023-11-01T04:50:38Z) - General Debiasing for Multimodal Sentiment Analysis [47.05329012210878]
We propose a general debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD) generalization ability of MSA models.
We employ IPW to reduce the effects of large-biased samples, facilitating robust feature learning for sentiment prediction.
The empirical results demonstrate the superior generalization ability of our proposed framework.
arXiv Detail & Related papers (2023-07-20T00:36:41Z) - Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% over four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z) - Fast and Accurate Importance Weighting for Correcting Sample Bias [4.750521042508541]
We propose a novel importance weighting algorithm which scales to large datasets by using a neural network to predict the instance weights.
We show that our proposed approach drastically reduces the computational time on large datasets while maintaining similar sample bias correction performance compared to other importance weighting methods.
arXiv Detail & Related papers (2022-09-09T10:01:46Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - BiasEnsemble: Revisiting the Importance of Amplifying Bias for Debiasing [31.665352191081357]
"Debiasing" aims to train a classifier to be less susceptible to dataset bias.
$f_B$ is trained to focus on bias-aligned samples while $f_D$ is mainly trained with bias-conflicting samples.
We propose a novel biased sample selection method, BiasEnsemble, which removes the bias-conflicting samples.
arXiv Detail & Related papers (2022-05-29T07:55:06Z)