BiasBuster: a Neural Approach for Accurate Estimation of Population
Statistics using Biased Location Data
- URL: http://arxiv.org/abs/2402.11318v1
- Date: Sat, 17 Feb 2024 16:16:24 GMT
- Title: BiasBuster: a Neural Approach for Accurate Estimation of Population
Statistics using Biased Location Data
- Authors: Sepanta Zeighami, Cyrus Shahabi
- Abstract summary: We show that statistical debiasing, although in some cases useful, often fails to improve accuracy.
We then propose BiasBuster, a neural network approach that utilizes the correlations between population statistics and location characteristics to provide accurate estimates of population statistics.
- Score: 6.077198822448429
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While extremely useful (e.g., for COVID-19 forecasting and policy-making,
urban mobility analysis and marketing, and obtaining business insights),
location data collected from mobile devices often contain data from a biased
population subset, with some communities over- or underrepresented in the
collected datasets. As a result, aggregate statistics calculated from such
datasets while ignoring the bias (as is done by various companies, including
SafeGraph, Google, and Facebook) lead to an inaccurate representation of
population statistics. Such statistics are not only generally inaccurate,
but the error also disproportionately impacts different population subgroups
(e.g., because the statistics ignore underrepresented communities). This has dire
consequences, as these datasets are used for sensitive decision-making such as
COVID-19 policymaking. This paper tackles the problem of providing accurate
population statistics using such biased datasets. We show that statistical
debiasing, although in some cases useful, often fails to improve accuracy. We
then propose BiasBuster, a neural network approach that utilizes the
correlations between population statistics and location characteristics to
provide accurate estimates of population statistics. Extensive experiments on
real-world data show that BiasBuster improves accuracy by up to 2 times in
general and up to 3 times for underrepresented populations.
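As a concrete, purely illustrative sketch of the idea in the abstract, the snippet below trains a small neural network to predict a per-location statistic from location characteristics, rather than trusting aggregates computed from a biased sample. The feature names, the data-generating process, and the audited subset used for training are all invented for the toy setup; this is not the paper's implementation.

```python
# Toy sketch: estimate a per-location population statistic from
# location characteristics instead of trusting counts computed from
# a biased sample. All features and coefficients are hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_locations = 1000

# Hypothetical location characteristics (standardized): median income,
# population density, distance to city center.
X = rng.normal(size=(n_locations, 3))

# True statistic depends on the characteristics; the biased dataset
# observes it with coverage that varies with income, so some
# communities are underrepresented.
true_stat = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * X[:, 2]
coverage = 1 / (1 + np.exp(-X[:, 0]))  # richer areas overrepresented
observed = true_stat * coverage + rng.normal(scale=0.05, size=n_locations)

# Naive estimate: report the biased observation as-is.
naive_mse = np.mean((observed - true_stat) ** 2)

# Neural estimate: learn the correlation between the statistic and the
# location characteristics from a small audited subset (a toy
# assumption) and predict for every location.
audited = rng.choice(n_locations, size=100, replace=False)
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000,
                     random_state=0)
model.fit(X[audited], true_stat[audited])
neural_mse = np.mean((model.predict(X) - true_stat) ** 2)

print(f"naive MSE:  {naive_mse:.4f}")
print(f"neural MSE: {neural_mse:.4f}")
```

In this toy setup the naive aggregate inherits the coverage bias, while the network's prediction depends only on location characteristics, which is the intuition behind exploiting such correlations.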
Related papers
- Dataset Representativeness and Downstream Task Fairness [24.570493924073524]
We demonstrate that there is a natural tension between dataset representativeness and group-fairness of classifiers trained on that dataset.
We also find that over-sampling underrepresented groups can result in classifiers which exhibit greater bias to those groups.
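For reference, the finding above concerns plain over-sampling; a generic sketch of that technique (not the paper's experimental code) follows, duplicating minority-group rows with replacement until group sizes match.

```python
# Generic over-sampling sketch: duplicate rows of the underrepresented
# group (with replacement) until the two groups are the same size.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
group = np.array([0] * 90 + [1] * 10)  # group 1 is underrepresented

minority = np.where(group == 1)[0]
extra = rng.choice(minority, size=90 - 10, replace=True)
idx = np.concatenate([np.arange(100), extra])

X_balanced, group_balanced = X[idx], group[idx]
print(np.bincount(group_balanced))  # [90 90]
```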
arXiv Detail & Related papers (2024-06-28T18:11:16Z)
- Data Bias According to Bipol: Men are Naturally Right and It is the Role of Women to Follow Their Lead [0.48163317476588574]
We show that bias exists in all 10 datasets of 5 languages evaluated, including benchmark datasets on the English GLUE/SuperGLUE leaderboards.
The 3 new languages give a total of almost 6 million labeled samples and we benchmark on these datasets using SotA multilingual pretrained models: mT5 and mBERT.
arXiv Detail & Related papers (2024-04-07T07:24:45Z)
- Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n \mid b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n \mid b_n)}$.
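A minimal numeric sketch of this weighting, with the conditional $p(u_n \mid b_n)$ estimated from empirical co-occurrence counts (a simplifying assumption; the paper may estimate it differently):

```python
# Weight each sample n by 1 / p(u_n | b_n), estimated from counts.
import numpy as np

u = np.array([0, 0, 0, 0, 1, 1, 0, 1])  # class attribute
b = np.array([0, 0, 0, 0, 1, 1, 1, 0])  # non-class (bias) attribute

joint = np.zeros((2, 2))
for ui, bi in zip(u, b):
    joint[ui, bi] += 1
p_u_given_b = joint / joint.sum(axis=0, keepdims=True)

weights = 1.0 / p_u_given_b[u, b]  # w_n = 1 / p(u_n | b_n)
weights /= weights.mean()          # optional normalization
print(np.round(weights, 2))
# Rare (u, b) combinations receive large weights, counteracting the
# spurious u-b correlation during training.
```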
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
- DSAP: Analyzing Bias Through Demographic Comparison of Datasets [4.8741052091630985]
We propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of two datasets.
DSAP can be deployed in three key applications: to detect and characterize demographic blind spots and bias issues across datasets, to measure dataset demographic bias in single datasets, and to measure dataset demographic shift in deployment scenarios.
An essential feature of DSAP is its ability to robustly analyze datasets without explicit demographic labels, offering simplicity and interpretability for a wide range of situations.
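DSAP's first step, inferring demographic profiles with auxiliary models, is beyond a short snippet; the toy below illustrates only the comparison step, measuring the distance between two already-inferred demographic compositions. The choice of total variation distance is an assumption for illustration, not necessarily DSAP's similarity measure.

```python
# Compare the demographic composition of two datasets, given profiles
# (category shares) already inferred by some auxiliary model.
import numpy as np

profile_a = np.array([0.55, 0.30, 0.15])  # inferred shares, dataset A
profile_b = np.array([0.40, 0.35, 0.25])  # inferred shares, dataset B

# Total variation distance: 0 = identical composition, 1 = disjoint.
tv = 0.5 * np.abs(profile_a - profile_b).sum()
print(f"demographic distance: {tv:.3f}")  # 0.150
```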
arXiv Detail & Related papers (2023-12-22T11:51:20Z)
- Unbiased Supervised Contrastive Learning [10.728852691100338]
In this work, we tackle the problem of learning representations that are robust to biases.
We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses can fail when dealing with biased data.
We derive a novel formulation of the supervised contrastive loss ($\epsilon$-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples.
Thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data.
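One plausible reading of the $\epsilon$ margin, sketched in plain numpy: negative similarities are shifted up by $\epsilon$ inside the softmax denominator, so the loss stays low only when the positive beats every negative by at least $\epsilon$. This is an interpretation of the abstract, not the paper's reference implementation.

```python
# Margin-based contrastive loss in the spirit of epsilon-SupInfoNCE
# (an interpretation of the abstract, not the authors' code).
import numpy as np

def eps_contrastive_loss(s_pos, s_neg, eps=0.1):
    """s_pos: similarity to the positive sample;
    s_neg: array of similarities to negatives."""
    logits = np.concatenate([[s_pos], s_neg + eps])
    logits = logits - logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

s_neg = np.array([0.2, 0.1])
print(eps_contrastive_loss(0.9, s_neg, eps=0.0))  # InfoNCE-style baseline
print(eps_contrastive_loss(0.9, s_neg, eps=0.3))  # margin raises the loss
```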
arXiv Detail & Related papers (2022-11-10T13:44:57Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
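The paper's simulation method is novel and not detailed in this summary; purely to make the interaction concrete, the toy below resimulates outcomes from a linear structural model after the user deletes a group-to-outcome edge. All coefficients are invented.

```python
# Toy version of 'delete a biased causal edge, then resimulate data':
# a linear structural model where group -> outcome is the edited edge.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, size=n)  # sensitive attribute
skill = rng.normal(size=n)

def simulate(group_to_outcome):
    # outcome = skill + (edge weight) * group + noise
    return skill + group_to_outcome * group + rng.normal(scale=0.1, size=n)

biased = simulate(group_to_outcome=0.8)    # original (biased) edge
debiased = simulate(group_to_outcome=0.0)  # edge deleted by the user

for name, y in [("biased", biased), ("debiased", debiased)]:
    gap = y[group == 1].mean() - y[group == 0].mean()
    print(f"{name}: group outcome gap = {gap:+.3f}")
```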
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and the agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z)
- AutoDebias: Learning to Debias for Recommendation [43.84313723394282]
We propose AutoDebias, which leverages another (small) set of uniform data to optimize the debiasing parameters.
We derive the generalization bound for AutoDebias and prove its ability to acquire the appropriate debiasing strategy.
arXiv Detail & Related papers (2021-05-10T08:03:48Z)
- Multi-Source Causal Inference Using Control Variates [81.57072928775509]
We propose a general algorithm to estimate causal effects from multiple data sources.
We show theoretically that this reduces the variance of the average treatment effect (ATE) estimate.
We apply this framework to inference from observational data under an outcome selection bias.
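As a reminder of the control-variate mechanism the paper builds on, here is a textbook-style sketch (not the paper's estimator): subtracting a zero-mean statistic that is correlated with the ATE estimate reduces its variance. The control-variate coefficient is known here by construction; in practice it must be estimated.

```python
# Textbook control-variate sketch: subtract a zero-mean, correlated
# statistic from the ATE estimate to reduce its variance.
import numpy as np

rng = np.random.default_rng(0)

def trial():
    # Outcomes depend on a covariate x with known population mean 0.
    x_t, x_c = rng.normal(size=300), rng.normal(size=300)
    y_t = 1.0 + 2.0 * x_t + rng.normal(scale=0.3, size=300)
    y_c = 0.0 + 2.0 * x_c + rng.normal(scale=0.3, size=300)
    naive = y_t.mean() - y_c.mean()
    # Control variate: x_t.mean() - x_c.mean() has expectation 0 and is
    # correlated with the naive estimate; subtract it with the (here
    # known) coefficient 2.0.
    adjusted = naive - 2.0 * (x_t.mean() - x_c.mean())
    return naive, adjusted

results = np.array([trial() for _ in range(2000)])
print("naive    ATE std:", results[:, 0].std().round(4))
print("adjusted ATE std:", results[:, 1].std().round(4))
```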
arXiv Detail & Related papers (2021-03-30T21:20:51Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
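To make the single-threshold point concrete, a generic sketch follows (not the paper's adaptation scheme): each subgroup gets its own threshold chosen at a fixed false-match rate on impostor scores, rather than one global cutoff. The score distributions are synthetic.

```python
# Per-subgroup verification thresholds at a fixed false-match rate
# (FMR), contrasted with a single global threshold. Synthetic scores.
import numpy as np

rng = np.random.default_rng(0)
target_fmr = 0.01

# Impostor-pair similarity scores; group B's impostors score higher,
# so a single global threshold treats the groups unevenly.
impostor = {"A": rng.normal(0.30, 0.10, 5000),
            "B": rng.normal(0.45, 0.10, 5000)}

global_thr = np.quantile(np.concatenate(list(impostor.values())),
                         1 - target_fmr)
for g, scores in impostor.items():
    per_group_thr = np.quantile(scores, 1 - target_fmr)
    fmr_at_global = (scores > global_thr).mean()
    print(f"group {g}: FMR at global threshold = {fmr_at_global:.3f}, "
          f"per-group threshold = {per_group_thr:.3f}")
```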
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.