Filtering Context Mitigates Scarcity and Selection Bias in Political
Ideology Prediction
- URL: http://arxiv.org/abs/2302.00239v1
- Date: Wed, 1 Feb 2023 04:34:48 GMT
- Title: Filtering Context Mitigates Scarcity and Selection Bias in Political
Ideology Prediction
- Authors: Chen Chen, Dylan Walker, Venkatesh Saligrama
- Abstract summary: We propose a novel supervised learning approach for political ideology prediction (PIP).
We show that our model is capable of outputting predictions even when trained with as little as 5% biased data.
- Score: 42.31457743674423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel supervised learning approach for political ideology
prediction (PIP) that is capable of predicting out-of-distribution inputs. This
problem is motivated by the fact that manual data-labeling is expensive, while
self-reported labels are often scarce and exhibit significant selection bias.
We propose a novel statistical model that decomposes the document embeddings
into a linear superposition of two vectors: a latent neutral context
vector independent of ideology, and a latent position vector aligned
with ideology. We train an end-to-end model that has intermediate contextual
and positional vectors as outputs. At deployment time, our model predicts
labels for input documents by exclusively leveraging the predicted positional
vectors. On two benchmark datasets we show that our model is capable of
outputting predictions even when trained with as little as 5% biased data, and
is significantly more accurate than the state-of-the-art. Through
crowd-sourcing we validate the neutrality of contextual vectors, and show that
context filtering results in ideological concentration, allowing for prediction
on out-of-distribution examples.
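As a rough illustration of the decomposition described above, the sketch below splits a document embedding into a context vector and a position vector and classifies ideology from the position vector alone. It is an assumption-laden reconstruction, not the authors' implementation: the embedding dimension, the linear heads, and the reconstruction loss are placeholder choices.

```python
# Minimal sketch of the context/position decomposition described in the abstract.
# This is an illustrative reconstruction, NOT the authors' released code: the
# embedding dimension, heads, and reconstruction loss below are assumptions.
import torch
import torch.nn as nn

class ContextPositionModel(nn.Module):
    def __init__(self, embed_dim: int = 768, num_labels: int = 2):
        super().__init__()
        # Two heads split a document embedding into a linear superposition
        # e = context + position, mirroring the decomposition in the abstract.
        self.context_head = nn.Linear(embed_dim, embed_dim)
        self.position_head = nn.Linear(embed_dim, embed_dim)
        # Ideology labels are predicted from the position vector only.
        self.classifier = nn.Linear(embed_dim, num_labels)

    def forward(self, doc_embedding: torch.Tensor):
        context = self.context_head(doc_embedding)
        position = self.position_head(doc_embedding)
        logits = self.classifier(position)
        # Reconstruction term encourages context + position to recover the input.
        recon_loss = ((context + position) - doc_embedding).pow(2).mean()
        return logits, context, position, recon_loss

# Usage: embeddings could come from any sentence encoder (e.g. SBERT).
model = ContextPositionModel()
emb = torch.randn(4, 768)          # batch of 4 document embeddings
logits, ctx, pos, recon = model(emb)
print(logits.shape, recon.item())
```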
Related papers
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- Augmented prediction of a true class for Positive Unlabeled data under selection bias [0.8594140167290099]
We introduce a new observational setting for Positive Unlabeled (PU) data where the observations at prediction time are also labeled.
We argue that the additional information is important for prediction, and call this task "augmented PU prediction".
We introduce several variants of the empirical Bayes rule in this scenario and investigate their performance.
arXiv Detail & Related papers (2024-07-14T19:58:01Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
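As a hedged sketch of the general idea (not this paper's exact construction), a density ratio between an idealized distribution and the observed one can be estimated with the probabilistic-classification trick, and the model can abstain on low-ratio inputs; the logistic-regression estimator and threshold below are illustrative assumptions.

```python
# Hedged sketch of classification with rejection driven by a density ratio:
# abstain on inputs where the estimated ratio between an "idealized" distribution
# and the observed one is low. The threshold and the logistic-regression-based
# ratio estimator are illustrative choices, not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_estimator(x_ideal, x_observed):
    # Probabilistic classification trick: train a classifier to separate the two
    # samples; r(x) ~= p(ideal | x) / p(observed | x).
    X = np.vstack([x_ideal, x_observed])
    y = np.r_[np.ones(len(x_ideal)), np.zeros(len(x_observed))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    def ratio(x):
        p = clf.predict_proba(x)[:, 1]
        return p / np.clip(1 - p, 1e-6, None)
    return ratio

def predict_with_rejection(model_predict, ratio, x, tau=0.5):
    r = ratio(x)
    preds = model_predict(x)
    # Abstain (label -1) wherever the input looks unlike the idealized distribution.
    return np.where(r >= tau, preds, -1)
```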
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Addressing Discretization-Induced Bias in Demographic Prediction [18.427077352120254]
We show how argmax labeling results in a substantial under-count of African-American voters, by 28.2 percentage points in North Carolina.
This bias can have substantial implications in downstream tasks that use such labels.
We introduce a joint optimization approach, along with a tractable data-driven thresholding method, that can eliminate this bias.
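A toy sketch of the under-counting mechanism (synthetic probabilities, not the paper's data or method): when no individual's minority-group probability exceeds 0.5, argmax labeling counts zero members even though the probability-weighted count is large.

```python
# Illustrative sketch (not the paper's code) of how argmax labeling can under-count
# a minority group relative to probability-aware counting. Probabilities are toy values.
import numpy as np

rng = np.random.default_rng(0)
# Toy per-voter membership probabilities for two groups; group 1 is the minority
# and never exceeds 0.5 for any individual.
p_group1 = rng.uniform(0.2, 0.45, size=10_000)
probs = np.c_[1 - p_group1, p_group1]

argmax_count = (probs.argmax(axis=1) == 1).sum()   # hard labels: argmax per person -> 0
expected_count = probs[:, 1].sum()                 # probability-weighted (soft) count
print(argmax_count, round(expected_count))         # argmax misses roughly 3,250 people
```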
arXiv Detail & Related papers (2024-05-27T02:22:43Z)
- Stochastic Online Conformal Prediction with Semi-Bandit Feedback [29.334511328067777]
We consider the online learning setting, where examples arrive over time, and the goal is to construct prediction sets dynamically.
We propose a novel conformal prediction algorithm targeted at this setting, and prove that it obtains sublinear regret compared to the optimal conformal predictor.
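For context, a minimal sketch of the standard online conformal recipe (an adaptive threshold nudged by coverage feedback); this is not the paper's semi-bandit algorithm, and the score interface, learning rate, and starting threshold are assumptions.

```python
# Minimal sketch of online conformal prediction via an adaptive threshold update
# (in the spirit of adaptive conformal inference). It is NOT the semi-bandit
# algorithm from the paper, just the general online recipe such work builds on.
def online_conformal(scores_and_labels, alpha=0.1, lr=0.05, q0=1.0):
    """scores_and_labels yields (nonconformity_scores: dict[label, float], true_label)."""
    q = q0  # current threshold on nonconformity scores
    for scores, y in scores_and_labels:
        pred_set = {lbl for lbl, s in scores.items() if s <= q}
        covered = y in pred_set
        # Raise the threshold after a miss, lower it slightly after coverage,
        # so long-run coverage tracks 1 - alpha.
        q += lr * ((0 if covered else 1) - alpha)
        yield pred_set, q
```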
arXiv Detail & Related papers (2024-05-22T00:42:49Z)
- Adversarial Resilience in Sequential Prediction via Abstention [46.80218090768711]
We study the problem of sequential prediction in the setting with an adversary that is allowed to inject clean-label adversarial examples.
We propose a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings.
arXiv Detail & Related papers (2023-06-22T17:44:22Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
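A minimal sketch, under simplifying assumptions, of projecting out a biased direction from text embeddings: the paper's calibrated projection matrix is more involved, and the difference-of-prompt-embeddings direction used here is only illustrative.

```python
# Hedged sketch of debiasing by projecting out a biased direction from text embeddings.
# The bias direction here is a simple difference of two prompt embeddings; the paper's
# calibrated projection matrix is more sophisticated, so treat this as the basic idea only.
import numpy as np

def projection_removing(direction: np.ndarray) -> np.ndarray:
    """Matrix that removes the component along `direction` (P = I - vv^T / ||v||^2)."""
    v = direction / np.linalg.norm(direction)
    return np.eye(len(v)) - np.outer(v, v)

def debias(text_embeddings: np.ndarray, biased_prompt_a: np.ndarray,
           biased_prompt_b: np.ndarray) -> np.ndarray:
    # e.g. embeddings of prompts like "a photo of a man" / "a photo of a woman"
    bias_direction = biased_prompt_a - biased_prompt_b
    P = projection_removing(bias_direction)
    return text_embeddings @ P.T
```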
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) aims to find a small subset of the input graph's features that guides the model's prediction.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
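A hedged sketch of instance reweighting by inverse joint frequency of (label, author demographic); this is a common balancing scheme standing in for the paper's specific weights, and the cell-based normalization is an assumption.

```python
# Illustrative sketch of instance reweighting to balance author demographics against
# the target label; the weighting scheme (inverse joint frequency) is a common choice
# and stands in for the paper's specific formulation.
from collections import Counter

def balanced_instance_weights(labels, demographics):
    """Weight each example by 1 / freq(label, demographic) so every
    (label, demographic) cell contributes equal total weight to training."""
    counts = Counter(zip(labels, demographics))
    n_cells = len(counts)
    n = len(labels)
    return [n / (n_cells * counts[(y, d)]) for y, d in zip(labels, demographics)]

# Weights can then be passed to a loss as per-example multipliers,
# e.g. loss = sum(w_i * ce(x_i, y_i)).
```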
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Measuring Model Biases in the Absence of Ground Truth [2.802021236064919]
We introduce a new framing to the measurement of fairness and bias that does not rely on ground truth labels.
Instead, we treat the model predictions for a given image as a set of labels, analogous to a 'bag of words' approach used in Natural Language Processing (NLP).
We demonstrate how the statistical properties (especially normalization) of the different association metrics can lead to different sets of labels detected as having "gender bias".
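As one concrete example of such an association metric (an assumption on my part, since the paper compares several), the sketch below computes PMI and its normalized variant between a predicted label and a group attribute over the "bag of labels"; the normalization term is exactly the kind of statistical property that can change which labels look most biased.

```python
# Hedged sketch of one association metric (PMI and normalized PMI) between a
# predicted label and a group attribute, treating model outputs as a "bag of labels".
# This shows only how normalization can shift the apparent "gender bias" ranking.
import math
from collections import Counter

def pmi(pred_labels, groups, label, group, normalized=False):
    n = len(pred_labels)
    p_l = sum(l == label for l in pred_labels) / n
    p_g = sum(g == group for g in groups) / n
    p_lg = sum(l == label and g == group for l, g in zip(pred_labels, groups)) / n
    if p_lg == 0:
        return float("-inf")
    score = math.log(p_lg / (p_l * p_g))
    # Normalized PMI rescales by -log p(l, g), bounding the score in [-1, 1].
    return score / (-math.log(p_lg)) if normalized else score
```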
arXiv Detail & Related papers (2021-03-05T01:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.