Counterfactual Probing for the Influence of Affect and Specificity on
Intergroup Bias
- URL: http://arxiv.org/abs/2305.16409v2
- Date: Fri, 2 Jun 2023 19:11:52 GMT
- Title: Counterfactual Probing for the Influence of Affect and Specificity on
Intergroup Bias
- Authors: Venkata S Govindarajan, Kyle Mahowald, David I. Beaver, Junyi Jessy Li
- Abstract summary: We investigate if two pragmatic features (specificity and affect) systematically vary in different intergroup contexts.
Preliminary analysis finds modest correlations of the specificity and affect of tweets with supervised intergroup relationship labels.
- Score: 23.32083897119715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While existing work on studying bias in NLP focuses on negative or pejorative
language use, Govindarajan et al. (2023) offer a revised framing of bias in
terms of intergroup social context, and its effects on language behavior. In
this paper, we investigate if two pragmatic features (specificity and affect)
systematically vary in different intergroup contexts -- thus connecting this
new framing of bias to language output. Preliminary analysis finds modest
correlations of the specificity and affect of tweets with supervised
intergroup relationship (IGR) labels. Counterfactual probing further reveals
that while neural models finetuned for predicting IGR labels reliably use
affect in classification, their use of specificity is inconclusive.
Code and data can be found at: https://github.com/venkatasg/intergroup-probing
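As a rough illustration of the probing setup, here is a minimal sketch (assuming a BERT-style backbone, a binary IGR head, and an already-trained linear affect probe; the authors' actual implementation is in the repository above):

```python
# Hedged sketch of counterfactual probing: take a linear probe direction for
# affect over pooled encoder states, project it out, and measure how predicted
# IGR probabilities shift. Backbone and probe are stand-ins, not the authors'
# released code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # assumed in-group/out-group head
model.eval()

def prediction_shift(texts, probe_w):
    """probe_w: weight vector of a linear affect probe over pooled states."""
    enc = tok(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        pooled = model.bert(**enc).pooler_output       # (batch, hidden)
        w = probe_w / probe_w.norm()                   # unit probe direction
        cf = pooled - (pooled @ w)[:, None] * w        # remove affect direction
        base = model.classifier(pooled).softmax(-1)
        probed = model.classifier(cf).softmax(-1)
    return (probed - base).abs().max(-1).values        # per-tweet shift

# A random vector stands in for a probe actually trained on affect labels.
print(prediction_shift(["we always show up for each other"], torch.randn(768)))
```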
Related papers
- A Study of Nationality Bias in Names and Perplexity using Off-the-Shelf Affect-related Tweet Classifiers [0.0]
We create counterfactual examples with small perturbations on target-domain data instead of relying on templates or specific datasets for bias detection.
On widely used classifiers for subjectivity analysis, including sentiment, emotion, and hate speech, our results demonstrate positive biases related to the language spoken in a country.
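A toy version of the perturbation-and-compare loop, with a template sentence standing in for the target-domain tweets the paper actually uses:

```python
# Toy illustration: swap the nationality term and compare an off-the-shelf
# classifier's outputs. The template sentence is a stand-in; the paper
# perturbs real target-domain data instead of templates.
from transformers import pipeline

clf = pipeline("sentiment-analysis")  # default model; any affect classifier fits
sentence = "My neighbor is {} and we talk every day."
for nationality in ["American", "French", "Nigerian", "Indian"]:
    pred = clf(sentence.format(nationality))[0]
    print(f"{nationality:10s} {pred['label']:9s} {pred['score']:.3f}")
```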
arXiv Detail & Related papers (2024-07-01T22:17:17Z) - The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language Models (PLMs) are known to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
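One hedged way to approximate neuron-level attribution (an activation-times-gradient heuristic, not the paper's own attribution method) looks like this:

```python
# Rough sketch of locating "bias neurons": score last-layer FFN units by
# |activation * gradient| of a logit gap between two demographic fillers.
# The heuristic and the probe sentence are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

acts = {}
def keep(module, inputs, output):
    output.retain_grad()          # keep gradients on the FFN activations
    acts["ffn"] = output
hook = mlm.bert.encoder.layer[-1].intermediate.register_forward_hook(keep)

enc = tok(f"The {tok.mask_token} was described as too emotional.",
          return_tensors="pt")
mask_pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1]
logits = mlm(**enc).logits[0, mask_pos]
gap = (logits[tok.convert_tokens_to_ids("woman")]
       - logits[tok.convert_tokens_to_ids("man")])
gap.backward()
hook.remove()

score = (acts["ffn"][0, mask_pos] * acts["ffn"].grad[0, mask_pos]).abs()
print("top candidate neurons:", score.topk(5).indices.tolist())
```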
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - Understanding and Mitigating Spurious Correlations in Text
Classification with Neighborhood Analysis [69.07674653828565]
Machine learning models have a tendency to leverage spurious correlations that exist in the training set but may not hold true in general circumstances.
In this paper, we examine the implications of spurious correlations through a novel perspective called neighborhood analysis.
We propose a family of regularization methods, NFL (doN't Forget your Language), to mitigate spurious correlations in text classification.
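The diagnostic half of the idea fits in a short sketch, with simulated embeddings and cue flags (the NFL regularizers themselves are not reproduced):

```python
# Sketch of neighborhood analysis: inspect a point's nearest neighbors in
# embedding space and check whether they agree on a surface cue more than on
# the gold label. Embeddings and flags are simulated for illustration.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))            # stand-in sentence embeddings
labels = rng.integers(0, 2, size=200)     # gold labels
has_cue = rng.integers(0, 2, size=200)    # presence of a spurious token

nn = NearestNeighbors(n_neighbors=6).fit(X)
_, idx = nn.kneighbors(X[:1])             # neighbors of example 0 (incl. itself)
neighbors = idx[0][1:]
print("label agreement:", (labels[neighbors] == labels[0]).mean())
print("cue agreement:  ", (has_cue[neighbors] == has_cue[0]).mean())
```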
arXiv Detail & Related papers (2023-05-23T03:55:50Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
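A stripped-down version of the edge-weakening intervention on an assumed one-edge linear structural model:

```python
# Minimal sketch of weakening a biased causal edge: refit the child variable,
# scale the protected parent's coefficient by a user-chosen factor, and
# resimulate. A one-edge linear SCM is assumed purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
gender = rng.integers(0, 2, size=1000).astype(float)    # protected attribute
salary = 50 + 10 * gender + 5 * rng.normal(size=1000)   # biased edge: gender -> salary

coef, intercept = np.polyfit(gender, salary, 1)
residual = salary - (intercept + coef * gender)
factor = 0.0                                            # 0 deletes the edge entirely
debiased = intercept + factor * coef * gender + residual

gap = lambda y: y[gender == 1].mean() - y[gender == 0].mean()
print(f"salary gap before: {gap(salary):.2f}, after: {gap(debiased):.2f}")
```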
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
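The single-threshold problem can be shown with simulated impostor scores whose distributions differ by subgroup:

```python
# Toy sketch: with impostor similarity distributions that differ by subgroup,
# one global cutoff yields unequal false match rates, while per-subgroup
# thresholds equalize them. Scores and means are simulated, not real FR output.
import numpy as np

rng = np.random.default_rng(2)
impostors = {"group A": rng.normal(0.10, 0.1, 10000),   # assumed score means
             "group B": rng.normal(0.25, 0.1, 10000)}
global_thr = 0.45
for name, scores in impostors.items():
    fmr_global = (scores > global_thr).mean()
    local_thr = np.quantile(scores, 0.99)               # threshold hitting 1% FMR
    print(f"{name}: FMR at global threshold {fmr_global:.4f}, "
          f"per-group threshold {local_thr:.3f}")
```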
arXiv Detail & Related papers (2021-03-16T15:05:49Z) - LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
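The core recipe, sketched with plain k-means standing in for LOGAN's bias-aware clustering objective:

```python
# Sketch of local bias detection: cluster examples in embedding space, then
# compute the bias metric (here, an accuracy gap between two groups) per
# cluster instead of corpus-wide. Data is simulated for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 16))          # stand-in example embeddings
group = rng.integers(0, 2, size=500)    # demographic group per example
correct = rng.integers(0, 2, size=500)  # whether the classifier was right

clusters = KMeans(n_clusters=5, random_state=0).fit_predict(X)
for c in range(5):
    m = clusters == c
    gap = correct[m & (group == 0)].mean() - correct[m & (group == 1)].mean()
    print(f"cluster {c}: local accuracy gap {gap:+.3f}")
```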
arXiv Detail & Related papers (2020-10-06T16:42:51Z) - Hate Speech Detection and Racial Bias Mitigation in Social Media based
on BERT model [1.9336815376402716]
We introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model called BERT.
We evaluate the proposed model on two publicly available datasets annotated for racism, sexism, hate or offensive content on Twitter.
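The transfer-learning recipe is the standard fine-tuning loop; the sketch below substitutes the public tweet_eval hate subset for the paper's datasets:

```python
# Standard BERT fine-tuning sketch for hate speech detection. The tweet_eval
# "hate" subset stands in for the annotated Twitter datasets in the paper.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # hateful vs. not

ds = load_dataset("tweet_eval", "hate")
ds = ds.map(lambda b: tok(b["text"], truncation=True,
                          padding="max_length", max_length=128), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hate-bert", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
)
trainer.train()
```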
arXiv Detail & Related papers (2020-08-14T16:47:25Z) - Counterfactual VQA: A Cause-Effect Look at Language Bias [117.84189187160005]
VQA models tend to rely on language bias as a shortcut and fail to sufficiently learn the multi-modal knowledge from both vision and language.
We propose a novel counterfactual inference framework, which enables us to capture the language bias as the direct causal effect of questions on answers.
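Schematically, the inference-time rule subtracts the question-only effect from the fused prediction; the fusion function and counterfactual constant below are assumptions:

```python
# Schematic counterfactual inference for VQA: subtract the natural direct
# effect of the question (language prior) from the total effect. The fusion
# and the zero counterfactual constant are illustrative choices; the paper
# defines and compares specific fusion strategies.
import torch

def fuse(z_v, z_q):
    return torch.log(torch.sigmoid(z_v) * torch.sigmoid(z_q) + 1e-9)

def debiased_logits(z_vq, z_q):
    total = fuse(z_vq, z_q)                   # factual: vision + question
    nde = fuse(torch.zeros_like(z_vq), z_q)   # counterfactual: vision blocked
    return total - nde                        # total indirect effect

z_vq = torch.randn(1, 10)   # answer logits from the full VQA branch
z_q = torch.randn(1, 10)    # answer logits from the question-only branch
print(debiased_logits(z_vq, z_q).softmax(-1))
```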
arXiv Detail & Related papers (2020-06-08T01:49:27Z) - Towards classification parity across cohorts [16.21248370949611]
This research work aims to achieve classification parity across explicit as well as implicit sensitive features.
We obtain implicit cohorts by clustering embeddings of each individual, learned with a language model trained on the language they generate.
We improve classification parity by introducing a modification to the loss function aimed at minimizing the range of model performance across cohorts.
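A compact sketch of one such modification, penalizing the spread of per-cohort losses (the range penalty and its weight are assumptions about the general recipe):

```python
# Sketch of a parity-encouraging objective: cross-entropy plus a penalty on
# the spread (max - min) of average per-cohort losses. The weight lam and the
# exact penalty form are assumed for illustration.
import torch
import torch.nn.functional as F

def parity_loss(logits, labels, cohort_ids, lam=1.0):
    per_example = F.cross_entropy(logits, labels, reduction="none")
    cohort_means = torch.stack([per_example[cohort_ids == c].mean()
                                for c in cohort_ids.unique()])
    return per_example.mean() + lam * (cohort_means.max() - cohort_means.min())

logits = torch.randn(8, 2, requires_grad=True)
labels = torch.randint(0, 2, (8,))
cohorts = torch.randint(0, 3, (8,))
parity_loss(logits, labels, cohorts).backward()
```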
arXiv Detail & Related papers (2020-05-16T16:31:08Z)