A Novel Method for Analysing Racial Bias: Collection of Person Level References
- URL: http://arxiv.org/abs/2310.15847v1
- Date: Tue, 24 Oct 2023 14:00:01 GMT
- Title: A Novel Method for Analysing Racial Bias: Collection of Person Level References
- Authors: Muhammed Yusuf Kocyigit, Anietie Andy, Derry Wijaya
- Abstract summary: We propose a novel method to analyze the differences in representation between two groups.
We examine the representation of African Americans and White Americans in books from 1850 to 2000 using the Google Books dataset.
- Score: 6.345851712811529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long term exposure to biased content in literature or media can significantly
influence people's perceptions of reality, leading to the development of
implicit biases that are difficult to detect and address (Gerbner 1998). In
this study, we propose a novel method to analyze the differences in
representation between two groups and use it to examine the representation of
African Americans and White Americans in books published from 1850 to 2000 using the
Google Books dataset (Goldberg and Orwant 2013). By developing better tools to
understand differences in representation, we aim to contribute to the ongoing
efforts to recognize and mitigate biases. To improve upon the more common
phrase-based methods (using terms such as men, women, white, black, etc.) for differentiating
context (Tripodi et al. 2019; Lucy, Tadimeti, and Bamman 2022), we propose collecting a
comprehensive list of historically significant figures and using their names to
select relevant context. This novel approach offers a more accurate and nuanced
method for detecting implicit biases by reducing the risk of selection
bias. We create group representations for each decade and analyze them in an
aligned semantic space (Hamilton, Leskovec, and Jurafsky 2016). We further
support our results by assessing the time-adjusted toxicity (Bassignana,
Basile, and Patti 2018) in the context for each group and identifying the
semantic axes (Lucy, Tadimeti, and Bamman 2022) that exhibit the most
significant differences between the groups across decades. We validate our
method by showing that it can accurately capture known sociopolitical
changes, and our findings indicate that while the relative number of
African American names mentioned in books has increased over time, the context
surrounding them remains more toxic than that surrounding White Americans.
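To make the pipeline described in the abstract concrete, the snippet below is a minimal illustrative sketch, not the authors' released code, of two of its steps: selecting context windows around person-level name mentions for a group, and aligning per-decade embedding spaces with orthogonal Procrustes as in Hamilton, Leskovec, and Jurafsky (2016). The function names, window size, and toy inputs are assumptions made for illustration; the paper's actual corpus processing, name list, toxicity scoring, and semantic-axis analysis are not reproduced here.

```python
# Minimal sketch (not the authors' released code): name-based context selection
# and orthogonal Procrustes alignment of per-decade embedding spaces.
# Function names, window size, and toy data are illustrative assumptions.
import numpy as np


def collect_name_contexts(tokens, names, window=10):
    """Return the window of tokens around every mention of a listed name."""
    name_set = {n.lower() for n in names}
    contexts = []
    for i, tok in enumerate(tokens):
        if tok.lower() in name_set:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append(left + right)
    return contexts


def group_representation(contexts, word_vectors, dim):
    """Average the vectors of all context tokens into a single group vector."""
    vecs = [word_vectors[w] for ctx in contexts for w in ctx if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def procrustes_align(source, target):
    """Rotate `source` (n_words x d) onto `target` with orthogonal Procrustes.

    Rows must correspond to the same words; Hamilton et al. additionally
    mean-center and length-normalize the matrices before solving.
    """
    u, _, vt = np.linalg.svd(source.T @ target)  # W = argmin ||source @ W - target||_F
    return source @ (u @ vt)


# Toy usage: align two per-decade embedding matrices, then build one group
# vector from contexts around a (hypothetical) person-level name mention.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = ["court", "free", "labor", "school", "vote"]
    decade_a = rng.normal(size=(len(vocab), 4))  # e.g. embeddings trained on 1900s books
    decade_b = rng.normal(size=(len(vocab), 4))  # e.g. embeddings trained on 1910s books
    decade_a_aligned = procrustes_align(decade_a, decade_b)

    tokens = "the court ruled in favor of douglass and his family".split()
    contexts = collect_name_contexts(tokens, names=["douglass"], window=6)
    word_vectors = dict(zip(vocab, decade_b))
    group_vec = group_representation(contexts, word_vectors, dim=4)
    print(decade_a_aligned.shape, len(contexts), group_vec.shape)
```

Once the decade matrices share a common coordinate system, group representations built from name-selected contexts can be compared across decades, which is what enables the diachronic comparison the abstract describes.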
Related papers
- A Longitudinal Analysis of Racial and Gender Bias in New York Times and Fox News Images and Articles [2.482116411483087]
We use a dataset of 123,337 images and 441,321 online news articles from the New York Times (NYT) and Fox News (Fox).
We examine the frequency and prominence of appearance of racial and gender groups in images embedded in news articles.
We find that NYT generally features more images of racial minority groups than Fox.
arXiv Detail & Related papers (2024-10-29T09:42:54Z) - On the Influence of Gender and Race in Romantic Relationship Prediction from Large Language Models [21.178861746240507]
We study the presence of heteronormative biases and prejudice against interracial romantic relationships in large language models.
We show that models are less likely to predict romantic relationships for (a) same-gender character pairs than different-gender pairs; and (b) intra/inter-racial character pairs involving Asian names as compared to Black, Hispanic, or White names.
arXiv Detail & Related papers (2024-10-05T01:41:55Z) - The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the representation of different gender and racial groups in generated images, at the cost of demographic factuality.
We propose Fact-Augmented Intervention (FAI) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z) - Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z) - What's in a Name? Auditing Large Language Models for Race and Gender Bias [49.28899492966893]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4.
We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z) - What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether this bias is encoded in their latent representations.
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
arXiv Detail & Related papers (2023-11-30T18:53:13Z) - Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z) - Computational Assessment of Hyperpartisanship in News Titles [55.92100606666497]
We first adopt a human-guided machine learning framework to develop a new dataset for hyperpartisan news title detection.
Overall, right-leaning media tends to use proportionally more hyperpartisan titles.
We identify three major topics including foreign issues, political systems, and societal issues that are suggestive of hyperpartisanship in news titles.
arXiv Detail & Related papers (2023-01-16T05:56:58Z) - Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework [9.84413545378636]
Recent research has demonstrated how racial biases against users who write African American English exist in popular toxic language datasets.
We propose additional descriptive fairness metrics to better understand the source of these biases.
We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets.
arXiv Detail & Related papers (2021-09-27T15:54:05Z) - Avoiding bias when inferring race using name-based approaches [0.8543368663496084]
We use information from the U.S. Census and mortgage applications to infer the race of U.S. affiliated authors in the Web of Science.
Our results demonstrate that the validity of name-based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors.
arXiv Detail & Related papers (2021-04-14T08:36:22Z) - Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia [34.183132688084534]
Specific lexical choices in narrative text reflect both the writer's attitudes towards people in the narrative and influence the audience's reactions.
We show how word connotations differ across languages and cultures, highlighting the difficulty of generalizing existing English datasets.
We then demonstrate the usefulness of our method by analyzing Wikipedia biography pages of members of the LGBT community across three languages.
arXiv Detail & Related papers (2020-10-21T08:27:36Z)