Measuring Intersectional Biases in Historical Documents
- URL: http://arxiv.org/abs/2305.12376v1
- Date: Sun, 21 May 2023 07:10:31 GMT
- Title: Measuring Intersectional Biases in Historical Documents
- Authors: Nadav Borenstein and Karolina Stańczak and Thea Rolskov and
Natália da Silva Perez and Natacha Klein Käfer and Isabelle Augenstein
- Abstract summary: We investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries).
Our analyses are performed along the axes of gender, race, and their intersection.
We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset.
- Score: 37.03904311548859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-driven analyses of biases in historical texts can help illuminate the
origin and development of biases prevailing in modern society.
However, digitised historical documents pose a challenge for NLP
practitioners as these corpora suffer from errors introduced by optical
character recognition (OCR) and are written in an archaic language. In this
paper, we investigate the continuities and transformations of bias in
historical newspapers published in the Caribbean during the colonial era (18th
to 19th centuries). Our analyses are performed along the axes of gender, race,
and their intersection. We examine these biases by conducting a temporal study
in which we measure the development of lexical associations using
distributional semantics models and word embeddings. Further, we evaluate the
effectiveness of techniques designed to process OCR-generated data and assess
their stability when trained on and applied to the noisy historical newspapers.
We find that there is a trade-off between the stability of the word embeddings
and their compatibility with the historical dataset. We provide evidence that
gender and racial biases are interdependent, and their intersection triggers
distinct effects. These findings align with the theory of intersectionality,
which stresses that biases affecting people with multiple marginalised
identities compound to more than the sum of their constituents.
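As a rough illustration of what "measuring lexical associations" with a distributional model can look like, the sketch below computes pointwise mutual information (PMI) between words that co-occur in the same sentence. It is a minimal sketch under stated assumptions, not the paper's pipeline: the function names `pmi_table` and `association`, the sentence-level co-occurrence window, and the four toy sentences are all invented for this example and are not drawn from the historical newspapers.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Return PMI scores for every word pair that co-occurs in a sentence.

    Assumption: co-occurrence is counted once per sentence, and
    probabilities are estimated as (sentence count) / (number of sentences).
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_counts.update(words)
        pair_counts.update(
            frozenset(pair) for pair in combinations(sorted(words), 2)
        )

    n = len(sentences)
    pmi = {}
    for pair, joint in pair_counts.items():
        w1, w2 = tuple(pair)
        p_joint = joint / n
        p_independent = (word_counts[w1] / n) * (word_counts[w2] / n)
        pmi[pair] = math.log2(p_joint / p_independent)
    return pmi


def association(pmi, target, attribute):
    """Look up the PMI of a (target, attribute) pair.

    Returns None when the two words never co-occur, since PMI is
    undefined for a zero joint count.
    """
    return pmi.get(frozenset((target, attribute)))


# Toy sentences, invented purely for illustration.
toy_corpus = [
    "the woman sold linen",
    "the man sold sugar",
    "the woman sold linen and thread",
    "the man owned the ship",
]

pmi = pmi_table(toy_corpus)
print(association(pmi, "woman", "linen"))
print(association(pmi, "man", "linen"))
```

For a temporal study of the kind the abstract describes, the same computation could be repeated on the text of each time slice and the scores compared across slices. Real OCR output would also need tokenisation and noise handling, which this sketch omits.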
Related papers
- Contrastive Entity Coreference and Disambiguation for Historical Texts [2.446672595462589]
Existing entity disambiguation methods often fall short in accuracy for historical documents, which are replete with individuals not remembered in contemporary knowledgebases.
This study makes three key contributions to improve cross-document coreference resolution and disambiguation in historical texts.
arXiv Detail & Related papers (2024-06-21T18:22:14Z)
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention [72.12974259966592]
We present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips.
We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
arXiv Detail & Related papers (2023-09-17T15:58:27Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Toward Understanding Bias Correlations for Mitigation in NLP [34.956581421295]
This work aims to provide a first systematic study toward understanding bias correlations in mitigation.
We examine bias mitigation in two common NLP tasks -- toxicity detection and word embeddings.
Our findings suggest that biases are correlated and present scenarios in which independent debiasing approaches may be insufficient.
arXiv Detail & Related papers (2022-05-24T22:48:47Z)
- Robust Quantification of Gender Disparity in Pre-Modern English Literature using Natural Language Processing [8.185725740857594]
We demonstrate the significant discrepancy between the prevalence of female characters and male characters in pre-modern literature.
The discrepancy seems to be relatively stable as we plot data over the decades in this century-long period.
We aim to carefully describe both the limitations and ethical caveats associated with this study, and others like it.
arXiv Detail & Related papers (2022-04-12T15:11:22Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Diachronic Analysis of German Parliamentary Proceedings: Ideological Shifts through the Lens of Political Biases [18.38810381745439]
We analyze bias in historical corpora by focusing on two specific forms of bias, namely a political (i.e., anti-communism) and a racist (i.e., antisemitism) bias.
We complement this analysis of historical biases in diachronic word embeddings with a novel measure of bias on the basis of term co-occurrences and graph-based label propagation.
The results of our bias measurements align with commonly perceived historical trends of antisemitic and anti-communist biases in German politics in different time periods.
arXiv Detail & Related papers (2021-08-13T15:58:07Z)
- Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases [10.713568409205077]
State-of-the-art neural language models generate dynamic word embeddings dependent on the context in which the word appears.
We introduce the Contextualized Embedding Association Test (CEAT), which can summarize the magnitude of overall bias in neural language models.
We develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify the intersectional biases and emergent intersectional biases from static word embeddings.
arXiv Detail & Related papers (2020-06-06T19:49:50Z)