Disambiguation of morpho-syntactic features of African American English -- the case of habitual be
- URL: http://arxiv.org/abs/2204.12421v1
- Date: Tue, 26 Apr 2022 16:30:22 GMT
- Title: Disambiguation of morpho-syntactic features of African American English -- the case of habitual be
- Authors: Harrison Santiago, Joshua Martin, Sarah Moeller, and Kevin Tang
- Abstract summary: Habitual "be" is isomorphic, and therefore ambiguous, with other forms of "be" found in both AAE and other varieties of English.
We employ a combination of rule-based filters and data augmentation that generates a corpus balanced between habitual and non-habitual instances.
- Score: 1.4699455652461728
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research has highlighted that natural language processing (NLP)
systems exhibit a bias against African American speakers. The bias errors are
often caused by poor representation of linguistic features unique to African
American English (AAE), due to the relatively low probability of occurrence of
many such features in training data. We present a workflow to overcome such
bias in the case of habitual "be". Habitual "be" is isomorphic, and therefore
ambiguous, with other forms of "be" found in both AAE and other varieties of
English. This creates a clear challenge for bias in NLP technologies. To
overcome the scarcity, we employ a combination of rule-based filters and data
augmentation that generates a corpus balanced between habitual and non-habitual
instances. With this balanced corpus, we train unbiased machine learning
classifiers, as demonstrated on a corpus of AAE transcribed texts, achieving a
.65 F$_1$ score in disambiguating habitual "be".
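As a rough illustration of this workflow, the sketch below stands in for the paper's actual filters, augmentation, and models with a regex filter over bare "be" and an n-gram logistic-regression classifier; the toy corpus, function names, and features are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch only: a regex stand-in for the paper's rule-based
# filters and a simple n-gram classifier in place of its trained models.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def has_uninflected_be(sentence: str) -> bool:
    """Keep only sentences containing bare 'be', the ambiguous form
    (habitual 'be' is isomorphic with other uninflected uses of 'be')."""
    return re.search(r"\bbe\b", sentence.lower()) is not None

# Toy corpus balanced between habitual (1) and non-habitual (0) instances,
# standing in for the filter-plus-augmentation output described above.
corpus = [
    ("She be working every morning.", 1),
    ("They be playing outside after school.", 1),
    ("He wants to be a doctor.", 0),
    ("Be quiet, please.", 0),
]
texts = [s for s, _ in corpus if has_uninflected_be(s)]
labels = [y for s, y in corpus if has_uninflected_be(s)]

# Bigram features + logistic regression as a minimal disambiguation model.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["He be running on weekends."]))
```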
Related papers
- One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [55.35278531907263]
We present the first study of Large Language Models' fairness and robustness to dialects in canonical reasoning tasks.
We hire AAVE speakers to rewrite seven popular benchmarks, such as HumanEval and GSM8K.
We find that, compared to Standardized English, almost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z) - The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language Models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - A Comprehensive View of the Biases of Toxicity and Sentiment Analysis
Methods Towards Utterances with African American English Expressions [5.472714002128254]
We study bias on two Web-based (YouTube and Twitter) datasets and two spoken English datasets.
We isolate the impact of AAE expression usage via linguistic control features from the Linguistic Inquiry and Word Count software.
We present consistent results on how a heavy usage of AAE expressions may cause the speaker to be considered substantially more toxic, even when speaking about nearly the same subject.
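A hedged sketch of the control-feature idea: regress a toxicity score on an AAE-usage measure alongside LIWC-style controls, so the AAE coefficient reflects its association net of the controls. The variable names and synthetic numbers below are assumptions for illustration, not the paper's data or pipeline.

```python
# Synthetic illustration: ordinary least squares with control covariates.
import numpy as np

rng = np.random.default_rng(0)
n = 200
aae_usage = rng.random(n)                  # share of AAE expressions per text
controls = rng.random((n, 3))              # stand-ins for LIWC category rates
toxicity = (0.5 * aae_usage
            + controls @ np.array([0.3, 0.2, 0.1])
            + rng.normal(0.0, 0.05, n))    # simulated classifier scores

# Intercept + AAE usage + controls; coef[1] is the AAE effect net of controls.
X = np.column_stack([np.ones(n), aae_usage, controls])
coef, *_ = np.linalg.lstsq(X, toxicity, rcond=None)
print(f"AAE coefficient controlling for LIWC-style features: {coef[1]:.3f}")
```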
arXiv Detail & Related papers (2024-01-23T12:41:03Z) - Towards a Deep Multi-layered Dialectal Language Analysis: A Case Study
of African-American English [0.20305676256390934]
Part-of-speech taggers trained on Mainstream American English (MAE) produce non-interpretable results when applied to African American English (AAE).
In this work, we incorporate a human-in-the-loop paradigm to gain a better understanding of AAE speakers' behavior and their language use.
arXiv Detail & Related papers (2022-06-03T01:05:58Z) - VALUE: Understanding Dialect Disparity in NLU [50.35526025326337]
We construct rules for 11 features of African American Vernacular English (AAVE).
We recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments.
Experiments show that these new dialectal features can lead to a drop in model performance.
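For flavor, here is a toy version of one such transformation rule (copula absence, a well-documented AAVE feature). VALUE's actual rules operate over syntactic analyses and are validated by fluent speakers; this single regex is only an illustrative assumption.

```python
# Toy Standard-English -> AAVE transformation: copula absence.
import re

def drop_copula(sentence: str) -> str:
    """Delete the first present-tense copula, e.g.,
    'She is going to the store.' -> 'She going to the store.'"""
    return re.sub(r"\b(?:is|are)\b\s*", "", sentence, count=1)

print(drop_copula("She is going to the store."))
print(drop_copula("They are ready."))
```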
arXiv Detail & Related papers (2022-04-06T18:30:56Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
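A minimal sketch of instance reweighting, assuming weights inversely proportional to the frequency of each (demographic group, label) pair so the model cannot profit from their correlation; the toy data and grouping variable are illustrative, not the paper's exact weighting scheme.

```python
# Illustrative instance reweighting: rare (group, label) pairs get larger
# training weights, weakening the demographic-label correlation.
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts  = ["great post", "awful take", "nice thread", "bad take"]
labels = [0, 1, 0, 1]            # e.g., a toxicity or sentiment label
groups = ["A", "A", "B", "A"]    # hypothetical author demographic

pair_counts = Counter(zip(groups, labels))
weights = [len(texts) / pair_counts[(g, y)] for g, y in zip(groups, labels)]

X = CountVectorizer().fit_transform(texts)
clf = LogisticRegression().fit(X, labels, sample_weight=weights)
print(dict(zip(texts, weights)))
```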
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Mitigating Biases in Toxic Language Detection through Invariant
Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out spurious correlations between certain syntactic patterns and toxicity labels.
Our method yields lower false positive rates on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z) - Machine Translationese: Effects of Algorithmic Bias on Linguistic
Complexity in Machine Translation [2.0625936401496237]
We go beyond the study of gender in Machine Translation and investigate how bias amplification might affect language in a broader sense.
We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms.
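As one concrete example of a lexical-level richness measure such a study might compute, the sketch below uses type-token ratio with naive whitespace tokenization; the toy strings are illustrative only.

```python
# Type-token ratio (TTR): lower values indicate less lexical diversity,
# the kind of impoverishment attributed to "machine translationese".
def type_token_ratio(text: str) -> float:
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

human_text = "the quick brown fox jumps over the lazy dog"
mt_output  = "the dog the dog the dog jumps over the dog"
print(type_token_ratio(human_text))  # ~0.889
print(type_token_ratio(mt_output))   # 0.4
```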
arXiv Detail & Related papers (2021-01-30T18:49:11Z) - Detecting Emergent Intersectional Biases: Contextualized Word Embeddings
Contain a Distribution of Human-like Biases [10.713568409205077]
State-of-the-art neural language models generate dynamic word embeddings dependent on the context in which the word appears.
We introduce the Contextualized Embedding Association Test (CEAT), which can summarize the magnitude of overall bias in neural language models.
We develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify the intersectional biases and emergent intersectional biases from static word embeddings.
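For orientation, the sketch below computes the WEAT-style effect size that CEAT aggregates across contexts; the random vectors stand in for contextualized embeddings, and all shapes and names are assumptions rather than the paper's implementation.

```python
# WEAT-style effect size d between target sets X, Y and attribute sets A, B:
#   s(w, A, B) = mean_a cos(w, a) - mean_b cos(w, b)
#   d = (mean_x s(x, A, B) - mean_y s(y, A, B)) / std over all targets
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 32)) for _ in range(4))
print(effect_size(X, Y, A, B))  # near 0 for random embeddings
```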
arXiv Detail & Related papers (2020-06-06T19:49:50Z) - It's Morphin' Time! Combating Linguistic Discrimination with
Inflectional Perturbations [68.16751625956243]
Training on only perfect Standard English corpora predisposes neural networks to discriminate against minorities from non-standard linguistic backgrounds.
We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples.
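A toy instance of such a perturbation, assuming a single regex rule that drops third-person singular -s while keeping the lemma; the paper searches over richer inflectional variants, so this is only an illustration.

```python
# Toy inflectional perturbation: strip third-person singular -s from the
# first matching verb, producing a plausible non-standard variant.
import re

def drop_third_person_s(sentence: str) -> str:
    """'She walks home every day.' -> 'She walk home every day.'"""
    return re.sub(r"\b(\w+)s\b", r"\1", sentence, count=1)

print(drop_third_person_s("She walks home every day."))
```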
arXiv Detail & Related papers (2020-05-09T04:01:43Z)