Towards a Deep Multi-layered Dialectal Language Analysis: A Case Study
of African-American English
- URL: http://arxiv.org/abs/2206.08978v1
- Date: Fri, 3 Jun 2022 01:05:58 GMT
- Title: Towards a Deep Multi-layered Dialectal Language Analysis: A Case Study
of African-American English
- Authors: Jamell Dacon
- Abstract summary: Part-of-speech taggers trained on Mainstream American English (MAE) produce non-interpretable results when applied to African American English (AAE).
In this work, we incorporate a human-in-the-loop paradigm to gain a better understanding of AAE speakers' behavior and their language use.
- Score: 0.20305676256390934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, natural language processing (NLP) models proliferate language
discrimination leading to potentially harmful societal impacts as a result of
biased outcomes. For example, part-of-speech taggers trained on Mainstream
American English (MAE) produce non-interpretable results when applied to
African American English (AAE) as a result of language features not seen during
training. In this work, we incorporate a human-in-the-loop paradigm to gain a
better understanding of AAE speakers' behavior and their language use, and
highlight the need for dialectal language inclusivity so that native AAE
speakers can extensively interact with NLP systems while reducing feelings of
disenfranchisement.
Related papers
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs).
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with far fewer computational resources than existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
- Cross-lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models [16.0617753653454]
This study presents a comparative analysis between human performance and SSL models.
We also compare the SER ability of models and humans at both the utterance and segment levels.
Our findings reveal that models, with appropriate knowledge transfer, can adapt to the target language and achieve performance comparable to native speakers.
arXiv Detail & Related papers (2024-09-25T13:27:17Z)
- Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance [3.344876133162209]
Large Language Models (LLMs) excel at providing information acquired during pretraining on large-scale corpora.
This study investigates whether the quality of LLM responses varies depending on the demographic profile of users.
arXiv Detail & Related papers (2024-06-25T09:04:21Z)
- Disentangling Dialect from Social Bias via Multitask Learning to Improve Fairness [16.746758715820324]
We present a multitask learning approach that models dialect language as an auxiliary task to incorporate syntactic and lexical variations.
In our experiments with African-American English dialect, we provide empirical evidence that complementing common learning approaches with dialect modeling improves their fairness.
Results suggest that multitask learning achieves state-of-the-art performance and helps to detect properties of biased language more reliably.
arXiv Detail & Related papers (2024-06-14T12:39:39Z)
- A Taxonomy of Ambiguity Types for NLP [53.10379645698917]
We propose a taxonomy of ambiguity types as seen in English to facilitate NLP analysis.
Our taxonomy can help make meaningful splits in language ambiguity data, allowing for more fine-grained assessments of both datasets and model performance.
arXiv Detail & Related papers (2024-03-21T01:47:22Z)
- Sociolinguistically Informed Interpretability: A Case Study on Hinglish Emotion Classification [8.010713141364752]
We study the effect of language on emotion prediction across 3 PLMs on a Hinglish emotion classification dataset.
We find that models do learn these associations between language choice and emotional expression.
Having code-mixed data present in the pre-training can augment that learning when task-specific data is scarce.
arXiv Detail & Related papers (2024-02-05T16:05:32Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Relationship of the language distance to English ability of a country [0.0]
We introduce a novel solution to measure the semantic dissimilarity between languages.
We empirically examine the effectiveness of the proposed semantic language distance.
The experimental results show that the language distance has a negative influence on a country's average English ability.
arXiv Detail & Related papers (2022-11-15T02:40:00Z)
- VALUE: Understanding Dialect Disparity in NLU [50.35526025326337]
We construct rules for 11 features of African American Vernacular English (AAVE).
We recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments.
Experiments show that these new dialectal features can lead to a drop in model performance.
arXiv Detail & Related papers (2022-04-06T18:30:56Z)
- On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.