Evaluating the Usage of African-American Vernacular English in Large Language Models
- URL: http://arxiv.org/abs/2602.21485v1
- Date: Wed, 25 Feb 2026 01:28:01 GMT
- Title: Evaluating the Usage of African-American Vernacular English in Large Language Models
- Authors: Deja Dunlap, R. Thomas McCoy
- Abstract summary: We investigate how accurately large language models (LLMs) represent African American Vernacular English (AAVE). We compare their usage of AAVE to the usage of humans who natively speak AAVE. We find that, in many cases, there are substantial differences between AAVE usage in LLMs and humans.
- Score: 5.242425502046959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In AI, most evaluations of natural language understanding tasks are conducted in standardized dialects such as Standard American English (SAE). In this work, we investigate how accurately large language models (LLMs) represent African American Vernacular English (AAVE). We analyze three LLMs to compare their usage of AAVE to the usage of humans who natively speak AAVE. We first analyzed interviews from the Corpus of Regional African American Language and TwitterAAE to identify the typical contexts where people use AAVE grammatical features such as ain't. We then prompted the LLMs to produce text in AAVE and compared the model-generated text to human usage patterns. We find that, in many cases, there are substantial differences between AAVE usage in LLMs and humans: LLMs usually underuse and misuse grammatical features characteristic of AAVE. Furthermore, through sentiment analysis and manual inspection, we found that the models replicated stereotypes about African Americans. These results highlight the need for more diversity in training data and the incorporation of fairness methods to mitigate the perpetuation of stereotypes.
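The abstract describes comparing how often grammatical features such as ain't appear in human AAVE corpora versus model-generated text. A minimal sketch of that kind of feature-rate comparison (the `feature_rate` helper and the toy sentences are invented for illustration, not taken from the paper):

```python
import re

def feature_rate(texts, pattern=r"\bain't\b"):
    """Occurrences of a dialect feature per 100 words across a corpus."""
    total_words = 0
    hits = 0
    for t in texts:
        total_words += len(t.split())
        hits += len(re.findall(pattern, t.lower()))
    return 100.0 * hits / total_words if total_words else 0.0

# Toy corpora standing in for human AAVE text and model-generated text.
human = ["he ain't going nowhere", "they been done with that"]
model = ["he is not going anywhere", "he ain't going"]

print(f"human: {feature_rate(human):.1f} per 100 words")
print(f"model: {feature_rate(model):.1f} per 100 words")
```

The paper's actual comparison covers many features and conditions on syntactic context; this sketch only illustrates the raw-frequency idea.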
Related papers
- Reinforcing Stereotypes of Anger: Emotion AI on African American Vernacular English [46.47177439553625]
This study examines emotion recognition model performance on African American Vernacular English (AAVE) compared to General American English (GAE). We analyze 2.7 million tweets geo-tagged within Los Angeles. We observe that neighborhoods with higher proportions of African American residents are associated with higher predictions of anger.
arXiv Detail & Related papers (2025-11-13T23:13:08Z)
- Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Model (LLM) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
- Finding A Voice: Exploring the Potential of African American Dialect and Voice Generation for Chatbots [9.868899242620637]
This study focuses on integrating African American English (AAE) into virtual agents to better serve the African American community. We develop text-based and spoken chatbots using large language models and text-to-speech technology.
arXiv Detail & Related papers (2025-01-07T00:07:01Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We introduce ReDial, a benchmark containing 1.2K+ parallel query pairs in Standardized English and AAVE. We evaluate widely used models, including the GPT, Claude, Llama, Mistral, and Phi model families. Our work establishes a systematic and objective framework for analyzing LLM bias in dialectal queries.
arXiv Detail & Related papers (2024-10-14T18:44:23Z)
- Self-supervised Speech Representations Still Struggle with African American Vernacular English [28.223877889211803]
Underperformance of ASR systems for speakers of marginalized language varieties is a well-documented phenomenon.
We investigate whether the recent wave of self-supervised learning speech models can close the gap in ASR performance between AAVE and Mainstream American English.
arXiv Detail & Related papers (2024-08-26T13:29:25Z)
- Evaluation of African American Language Bias in Natural Language Generation [9.823804049740916]
We evaluate how well LLMs understand African American Language (AAL) in comparison to their performance on White Mainstream English (WME).
Our contributions include: (1) evaluation of six pre-trained, large language models on the two language generation tasks; (2) a novel dataset of AAL text from multiple contexts with human-annotated counterparts in WME; and (3) documentation of model performance gaps that suggest bias and identification of trends in lack of understanding of AAL features.
arXiv Detail & Related papers (2023-05-23T17:34:37Z)
- VALUE: Understanding Dialect Disparity in NLU [50.35526025326337]
We construct rules for 11 features of African American Vernacular English (AAVE).
We recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments.
Experiments show that these new dialectal features can lead to a drop in model performance.
arXiv Detail & Related papers (2022-04-06T18:30:56Z)
- Investigating African-American Vernacular English in Transformer-Based Text Generation [55.53547556060537]
Social media has encouraged the written use of African American Vernacular English (AAVE).
We investigate the performance of GPT-2 on AAVE text by creating a dataset of intent-equivalent parallel AAVE/SAE tweet pairs.
We find that while AAVE text results in more classifications of negative sentiment than SAE, the use of GPT-2 generally increases occurrences of positive sentiment for both.
arXiv Detail & Related papers (2020-10-06T06:27:02Z)
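Several of the listed papers compare sentiment classifications on intent-equivalent AAVE/SAE text pairs. A minimal sketch of that comparison, using a toy hand-built lexicon as a stand-in for the trained sentiment models the papers actually use (the `POS`/`NEG` word lists and example pairs are invented for illustration):

```python
# Toy lexicon standing in for a trained sentiment classifier.
POS = {"good", "great", "love", "happy"}
NEG = {"bad", "hate", "sad", "mad"}

def sentiment(text):
    """Return (#positive - #negative) lexicon hits: a crude polarity score."""
    tokens = text.lower().split()
    return sum(t in POS for t in tokens) - sum(t in NEG for t in tokens)

# Intent-equivalent AAVE/SAE pairs (invented examples).
pairs = [
    ("I love this it be so good", "I love this it is so good"),
    ("he mad about it", "he is mad about it"),
]
for aave, sae in pairs:
    print(f"AAVE: {sentiment(aave):+d}  SAE: {sentiment(sae):+d}")
```

A real evaluation would use a trained classifier and aggregate over many pairs; the point of the sketch is only the paired-comparison setup, where any score gap between columns would indicate dialect-sensitive sentiment predictions.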
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.