Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English
- URL: http://arxiv.org/abs/2503.04099v1
- Date: Thu, 06 Mar 2025 05:15:34 GMT
- Title: Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English
- Authors: Runtao Zhou, Guangya Wan, Saadia Gabriel, Sheng Li, Alexander J Gates, Maarten Sap, Thomas Hartvigsen
- Abstract summary: We investigate dialectal disparities in Large Language Model (LLM) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning tasks, leading to their widespread deployment. However, recent studies have highlighted concerning biases in these models, particularly in their handling of dialectal variations like African American English (AAE). In this work, we systematically investigate dialectal disparities in LLM reasoning tasks. We develop an experimental framework comparing LLM performance given Standard American English (SAE) and AAE prompts, combining LLM-based dialect conversion with established linguistic analyses. We find that LLMs consistently produce less accurate responses and simpler reasoning chains and explanations for AAE inputs compared to equivalent SAE questions, with disparities most pronounced in social science and humanities domains. These findings highlight systematic differences in how LLMs process and reason about different language varieties, raising important questions about the development and deployment of these systems in our multilingual and multidialectal world. Our code repository is publicly available at https://github.com/Runtaozhou/dialect_bias_eval.
Related papers
- Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs? [2.3749120526936465]
This study explores how recent large language models (LLMs) navigate relative clause attachment ambiguity in six typologically diverse languages.
arXiv Detail & Related papers (2025-03-13T19:44:15Z) - Multilingual Relative Clause Attachment Ambiguity Resolution in Large Language Models [2.3749120526936465]
Large language models (LLMs) resolve relative clause (RC) attachment ambiguities. We assess whether LLMs can achieve human-like interpretations amid the complexities of language. We evaluate models in English, Spanish, French, German, Japanese, and Korean.
arXiv Detail & Related papers (2025-03-04T19:56:56Z) - Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z) - Do Large Language Models Have an English Accent? Evaluating and Improving the Naturalness of Multilingual LLMs [13.558778781305998]
Large Language Models (LLMs) are predominantly designed with English as the primary language.
Even the few that are multilingual tend to exhibit strong English-centric biases.
This paper introduces novel automatic corpus-level metrics to assess the lexical and syntactic naturalness of multilingual outputs.
arXiv Detail & Related papers (2024-10-21T12:34:17Z) - One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We present the first study aimed at objectively assessing the fairness and robustness of Large Language Models (LLMs) in handling dialects in canonical reasoning tasks. We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K. Our findings reveal that almost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs [16.770697902481107]
Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities.
We propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings.
The results show that GPT exhibits highly translation-like behaviour in multilingual settings.
arXiv Detail & Related papers (2023-05-24T02:05:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.