VALUE: Understanding Dialect Disparity in NLU
- URL: http://arxiv.org/abs/2204.03031v1
- Date: Wed, 6 Apr 2022 18:30:56 GMT
- Title: VALUE: Understanding Dialect Disparity in NLU
- Authors: Caleb Ziems, Jiaao Chen, Camille Harris, Jessica Anderson, Diyi Yang
- Abstract summary: We construct rules for 11 features of African American Vernacular English (AAVE).
We recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments.
Experiments show that these new dialectal features can lead to a drop in model performance.
- Score: 50.35526025326337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: English Natural Language Understanding (NLU) systems have achieved great
performances and even outperformed humans on benchmarks like GLUE and
SuperGLUE. However, these benchmarks contain only textbook Standard American
English (SAE). Other dialects have been largely overlooked in the NLP
community. This leads to biased and inequitable NLU systems that serve only a
sub-population of speakers. To understand disparities in current models and to
facilitate more dialect-competent NLU systems, we introduce the VernAcular
Language Understanding Evaluation (VALUE) benchmark, a challenging variant of
GLUE that we created with a set of lexical and morphosyntactic transformation
rules. In this initial release (V.1), we construct rules for 11 features of
African American Vernacular English (AAVE), and we recruit fluent AAVE speakers
to validate each feature transformation via linguistic acceptability judgments
in a participatory design manner. Experiments show that these new dialectal
features can lead to a drop in model performance.
Related papers
- One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [55.35278531907263]
We present the first study on Large Language Models' fairness and robustness to a dialect in canonical reasoning tasks.
We hire AAVE speakers to rewrite seven popular benchmarks, such as HumanEval and GSM8K.
We find that, compared to Standardized English, almost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z)
- AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark [3.1927733045184885]
AAVENUE is a benchmark for evaluating large language model (LLM) performance on NLU tasks in AAVE and Standard American English.
We compare AAVENUE and VALUE translations using five popular LLMs and a comprehensive set of metrics including fluency, BARTScore, quality, coherence, and understandability.
Our evaluations reveal that LLMs consistently perform better on SAE tasks than AAVE-translated versions, underscoring inherent biases.
arXiv Detail & Related papers (2024-08-27T07:56:35Z)
- Self-supervised Speech Representations Still Struggle with African American Vernacular English [28.223877889211803]
Underperformance of ASR systems for speakers of marginalized language varieties is a well-documented phenomenon.
We investigate whether or not the recent wave of Self-Supervised Learning speech models can close the gap in ASR performance between AAVE and Mainstream American English.
arXiv Detail & Related papers (2024-08-26T13:29:25Z)
- DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages [49.38663048447942]
We propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties.
This allows for a comprehensive evaluation of NLP system performance on different language varieties.
We provide substantial evidence of performance disparities between standard and non-standard language varieties.
arXiv Detail & Related papers (2024-03-16T20:18:36Z)
- Task-Agnostic Low-Rank Adapters for Unseen English Dialects [52.88554155235167]
Large Language Models (LLMs) are trained on corpora disproportionally weighted in favor of Standard American English.
By disentangling dialect-specific and cross-dialectal information, HyperLoRA improves generalization to unseen dialects in a task-agnostic fashion.
arXiv Detail & Related papers (2023-11-02T01:17:29Z)
- DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules [64.93179829965072]
DADA is a modular approach to imbue SAE-trained models with multi-dialectal robustness.
We show that DADA is effective for both single-task and instruction-finetuned language models.
arXiv Detail & Related papers (2023-05-22T18:43:31Z)
- The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation [13.352795145385645]
Speech translation (ST) is a good means of pretraining speech models for end-to-end spoken language understanding.
We show that our models reach higher performance over baselines on monolingual and multilingual intent classification.
We also create new benchmark datasets for speech summarization and low-resource/zero-shot transfer from English to French or Spanish.
arXiv Detail & Related papers (2023-05-16T17:53:03Z)
- Multi-VALUE: A Framework for Cross-Dialectal English NLP [49.55176102659081]
Multi-VALUE is a controllable rule-based translation system spanning 50 English dialects.
Stress tests reveal significant performance disparities for leading models on non-standard dialects.
We partner with native speakers of Chicano and Indian English to release new gold-standard variants of the popular CoQA task.
arXiv Detail & Related papers (2022-12-15T18:17:01Z)