Automatic Dialect Density Estimation for African American English
- URL: http://arxiv.org/abs/2204.00967v1
- Date: Sun, 3 Apr 2022 01:34:48 GMT
- Title: Automatic Dialect Density Estimation for African American English
- Authors: Alexander Johnson, Kevin Everson, Vijay Ravi, Anissa Gladney, Mari
Ostendorf, Abeer Alwan
- Abstract summary: We explore automatic prediction of dialect density of the African American English (AAE) dialect.
Dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database.
- Score: 74.44807604000967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore automatic prediction of dialect density of the
African American English (AAE) dialect, where dialect density is defined as the
percentage of words in an utterance that contain characteristics of the
non-standard dialect. We investigate several acoustic and language modeling
features, including the commonly used X-vector representation and ComParE
feature set, in addition to information extracted from ASR transcripts of the
audio files and prosodic information. To address issues of limited labeled
data, we use a weakly supervised model to project prosodic and X-vector
features into low-dimensional task-relevant representations. An XGBoost model
is then used to predict the speaker's dialect density from these features and
to identify which features are most significant during inference. We evaluate
the utility of
these features both alone and in combination for the given task. This work,
which does not rely on hand-labeled transcripts, is performed on audio segments
from the CORAAL database. We show a significant correlation between our
predicted and ground truth dialect density measures for AAE speech in this
database and propose this work as a tool for explaining and mitigating bias in
speech technology.
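To make the final stage of this pipeline concrete, here is a minimal sketch, not the authors' released code: the feature arrays (xvec, prosody, asr_cues) are random placeholders standing in for the projected X-vector, prosodic, and ASR-derived features described above, and all dimensions and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of the regression stage: concatenate utterance-level
# features, fit an XGBoost regressor to dialect density labels, and check
# the correlation between predicted and ground-truth density.
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n_utts = 500

# Placeholder utterance-level features (dimensions are illustrative only).
xvec = rng.normal(size=(n_utts, 32))      # projected X-vector representation
prosody = rng.normal(size=(n_utts, 8))    # projected prosodic representation
asr_cues = rng.normal(size=(n_utts, 10))  # dialect cues derived from ASR output

# Dialect density label: percentage of words in the utterance that carry a
# characteristic of the non-standard dialect (random here; the paper uses
# ground-truth density measures for CORAAL segments).
density = rng.uniform(0.0, 100.0, size=n_utts)

X = np.concatenate([xvec, prosody, asr_cues], axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, density, test_size=0.2, random_state=0)

model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
r, p = pearsonr(pred, y_te)
print(f"Pearson correlation between predicted and true density: {r:.3f} (p={p:.3g})")

# Feature importances are one way to see which inputs the trained model relies
# on most, mirroring the per-feature analysis the abstract mentions.
top = np.argsort(model.feature_importances_)[::-1][:5]
print("Most informative feature indices:", top)
```

With real features in place of the placeholders, the same pearsonr call corresponds to the predicted-versus-ground-truth correlation the abstract reports as its main evaluation.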
Related papers
- Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse [54.08750245737734]
We propose that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones.
arXiv Detail & Related papers (2024-10-21T14:42:37Z) - CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
Evaluating machine-generated audio captions is a complex task that requires considering diverse factors.
We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models.
In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics.
arXiv Detail & Related papers (2024-09-19T17:59:52Z) - Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives.
We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources.
We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z) - Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers.
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects.
We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
arXiv Detail & Related papers (2024-06-27T22:38:04Z) - Establishing degrees of closeness between audio recordings along
different dimensions using large-scale cross-lingual models [4.349838917565205]
We propose a new unsupervised method using ABX tests on audio recordings with carefully curated metadata.
Three experiments are devised: one on room acoustics aspects, one on linguistic genre, and one on phonetic aspects.
The results confirm that the representations extracted from recordings with different linguistic/extra-linguistic characteristics differ along the same lines.
arXiv Detail & Related papers (2024-02-08T11:31:23Z) - From `Snippet-lects' to Doculects and Dialects: Leveraging Neural
Representations of Speech for Placing Audio Signals in a Language Landscape [3.96673286245683]
XLSR-53, a multilingual speech model, builds a vector representation from audio.
We use max-pooling to aggregate the neural representations from a "snippet-lect" to a "doculect" (see the sketch after this list).
Similarity measurements between the 11 corpora bring out greatest closeness between those that are known to be dialects of the same language.
arXiv Detail & Related papers (2023-05-29T20:37:06Z) - End-to-End Automatic Speech Recognition model for the Sudanese Dialect [0.0]
This paper examines the viability of designing an Automatic Speech Recognition model for the Sudanese dialect.
The paper gives an overview of the Sudanese dialect, the collection of representative resources, and the pre-processing performed to construct a modest dataset.
The designed model provided some insights into the current recognition task and reached an average Label Error Rate of 73.67%.
arXiv Detail & Related papers (2022-12-21T07:35:33Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM built on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech
Recognition System [3.4888132404740797]
We evaluate a state-of-the-art automatic speech recognition model, using unseen data from a corpus with a wide variety of labeled English accents.
We show that there is indeed an accuracy bias in terms of accentual variety, favoring the accents most prevalent in the training corpus.
arXiv Detail & Related papers (2021-05-09T08:24:33Z)
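As forward-referenced in the "snippet-lects to doculects" entry above, the pooling-and-compare idea can be sketched in a few lines. This is only an illustration under assumptions: random arrays stand in for frame-level XLSR-53 features, the 1024-dimensional hidden size is illustrative, and the helper names are hypothetical; the paper itself extracts representations from real audio across 11 corpora.

```python
# Hypothetical sketch: max-pool frame-level features into per-snippet vectors,
# pool snippets into a per-corpus ("doculect") vector, then compare corpora
# with cosine similarity.
import numpy as np

rng = np.random.default_rng(1)
feat_dim = 1024  # illustrative hidden size for XLSR-53 frame features

def doculect_vector(snippets):
    """Max-pool frames -> snippet vectors, then max-pool snippets -> doculect vector."""
    snippet_vecs = [frames.max(axis=0) for frames in snippets]  # one (feat_dim,) vector per snippet
    return np.stack(snippet_vecs).max(axis=0)                   # single (feat_dim,) doculect vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two toy corpora, each a list of snippets with varying numbers of frames.
corpus_a = [rng.normal(size=(int(rng.integers(50, 200)), feat_dim)) for _ in range(5)]
corpus_b = [rng.normal(size=(int(rng.integers(50, 200)), feat_dim)) for _ in range(5)]

print(f"Doculect similarity: {cosine(doculect_vector(corpus_a), doculect_vector(corpus_b)):.3f}")
```

With real features, pairwise similarity measurements of this kind are how that entry reports that corpora known to be dialects of the same language come out closest; the exact similarity metric used there may differ from the cosine shown here.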
This list is automatically generated from the titles and abstracts of the papers on this site.