Related papers: Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception

Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception

URL: http://arxiv.org/abs/2506.17542v1
Date: Sat, 21 Jun 2025 01:44:19 GMT
Title: Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
Authors: Nitin Venkateswaran, Kevin Tang, Ratree Wayland,
Abstract summary: We investigate how pretrained representations encode phonological feature-level variations that influence the perception of segmental accent.<n>We focus on three segments: the labio approximant, the rhotic tap, and the retroflex stop, which are uniformly produced in the English of native speakers of Hindi.
Score: 0.8624680612413766
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Traditional models of accent perception underestimate the role of gradient variations in phonological features which listeners rely upon for their accent judgments. We investigate how pretrained representations from current self-supervised learning (SSL) models of speech encode phonological feature-level variations that influence the perception of segmental accent. We focus on three segments: the labiodental approximant, the rhotic tap, and the retroflex stop, which are uniformly produced in the English of native speakers of Hindi as well as other languages in the Indian sub-continent. We use the CSLU Foreign Accented English corpus (Lander, 2007) to extract, for these segments, phonological feature probabilities using Phonet (V\'asquez-Correa et al., 2019) and pretrained representations from Wav2Vec2-BERT (Barrault et al., 2023) and WavLM (Chen et al., 2022) along with accent judgements by native speakers of American English. Probing analyses show that accent strength is best predicted by a subset of the segment's pretrained representation features, in which perceptually salient phonological features that contrast the expected American English and realized non-native English segments are given prominent weighting. A multinomial logistic regression of pretrained representation-based segment distances from American and Indian English baselines on accent ratings reveals strong associations between the odds of accent strength and distances from the baselines, in the expected directions. These results highlight the value of self-supervised speech representations for modeling accent perception using interpretable phonological features.

Related papers

On the Relationship between Accent Strength and Articulatory Features [26.865464238029748]
This paper explores the relationship between accent strength and articulatory features inferred from acoustic speech.<n>The proposed framework leverages recent self-supervised learning articulatory inversion techniques to estimate articulatory features.<n>Results indicate that tongue positioning patterns distinguish the two dialects, with notable differences inter-dialects in rhotic and low back vowels.
arXiv Detail & Related papers (2025-07-03T20:08:28Z)
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS [52.89324095217975]
Previous approaches on accent conversion mainly aimed at making non-native speech sound more native.<n>We develop a new AC approach that not only focuses on accent conversion but also improves pronunciation of non-native accented speaker.
arXiv Detail & Related papers (2024-10-19T06:12:31Z)
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS [56.18382038512251]
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity. Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent. This paper presents a promising AC model that can convert many accents into native to overcome these issues.
arXiv Detail & Related papers (2024-09-30T19:52:10Z)
Transfer the linguistic representations from TTS to accent conversion with non-parallel data [7.376032484438044]
Accent conversion aims to convert the accent of a source speech to a target accent, preserving the speaker's identity. This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech.
arXiv Detail & Related papers (2024-01-07T16:39:34Z)
Accented Speech Recognition With Accent-specific Codebooks [53.288874858671576]
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
arXiv Detail & Related papers (2023-10-24T16:10:58Z)
Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition [18.154258453839066]
We improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation. We include phonetic knowledge in the ACM training to provide accurate feedback about how well certain pronunciation patterns were recovered in the synthesized waveform. We evaluate our approach on native and non-native English datasets and found that synthetically accented data helped the ASR to better understand speech from seen accents.
arXiv Detail & Related papers (2023-03-01T20:05:19Z)
Explicit Intensity Control for Accented Text-to-speech [65.35831577398174]
How to control the intensity of accent in the process of TTS is a very interesting research direction. Recent work design a speaker-versaadrial loss to disentangle the speaker and accent information, and then adjust the loss weight to control the accent intensity. This paper propose a new intuitive and explicit accent intensity control scheme for accented TTS.
arXiv Detail & Related papers (2022-10-27T12:23:41Z)
Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective [8.002498051045228]
Accented speech recognition and accent classification are relatively under-explored research areas in speech technology. Recent deep learning-based methods and Transformer-based pretrained models have achieved superb performances in both areas. In this paper, we explored three main accent modelling methods combined with two different classifiers based on 105 speaker recordings retrieved from five urban varieties in Northern England.
arXiv Detail & Related papers (2022-06-26T01:25:17Z)
Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system. We show that these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
Automatic Dialect Density Estimation for African American English [74.44807604000967]
We explore automatic prediction of dialect density of the African American English (AAE) dialect. dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect. We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database.
arXiv Detail & Related papers (2022-04-03T01:34:48Z)
AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition [3.028098724882708]
We first spell out the key requirements for creating a well-curated database of speech samples in non-native accents for training and testing robust ASR systems. We then introduce AccentDB, one such database that contains samples of 4 Indian-English accents collected by us. We present several accent classification models and evaluate them thoroughly against human-labelled accent classes.
arXiv Detail & Related papers (2020-05-16T12:38:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.