Demographic Attributes Prediction from Speech Using WavLM Embeddings
- URL: http://arxiv.org/abs/2502.12007v1
- Date: Mon, 17 Feb 2025 16:43:47 GMT
- Title: Demographic Attributes Prediction from Speech Using WavLM Embeddings
- Authors: Yuchen Yang, Thomas Thebaud, Najim Dehak
- Abstract summary: This paper introduces a general classifier based on WavLM features, to infer demographic characteristics, such as age, gender, native language, education, and country, from speech.
The proposed framework identifies key acoustic and linguistic features associated with demographic attributes, achieving a Mean Absolute Error (MAE) of 4.94 for age prediction and over 99.81% accuracy for gender classification.
- Score: 25.00298717665857
- License:
- Abstract: This paper introduces a general classifier based on WavLM features, to infer demographic characteristics, such as age, gender, native language, education, and country, from speech. Demographic feature prediction plays a crucial role in applications like language learning, accessibility, and digital forensics, enabling more personalized and inclusive technologies. Leveraging pretrained models for embedding extraction, the proposed framework identifies key acoustic and linguistic features associated with demographic attributes, achieving a Mean Absolute Error (MAE) of 4.94 for age prediction and over 99.81% accuracy for gender classification across various datasets. Our system improves upon existing models by up to 30% relative in MAE and up to 10% relative in accuracy and F1 scores across tasks, leveraging a diverse range of datasets and large pretrained models to ensure robustness and generalizability. This study offers new insights into speaker diversity and provides a strong foundation for future research in speech-based demographic profiling.
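The abstract describes the general recipe: extract utterance-level embeddings from a pretrained WavLM model, then train lightweight per-attribute predictors on top of them. The sketch below illustrates that recipe only; the checkpoint name, mean-pooling choice, and Ridge/LogisticRegression heads are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch, assuming a HuggingFace WavLM checkpoint and scikit-learn heads;
# the paper's actual classifiers, pooling, and datasets may differ.
import torch
import torchaudio
from transformers import AutoFeatureExtractor, WavLMModel
from sklearn.linear_model import Ridge, LogisticRegression

CHECKPOINT = "microsoft/wavlm-base-plus"  # assumed checkpoint, not stated in the abstract
extractor = AutoFeatureExtractor.from_pretrained(CHECKPOINT)
wavlm = WavLMModel.from_pretrained(CHECKPOINT).eval()

def utterance_embedding(wav_path: str) -> torch.Tensor:
    """Return one fixed-size embedding per utterance by mean-pooling WavLM frames."""
    waveform, sr = torchaudio.load(wav_path)                # (channels, samples)
    waveform = torchaudio.functional.resample(waveform.mean(dim=0), sr, 16_000)
    inputs = extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        frames = wavlm(**inputs).last_hidden_state          # (1, num_frames, hidden_dim)
    return frames.mean(dim=1).squeeze(0)                    # (hidden_dim,)

# With embeddings stacked into X_train and labels collected per attribute,
# simple heads can be fit and scored with MAE (age) or accuracy/F1 (gender):
# age_head = Ridge().fit(X_train, ages_train)
# gender_head = LogisticRegression(max_iter=1000).fit(X_train, genders_train)
```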
Related papers
- Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants [10.227469020901232]
This paper introduces the Sonos Voice Control Bias Assessment dataset.
1,038 speakers, 166 hours, 170k audio samples, with 9,040 unique labelled transcripts.
Results show statistically significant differences in performance across age, dialectal region and ethnicity.
arXiv Detail & Related papers (2024-05-14T12:53:32Z)
- SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech [0.0]
This study ventures into predicting age, gender, and emotion from vocal cues, a field with vast applications.
The paper explores deep learning models for these predictions, comparing single-output, multi-output, and sequential architectures.
The experiments suggest that multi-output models perform comparably to individual models, efficiently capturing the intricate relationships between the variables and the speech inputs, all while achieving improved runtime.
arXiv Detail & Related papers (2024-03-01T11:28:37Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing [3.09766013093045]
This study introduces and empirically tests a novel predictive model for digital information engagement (IE).
The READ model integrates key cognitive biases with computational linguistics and natural language processing to develop a multidimensional perspective on information engagement.
The READ model's potential extends across various domains, including business, education, government, and healthcare.
arXiv Detail & Related papers (2023-07-26T20:58:47Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers [34.768337465321395]
Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models.
We use three common specialization methods proven effective for incorporating external knowledge into pretrained Transformers.
We adapt the language representations for the demographic dimensions of gender and age, using continuous language modeling and dynamic multi-task learning.
arXiv Detail & Related papers (2022-10-13T21:16:27Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles [50.90773979394264]
This paper studies a model that protects the privacy of individuals' sensitive information while still learning non-discriminatory predictors.
A key characteristic of the proposed model is that it enables the adoption of off-the-shelf, non-private fair models to create a privacy-preserving and fair model.
arXiv Detail & Related papers (2022-04-11T14:42:54Z)
- Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency [26.70127591966917]
We utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem.
First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses.
In comparison, we find that the regression-based models perform equivalently to or better than the classification approach.
arXiv Detail & Related papers (2021-11-30T06:28:58Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework [114.63823178097402]
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.
Specifically, we propose to use generative learning approaches to capture fine-grained information at small time scales and use discriminative learning approaches to distill coarse-grained or semantic information at large time scales.
arXiv Detail & Related papers (2021-02-03T08:13:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.