Related papers: Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

URL: http://arxiv.org/abs/2301.09058v1
Date: Sun, 22 Jan 2023 05:01:13 GMT
Title: Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification
Authors: Kwangje Baeg, Yeong-Gwan Kim, Young-Sub Han, Byoung-Ki Jeon
Abstract summary: We consider the use of speaker-discriminative embeddings derived from adversarial multi-task learning to align features and reduce the domain discrepancy in age subgroups. Experimental results on the VoxCeleb Enrichment dataset verify the effectiveness of our proposed adaptive adversarial network in multi-objective scenarios.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as age group well. In an embedding model that has been highly trained to capture speaker traits, the task of age group classification is closer to speech information leakage. Hence, to improve age group classification performance, we consider the use of speaker-discriminative embeddings derived from adversarial multi-task learning to align features and reduce the domain discrepancy in age subgroups. In addition, we investigated different types of speaker embeddings to learn and generalize the domain-invariant representations for age groups. Experimental results on the VoxCeleb Enrichment dataset verify the effectiveness of our proposed adaptive adversarial network in multi-objective scenarios and leveraging speaker embeddings for the domain adaptation task.

Related papers

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers [50.9040167152168]
We analyze neurons associated with k-means clusters of self-supervised features and i-vectors.<n>Our analysis reveals that these clusters correspond to broad phonetic and gender classes.<n>By protecting these neurons during pruning, we can significantly preserve performance on speaker-related task.
arXiv Detail & Related papers (2025-06-26T18:54:26Z)
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering [78.2927924732142]
We propose speaker-invariant clustering (Spin) as a novel self-supervised learning method. Spin disentangles speaker information and preserves content representations with just 45 minutes of fine-tuning on a single GPU.
arXiv Detail & Related papers (2023-05-18T15:59:36Z)
Residual Information in Deep Speaker Embedding Architectures [4.619541348328938]
This paper introduces an analysis over six sets of speaker embeddings extracted with some of the most recent and high-performing DNN architectures. The dataset includes 46 speakers uttering the same set of prompts, recorded in either a professional studio or their home environments. The results show that the discriminative power of the analyzed embeddings is very high, yet across all the analyzed architectures, residual information is still present in the representations.
arXiv Detail & Related papers (2023-02-06T12:37:57Z)
Self-supervised Speaker Diarization [19.111219197011355]
This study proposes an entirely unsupervised deep-learning model for speaker diarization. Speaker embeddings are represented by an encoder trained in a self-supervised fashion using pairs of adjacent segments assumed to be of the same speaker.
arXiv Detail & Related papers (2022-04-08T16:27:14Z)
Improved Relation Networks for End-to-End Speaker Verification and Identification [0.0]
Speaker identification systems are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples. We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification. Inspired by the use of prototypical networks in speaker verification, we train the model to classify samples in the current episode amongst all speakers present in the training set.
arXiv Detail & Related papers (2022-03-31T17:44:04Z)
Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels. Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
A Review of Speaker Diarization: Recent Advances with Deep Learning [78.20151731627958]
Speaker diarization is a task to label audio or video recordings with classes corresponding to speaker identity. With the rise of deep learning technology, more rapid advancements have been made for speaker diarization. We discuss how speaker diarization systems have been integrated with speech recognition applications.
arXiv Detail & Related papers (2021-01-24T01:28:05Z)
Active Speakers in Context [88.22935329360618]
Current methods for active speak er detection focus on modeling short-term audiovisual information from a single speaker. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our experiments show that a structured feature ensemble already benefits the active speaker detection performance.
arXiv Detail & Related papers (2020-05-20T01:14:23Z)
Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features. We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam [100.95498268200777]
SpeakerBeam exploits an adaptation utterance of the target speaker to extract his/her voice characteristics. SpeakerBeam sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures. We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures.
arXiv Detail & Related papers (2020-01-23T05:36:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.