Improving speaker de-identification with functional data analysis of f0
trajectories
- URL: http://arxiv.org/abs/2203.16738v1
- Date: Thu, 31 Mar 2022 01:34:15 GMT
- Title: Improving speaker de-identification with functional data analysis of f0
trajectories
- Authors: Lauri Tavi, Tomi Kinnunen, Rosa González Hautamäki
- Abstract summary: Formant modification is a simpler, yet effective method for speaker de-identification which requires no training data.
This study introduces a novel speaker de-identification method, which, in addition to simple formant shifts, manipulates f0 trajectories based on functional data analysis.
The proposed speaker de-identification method conceals potentially identifying pitch characteristics in a phonetically controllable manner and improves formant-based speaker de-identification by up to 25%.
- Score: 10.809893662563926
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Due to a constantly increasing amount of speech data that is stored in
different types of databases, voice privacy has become a major concern. To
respond to such concern, speech researchers have developed various methods for
speaker de-identification. The state-of-the-art solutions utilize deep learning
solutions which can be effective but may be unavailable or impractical to
apply to, for example, under-resourced languages. Formant modification is a
simpler, yet effective method for speaker de-identification which requires no
training data. Still, remaining intonational patterns in formant-anonymized
speech may contain speaker-dependent cues. This study introduces a novel
speaker de-identification method, which, in addition to simple formant shifts,
manipulates f0 trajectories based on functional data analysis. The proposed
speaker de-identification method conceals potentially identifying pitch
characteristics in a phonetically controllable manner and improves
formant-based speaker de-identification by up to 25%.
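As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the following code represents an f0 trajectory with a low-order polynomial fit as a simple stand-in for a functional-data basis expansion, damps deviations around the utterance mean to conceal speaker-specific pitch dynamics, and applies a constant semitone shift. All function names and parameter values here are hypothetical.

```python
import numpy as np

def deidentify_f0(f0, shift_st=2.0, flatten=0.5, degree=3):
    """Sketch: smooth an f0 contour with a low-order polynomial
    (a minimal stand-in for an FDA basis representation), damp
    excursions around the utterance mean, and shift the result
    by a constant number of semitones."""
    t = np.linspace(0.0, 1.0, len(f0))
    # "Functional" representation: least-squares polynomial fit.
    coefs = np.polyfit(t, f0, degree)
    smooth = np.polyval(coefs, t)
    # Damp deviations from the mean to conceal pitch dynamics.
    mean = smooth.mean()
    damped = mean + flatten * (smooth - mean)
    # Constant shift in semitones (each semitone is a 2**(1/12) factor).
    return damped * 2.0 ** (shift_st / 12.0)

# Toy f0 contour in Hz (voiced frames only).
f0 = np.array([110.0, 118.0, 130.0, 125.0, 112.0, 105.0, 108.0])
out = deidentify_f0(f0)
```

The damping step reduces the variance of the contour while the semitone shift moves its overall level, which is the qualitative effect the abstract describes: pitch dynamics are flattened in a controllable way rather than replaced wholesale.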
Related papers
- Accent conversion using discrete units with parallel data synthesized from controllable accented TTS [56.18382038512251]
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity.
Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent.
This paper presents a promising AC model that can convert many accents into a native accent to overcome these issues.
arXiv Detail & Related papers (2024-09-30T19:52:10Z) - Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives.
We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources.
We propose novel transformer-based models tailored for speaker identification (SpeakerID), leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z) - Personalizing Keyword Spotting with Speaker Information [11.4457776449367]
Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups.
We propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM).
Our proposed approach requires only a small (about 1%) increase in the number of parameters, with minimal impact on latency and computational cost.
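FiLM itself is a simple conditioning mechanism: a conditioning vector (here, a speaker embedding) is projected to a per-channel scale (gamma) and shift (beta) that modulate intermediate features. The sketch below uses hypothetical shapes and random weights; the actual model architecture in the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, speaker_emb, w_gamma, w_beta):
    """Feature-wise Linear Modulation: project the speaker embedding
    to per-channel scale (gamma) and shift (beta), then apply them
    to the feature map, broadcasting over time frames."""
    gamma = speaker_emb @ w_gamma   # shape: (channels,)
    beta = speaker_emb @ w_beta     # shape: (channels,)
    return gamma * features + beta

# Hypothetical sizes: 5 time frames, 8 channels, 4-dim speaker embedding.
frames, channels, emb_dim = 5, 8, 4
x = rng.standard_normal((frames, channels))
e = rng.standard_normal(emb_dim)
w_g = rng.standard_normal((emb_dim, channels))
w_b = rng.standard_normal((emb_dim, channels))
y = film(x, e, w_g, w_b)
```

The parameter overhead is just the two projection matrices (2 × emb_dim × channels weights), which is consistent with the small parameter increase the abstract reports.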
arXiv Detail & Related papers (2023-11-06T12:16:06Z) - SPADE: Self-supervised Pretraining for Acoustic DisEntanglement [2.294014185517203]
We introduce a self-supervised approach to disentangle room acoustics from speech.
Our results demonstrate that our proposed approach significantly improves performance over a baseline when labeled training data is scarce.
arXiv Detail & Related papers (2023-02-03T01:36:38Z) - Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z) - Self-Supervised Learning for Personalized Speech Enhancement [25.05285328404576]
Speech enhancement systems can show improved performance by adapting the model towards a single test-time speaker.
A test-time user might provide only a small amount of noise-free speech data, likely insufficient for traditional fully supervised learning.
We propose self-supervised methods designed specifically to learn personalized and discriminative features from abundant in-the-wild speech recordings that are noisy but still personal.
arXiv Detail & Related papers (2021-04-05T17:12:51Z) - Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
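For context, the equal error rate (EER) is the operating point where the false-acceptance and false-rejection rates coincide; a higher EER after de-identification means the verifier can no longer separate the original speaker from impostors. A simple reference computation (my own sketch, not the paper's evaluation code) looks like:

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Scan candidate thresholds and return the error rate at the
    point where false rejections and false acceptances are closest."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    eer, gap = 0.0, np.inf
    for t in thresholds:
        frr = np.mean(target_scores < t)       # targets rejected
        far = np.mean(nontarget_scores >= t)   # impostors accepted
        if abs(frr - far) < gap:
            gap, eer = abs(frr - far), (frr + far) / 2.0
    return eer

# Toy verification scores (higher = more same-speaker-like).
tar = np.array([2.0, 1.5, 1.8, 0.3])
non = np.array([-1.0, -0.5, 0.5, -2.0])
```

On these toy scores, one target and one non-target score overlap, giving an EER of 0.25; perfectly separated score sets give an EER of 0.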
arXiv Detail & Related papers (2020-11-09T19:22:05Z) - Improving on-device speaker verification using federated learning with privacy [5.321241042620525]
Information on speaker characteristics can be useful as side information in improving speaker recognition accuracy.
This paper investigates how privacy-preserving learning can improve a speaker verification system.
arXiv Detail & Related papers (2020-08-06T13:37:14Z) - Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z) - Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.