Privacy against Real-Time Speech Emotion Detection via Acoustic
Adversarial Evasion of Machine Learning
- URL: http://arxiv.org/abs/2211.09273v4
- Date: Mon, 18 Dec 2023 19:27:19 GMT
- Title: Privacy against Real-Time Speech Emotion Detection via Acoustic
Adversarial Evasion of Machine Learning
- Authors: Brian Testa, Yi Xiao, Harshit Sharma, Avery Gump, and Asif Salekin
- Abstract summary: DARE-GP is a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech.
Unlike existing works, DARE-GP provides: a) real-time protection of previously unheard utterances, b) against previously unseen black-box SER classifiers, c) while protecting speech transcription, and d) does so in a realistic, acoustic environment.
- Score: 7.387631194438338
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Smart speaker voice assistants (VAs) such as Amazon Echo and Google Home have
been widely adopted due to their seamless integration with smart home devices
and the Internet of Things (IoT) technologies. These VA services raise privacy
concerns, especially due to their access to our speech. This work considers one
such use case: the unaccountable and unauthorized surveillance of a user's
emotion via speech emotion recognition (SER). This paper presents DARE-GP, a
solution that creates additive noise to mask users' emotional information while
preserving the transcription-relevant portions of their speech. DARE-GP does
this by using a constrained genetic programming approach to learn the spectral
frequency traits that depict target users' emotional content, and then
generating a universal adversarial audio perturbation that provides this
privacy protection. Unlike existing works, DARE-GP provides: a) real-time
protection of previously unheard utterances, b) against previously unseen
black-box SER classifiers, c) while protecting speech transcription, and d)
does so in a realistic, acoustic environment. Further, this evasion is robust
against defenses employed by a knowledgeable adversary. The evaluations in this
work culminate with acoustic evaluations against two off-the-shelf commercial
smart speakers using a small-form-factor (raspberry pi) integrated with a
wake-word system to evaluate the efficacy of its real-world, real-time
deployment.
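
The abstract above describes a constrained evolutionary search for a single additive perturbation that lowers SER confidence while limiting damage to transcription. The snippet below is a minimal, hedged sketch of that general idea, not the authors' implementation: the SER classifier, transcription penalty, toy spectra, and the simple (mu + lambda) loop are all illustrative placeholders.

```python
# Hedged sketch (not the DARE-GP code): evolve one universal additive spectral
# perturbation that suppresses a stand-in SER classifier's confidence while
# keeping the perturbation within a "transcription" budget.
import numpy as np

rng = np.random.default_rng(0)

N_UTTS, N_BINS = 16, 64                                 # toy corpus size and spectral resolution
utterances = rng.normal(size=(N_UTTS, N_BINS))          # placeholder log-spectra, not real speech

def ser_confidence(spectra):
    """Stand-in for a black-box SER classifier's emotion confidence."""
    return 1.0 / (1.0 + np.exp(-spectra.mean(axis=1)))

def transcription_penalty(perturbation):
    """Stand-in constraint: large perturbations are assumed to hurt ASR."""
    return np.linalg.norm(perturbation)

def fitness(perturbation):
    # Higher fitness = lower mean SER confidence, penalized if the
    # perturbation exceeds the (arbitrary) transcription budget of 1.0.
    conf = ser_confidence(utterances + perturbation).mean()
    return -conf - 0.1 * max(0.0, transcription_penalty(perturbation) - 1.0)

# Simple (mu + lambda) evolutionary loop over candidate perturbations.
population = [rng.normal(scale=0.1, size=N_BINS) for _ in range(20)]
for generation in range(50):
    parents = sorted(population, key=fitness, reverse=True)[:5]
    children = [p + rng.normal(scale=0.05, size=N_BINS) for p in parents for _ in range(3)]
    population = parents + children

best = max(population, key=fitness)
print("best fitness:", fitness(best))
```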
Related papers
- Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z)
- STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition [36.73727306933382]
We propose a generator-based attack method to generate sparse and transferable adversarial examples to deceive SER models.
We evaluate our method on two widely-used SER datasets, Database of Elicited Mood in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP).
arXiv Detail & Related papers (2024-02-02T08:46:57Z)
- Evaluation of Speaker Anonymization on Emotional Speech [9.223908421919733]
Speech data carries a range of personal information, such as the speaker's identity and emotional state.
Current studies have addressed the topic of preserving speech privacy.
The VoicePrivacy 2020 Challenge (VPC) focuses on speaker anonymization.
arXiv Detail & Related papers (2023-04-15T20:50:29Z)
- Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy [22.84840887071428]
Speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings.
This typically comes with a privacy-utility trade-off between protection of individuals and usability of the data for downstream applications.
We propose to tackle this issue by generating speaker embeddings using a generative adversarial network with the Wasserstein distance as its cost function.
arXiv Detail & Related papers (2022-10-13T13:12:42Z)
- Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z)
- Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The model is then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features.
arXiv Detail & Related papers (2022-03-19T08:47:18Z)
- An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representation that can flexibly address these issues via an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performance on identity-free SER and better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
- Voice Privacy with Smart Digital Assistants in Educational Settings [1.8369974607582578]
We design and evaluate a practical and efficient framework for voice privacy at the source.
The approach combines speaker identification (SID) and speech conversion methods to randomly disguise the identity of users right on the device that records the speech.
We evaluate the ASR performance of the conversion in terms of word error rate and show the promise of this framework in preserving the content of the input speech.
arXiv Detail & Related papers (2021-03-24T19:58:45Z)
- A GAN-based Approach for Mitigating Inference Attacks in Smart Home Environment [3.785123406103385]
In this study, we explore the problem of adversaries spying on smart home users to infer sensitive information with the aid of machine learning techniques.
We propose a Generative Adversarial Network (GAN) based approach for privacy preservation in smart homes.
arXiv Detail & Related papers (2020-11-13T02:14:32Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- Multi-task self-supervised learning for Robust Speech Recognition [75.11748484288229]
This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments.
We employ an online speech distortion module that contaminates the input signals with a variety of random disturbances.
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
arXiv Detail & Related papers (2020-01-25T00:24:45Z)
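
As a companion to the PASE+ entry above, here is a minimal, hedged sketch of what an online speech-distortion module could look like; the specific disturbances (additive noise, random gain, hard clipping) and their probabilities are illustrative assumptions, not the paper's actual augmentation set.

```python
# Hedged sketch of an online speech-distortion module in the spirit of PASE+.
# The chosen disturbances are illustrative, not a reproduction of the paper.
import numpy as np

rng = np.random.default_rng(0)

def distort(waveform: np.ndarray) -> np.ndarray:
    """Apply a random subset of simple disturbances to a mono waveform."""
    out = waveform.copy()
    if rng.random() < 0.5:                     # additive white noise
        out = out + rng.normal(scale=0.01, size=out.shape)
    if rng.random() < 0.5:                     # random gain change
        out = out * rng.uniform(0.5, 1.5)
    if rng.random() < 0.3:                     # hard clipping
        out = np.clip(out, -0.3, 0.3)
    return out

# Example: distort a one-second 16 kHz sine tone on the fly.
t = np.linspace(0, 1, 16000, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = distort(clean)
print("max abs difference:", np.abs(noisy - clean).max())
```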