STAA-Net: A Sparse and Transferable Adversarial Attack for Speech
Emotion Recognition
- URL: http://arxiv.org/abs/2402.01227v1
- Date: Fri, 2 Feb 2024 08:46:57 GMT
- Title: STAA-Net: A Sparse and Transferable Adversarial Attack for Speech
Emotion Recognition
- Authors: Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu,
Tanja Schultz, Björn W. Schuller
- Abstract summary: We propose a generator-based attack method to generate sparse and transferable adversarial examples to deceive SER models.
We evaluate our method on two widely-used SER datasets, Database of Elicited Mood in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP).
- Score: 36.73727306933382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech contains rich information on the emotions of humans, and Speech
Emotion Recognition (SER) has been an important topic in the area of
human-computer interaction. The robustness of SER models is crucial,
particularly in privacy-sensitive and reliability-demanding domains like
private healthcare. Recently, the vulnerability of deep neural networks in the
audio domain to adversarial attacks has become a popular area of research.
However, prior works on adversarial attacks in the audio domain primarily rely
on iterative gradient-based techniques, which are time-consuming and prone to
overfitting the specific threat model. Furthermore, the exploration of sparse
perturbations, which have the potential for better stealthiness, remains
limited in the audio domain. To address these challenges, we propose a
generator-based attack method to generate sparse and transferable adversarial
examples to deceive SER models in an end-to-end and efficient manner. We
evaluate our method on two widely-used SER datasets, Database of Elicited Mood
in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP),
and demonstrate its ability to generate successful sparse adversarial examples
in an efficient manner. Moreover, our generated adversarial examples exhibit
model-agnostic transferability, enabling effective adversarial attacks on
advanced victim models.
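To make the generator-based idea concrete, here is a minimal, hypothetical sketch (it is not the authors' STAA-Net; the architecture, the 0.01 amplitude budget, and the L1 sparsity surrogate are all assumptions): a small convolutional generator maps a waveform to an additive perturbation, and it is trained to maximize a surrogate SER model's loss while an L1 penalty keeps the perturbation sparse.

```python
# Illustrative sketch only, not the paper's STAA-Net: a generator trained to
# emit sparse additive perturbations that fool a surrogate SER model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Maps a raw waveform (batch, 1, samples) to an additive perturbation."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=15, padding=7),
            nn.Tanh(),  # bound the raw perturbation to [-1, 1]
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        return 0.01 * self.net(wav)  # assumed small amplitude budget

def attack_step(generator, surrogate, wav, labels, sparsity_weight=1e-3):
    """One generator training step: fool the surrogate, keep delta sparse."""
    delta = generator(wav)
    adv = torch.clamp(wav + delta, -1.0, 1.0)
    logits = surrogate(adv)
    # Untargeted objective: maximize the surrogate's loss on the true label.
    fool_loss = -F.cross_entropy(logits, labels)
    # L1 penalty as a differentiable surrogate for L0 sparsity.
    sparse_loss = delta.abs().mean()
    return fool_loss + sparsity_weight * sparse_loss
```

Once trained, producing an adversarial example is a single forward pass through the generator, which is what makes generator-based attacks efficient compared with iterative gradient methods such as PGD.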
Related papers
- Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation [23.805401747928745]
This paper proposes a novel adversarial prompt attack tailored to language-conditioned robotic models.
We demonstrate that existing adversarial techniques exhibit limited effectiveness when directly transferred to the robotic domain.
We identify the beneficial impact of intermediate features on adversarial attacks and leverage the negative gradient of intermediate self-attention features to further enhance attack efficacy.
arXiv Detail & Related papers (2024-11-21T02:46:04Z)
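A generic sketch of attacking through intermediate features, in the spirit of the entry above (this is not the paper's method; the hooked layer, step size, and distance loss are assumptions): perturb the input so that an intermediate activation moves away from its clean value, rather than optimizing the output logits alone.

```python
# Generic intermediate-feature attack sketch: drive a hooked activation
# away from its clean value. Layer choice, eps, and alpha are assumptions.
import torch

def intermediate_feature_attack(model, layer, x, steps=10, eps=0.03, alpha=0.01):
    feats = {}
    handle = layer.register_forward_hook(
        lambda mod, inp, out: feats.__setitem__("h", out)
    )
    with torch.no_grad():
        model(x)                       # record the clean activation
        clean_h = feats["h"].detach()
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        model(x_adv)
        # Negative feature distance: descending on it pushes the
        # intermediate activation away from its clean value.
        loss = -torch.norm(feats["h"] - clean_h)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()
        # Project back into an eps-ball around the clean input.
        x_adv = x.detach() + torch.clamp(x_adv - x.detach(), -eps, eps)
        x_adv.requires_grad_(True)
    handle.remove()
    return x_adv.detach()
```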
- DiffuseDef: Improved Robustness to Adversarial Attacks [38.34642687239535]
Adversarial attacks pose a critical challenge to systems built using pretrained language models.
We propose DiffuseDef, which incorporates a diffusion layer as a denoiser between the encoder and the classifier.
During inference, the adversarial hidden state is first combined with sampled noise, then denoised iteratively and finally ensembled to produce a robust text representation.
arXiv Detail & Related papers (2024-06-28T22:36:17Z)
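The inference-time procedure described above can be read as the following loop (an interpretation of the abstract only; the denoiser interface, step count, and mean ensembling are assumptions):

```python
# Sketch of a DiffuseDef-style inference pass as described in the abstract:
# noise the hidden state, denoise iteratively, then ensemble. The denoiser
# signature, number of steps, and ensembling rule are assumptions.
import torch

def diffusedef_infer(encoder, denoiser, classifier, x,
                     num_samples=5, steps=4, noise_scale=0.1):
    h = encoder(x)  # hidden state, possibly corrupted by an attack
    denoised = []
    for _ in range(num_samples):
        # Combine the hidden state with freshly sampled Gaussian noise.
        h_t = h + noise_scale * torch.randn_like(h)
        # Iteratively denoise with the trained diffusion layer.
        for t in reversed(range(steps)):
            h_t = denoiser(h_t, torch.tensor(t))
        denoised.append(h_t)
    # Ensemble the denoised states into one robust text representation.
    h_robust = torch.stack(denoised).mean(dim=0)
    return classifier(h_robust)
```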
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Defense Against Adversarial Attacks on Audio DeepFake Detection [0.4511923587827302]
Audio DeepFakes (DF) are artificially generated utterances created using deep learning.
Multiple neural network-based methods for detecting generated speech have been proposed to mitigate these threats.
arXiv Detail & Related papers (2022-12-30T08:41:06Z)
- Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning [7.387631194438338]
DARE-GP is a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech.
Unlike existing works, DARE-GP provides: a) real-time protection of previously unheard utterances; b) robustness against previously unseen black-box SER classifiers; c) preservation of speech transcription; and d) operation in a realistic acoustic environment.
arXiv Detail & Related papers (2022-11-17T00:25:05Z)
- Robust Federated Learning Against Adversarial Attacks for Speech Emotion Recognition [12.024098046435796]
Speech data cannot be protected when uploaded and processed on servers in internet-of-things applications.
Deep neural networks have proven to be vulnerable to human-indistinguishable adversarial perturbations.
We propose a novel federated adversarial learning framework for protecting both data and deep neural networks.
arXiv Detail & Related papers (2022-03-09T13:19:26Z)
- Modelling Adversarial Noise for Adversarial Defense [96.56200586800219]
Adversarial defenses typically focus on exploiting adversarial examples to remove adversarial noise or to train an adversarially robust target model.
We are motivated by the observation that the relationship between adversarial data and natural data can help infer clean data from adversarial data and thus obtain the final correct prediction.
We study modelling adversarial noise to learn the transition relationship in the label space, using adversarial labels to improve adversarial accuracy.
arXiv Detail & Related papers (2021-09-21T01:13:26Z)
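One way to read "learning the transition relationship in the label space" is as a learned transition matrix that maps predictions on adversarial inputs back toward natural labels; the sketch below is such an interpretation, not the paper's formulation (the parameterization and loss are assumptions).

```python
# Hypothetical label-transition sketch: learn a row-stochastic matrix that
# maps label distributions predicted on adversarial inputs back toward the
# natural labels. Parameterization and loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelTransition(nn.Module):
    """T[i, j] ~ P(natural label j | adversarial label i)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_classes, num_classes))

    def forward(self, adv_probs: torch.Tensor) -> torch.Tensor:
        T = F.softmax(self.logits, dim=1)  # row-stochastic by construction
        return adv_probs @ T               # corrected label distribution

def transition_loss(model, transition, x_adv, y_natural):
    """Fit the transition so corrected predictions match natural labels."""
    adv_probs = F.softmax(model(x_adv), dim=1)
    corrected = transition(adv_probs)
    return F.nll_loss(torch.log(corrected + 1e-8), y_natural)
```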
- An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues via an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performance on identity-free SER and better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
- Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space.
We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space.
Empirical evaluations demonstrate that our method can significantly enhance adversarial robustness compared to previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-19T10:42:24Z)
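The training objective described in the entry above can be sketched as follows (the feature extractor, the MSE distance, and the function names are assumptions, not the paper's exact formulation): a denoiser is fit so that denoised adversarial examples land close to their natural counterparts in the model's class activation feature space.

```python
# Sketch of denoising in a class activation feature space: train the
# denoiser so its output matches the natural example in the target
# model's feature space. Extractor and distance choice are assumptions.
import torch
import torch.nn.functional as F

def class_activation_denoise_loss(denoiser, feature_extractor, x_adv, x_nat):
    """Distance between denoised-adversarial and natural class activations."""
    f_denoised = feature_extractor(denoiser(x_adv))
    with torch.no_grad():
        f_natural = feature_extractor(x_nat)
    return F.mse_loss(f_denoised, f_natural)
```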