STAA-Net: A Sparse and Transferable Adversarial Attack for Speech
Emotion Recognition
- URL: http://arxiv.org/abs/2402.01227v1
- Date: Fri, 2 Feb 2024 08:46:57 GMT
- Title: STAA-Net: A Sparse and Transferable Adversarial Attack for Speech
Emotion Recognition
- Authors: Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu,
Tanja Schultz, Björn W. Schuller
- Abstract summary: We propose a generator-based attack method to generate sparse and transferable adversarial examples to deceive SER models.
We evaluate our method on two widely-used SER datasets, Database of Elicited Mood in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP).
- Score: 36.73727306933382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech contains rich information on the emotions of humans, and Speech
Emotion Recognition (SER) has been an important topic in the area of
human-computer interaction. The robustness of SER models is crucial,
particularly in privacy-sensitive and reliability-demanding domains like
private healthcare. Recently, the vulnerability of deep neural networks in the
audio domain to adversarial attacks has become a popular area of research.
However, prior works on adversarial attacks in the audio domain primarily rely
on iterative gradient-based techniques, which are time-consuming and prone to
overfitting the specific threat model. Furthermore, the exploration of sparse
perturbations, which have the potential for better stealthiness, remains
limited in the audio domain. To address these challenges, we propose a
generator-based attack method to generate sparse and transferable adversarial
examples to deceive SER models in an end-to-end and efficient manner. We
evaluate our method on two widely-used SER datasets, Database of Elicited Mood
in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP),
and demonstrate its ability to generate successful sparse adversarial examples
in an efficient manner. Moreover, our generated adversarial examples exhibit
model-agnostic transferability, enabling effective adversarial attacks on
advanced victim models.
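To make the generator-based idea concrete, here is a minimal, hypothetical sketch (it is not the authors' STAA-Net; the architecture, the 0.01 amplitude budget, and the L1 sparsity surrogate are all assumptions): a small convolutional generator maps a waveform to an additive perturbation, and it is trained to maximize a surrogate SER model's loss while an L1 penalty keeps the perturbation sparse.

```python
# Illustrative sketch only, not the paper's STAA-Net: a generator trained to
# emit sparse additive perturbations that fool a surrogate SER model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Maps a raw waveform (batch, 1, samples) to an additive perturbation."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=15, padding=7),
            nn.Tanh(),  # bound the raw perturbation to [-1, 1]
        )

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        return 0.01 * self.net(wav)  # assumed small amplitude budget

def attack_step(generator, surrogate, wav, labels, sparsity_weight=1e-3):
    """One generator training step: fool the surrogate, keep delta sparse."""
    delta = generator(wav)
    adv = torch.clamp(wav + delta, -1.0, 1.0)
    logits = surrogate(adv)
    # Untargeted objective: maximize the surrogate's loss on the true label.
    fool_loss = -F.cross_entropy(logits, labels)
    # L1 penalty as a differentiable surrogate for L0 sparsity.
    sparse_loss = delta.abs().mean()
    return fool_loss + sparsity_weight * sparse_loss
```

Once trained, producing an adversarial example is a single forward pass through the generator, which is what makes generator-based attacks efficient compared with iterative gradient methods such as PGD.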
Related papers
- Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation [23.805401747928745]
This paper proposes a novel adversarial prompt attack tailored to language-conditioned robotic models.
We demonstrate that existing adversarial techniques exhibit limited effectiveness when directly transferred to the robotic domain.
We identify the beneficial impact of intermediate features on adversarial attacks and leverage the negative gradient of intermediate self-attention features to further enhance attack efficacy.
arXiv Detail & Related papers (2024-11-21T02:46:04Z)
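A generic sketch of attacking through intermediate features, in the spirit of the entry above (this is not the paper's method; the hooked layer, step size, and distance loss are assumptions): perturb the input so that an intermediate activation moves away from its clean value, rather than optimizing the output logits alone.

```python
# Generic intermediate-feature attack sketch: drive a hooked activation
# away from its clean value. Layer choice, eps, and alpha are assumptions.
import torch

def intermediate_feature_attack(model, layer, x, steps=10, eps=0.03, alpha=0.01):
    feats = {}
    handle = layer.register_forward_hook(
        lambda mod, inp, out: feats.__setitem__("h", out)
    )
    with torch.no_grad():
        model(x)                       # record the clean activation
        clean_h = feats["h"].detach()
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        model(x_adv)
        # Negative feature distance: descending on it pushes the
        # intermediate activation away from its clean value.
        loss = -torch.norm(feats["h"] - clean_h)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()
        # Project back into an eps-ball around the clean input.
        x_adv = x.detach() + torch.clamp(x_adv - x.detach(), -eps, eps)
        x_adv.requires_grad_(True)
    handle.remove()
    return x_adv.detach()
```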
- DiffuseDef: Improved Robustness to Adversarial Attacks [38.34642687239535]
Adversarial attacks pose a critical challenge to systems built using pretrained language models.
We propose DiffuseDef, which incorporates a diffusion layer as a denoiser between the encoder and the classifier.
During inference, the adversarial hidden state is first combined with sampled noise, then denoised iteratively and finally ensembled to produce a robust text representation.
arXiv Detail & Related papers (2024-06-28T22:36:17Z)
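The inference-time procedure described above can be read as the following loop (an interpretation of the abstract only; the denoiser interface, step count, and mean ensembling are assumptions):

```python
# Sketch of a DiffuseDef-style inference pass as described in the abstract:
# noise the hidden state, denoise iteratively, then ensemble. The denoiser
# signature, number of steps, and ensembling rule are assumptions.
import torch

def diffusedef_infer(encoder, denoiser, classifier, x,
                     num_samples=5, steps=4, noise_scale=0.1):
    h = encoder(x)  # hidden state, possibly corrupted by an attack
    denoised = []
    for _ in range(num_samples):
        # Combine the hidden state with freshly sampled Gaussian noise.
        h_t = h + noise_scale * torch.randn_like(h)
        # Iteratively denoise with the trained diffusion layer.
        for t in reversed(range(steps)):
            h_t = denoiser(h_t, torch.tensor(t))
        denoised.append(h_t)
    # Ensemble the denoised states into one robust text representation.
    h_robust = torch.stack(denoised).mean(dim=0)
    return classifier(h_robust)
```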
- Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Defense Against Adversarial Attacks on Audio DeepFake Detection [0.4511923587827302]
Audio DeepFakes (DF) are artificially generated utterances created using deep learning.
Multiple neural network-based methods for detecting generated speech have been proposed to mitigate these threats.
arXiv Detail & Related papers (2022-12-30T08:41:06Z)
- Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning [7.387631194438338]
DARE-GP is a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech.
Unlike existing works, DARE-GP provides: a) real-time protection of previously unheard utterances; b) robustness against previously unseen black-box SER classifiers; c) preservation of speech transcription; and d) operation in a realistic acoustic environment.
arXiv Detail & Related papers (2022-11-17T00:25:05Z)
- Robust Federated Learning Against Adversarial Attacks for Speech Emotion Recognition [12.024098046435796]
Speech data cannot be protected when uploaded and processed on servers in internet-of-things applications.
Deep neural networks have proven to be vulnerable to human-indistinguishable adversarial perturbations.
We propose a novel federated adversarial learning framework for protecting both data and deep neural networks.
arXiv Detail & Related papers (2022-03-09T13:19:26Z)
- Modelling Adversarial Noise for Adversarial Defense [96.56200586800219]
Adversarial defenses typically focus on exploiting adversarial examples to remove adversarial noise or to train an adversarially robust target model.
We are motivated by the observation that the relationship between adversarial data and natural data can help infer clean data from adversarial data and thus obtain the final correct prediction.
We study modelling adversarial noise to learn the transition relationship in the label space, using adversarial labels to improve adversarial accuracy.
arXiv Detail & Related papers (2021-09-21T01:13:26Z)
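One way to read "learning the transition relationship in the label space" is as a learned transition matrix that maps predictions on adversarial inputs back toward natural labels; the sketch below is such an interpretation, not the paper's formulation (the parameterization and loss are assumptions).

```python
# Hypothetical label-transition sketch: learn a row-stochastic matrix that
# maps label distributions predicted on adversarial inputs back toward the
# natural labels. Parameterization and loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelTransition(nn.Module):
    """T[i, j] ~ P(natural label j | adversarial label i)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_classes, num_classes))

    def forward(self, adv_probs: torch.Tensor) -> torch.Tensor:
        T = F.softmax(self.logits, dim=1)  # row-stochastic by construction
        return adv_probs @ T               # corrected label distribution

def transition_loss(model, transition, x_adv, y_natural):
    """Fit the transition so corrected predictions match natural labels."""
    adv_probs = F.softmax(model(x_adv), dim=1)
    corrected = transition(adv_probs)
    return F.nll_loss(torch.log(corrected + 1e-8), y_natural)
```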
- An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues via an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performance on identity-free SER and better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
- Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space.
We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space.
Empirical evaluations demonstrate that our method can significantly enhance adversarial robustness compared to previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-19T10:42:24Z)
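The training objective described in the entry above can be sketched as follows (the feature extractor, the MSE distance, and the function names are assumptions, not the paper's exact formulation): a denoiser is fit so that denoised adversarial examples land close to their natural counterparts in the model's class activation feature space.

```python
# Sketch of denoising in a class activation feature space: train the
# denoiser so its output matches the natural example in the target
# model's feature space. Extractor and distance choice are assumptions.
import torch
import torch.nn.functional as F

def class_activation_denoise_loss(denoiser, feature_extractor, x_adv, x_nat):
    """Distance between denoised-adversarial and natural class activations."""
    f_denoised = feature_extractor(denoiser(x_adv))
    with torch.no_grad():
        f_natural = feature_extractor(x_nat)
    return F.mse_loss(f_denoised, f_natural)
```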