Related papers: Symmetric Saliency-based Adversarial Attack To Speaker Identification

Symmetric Saliency-based Adversarial Attack To Speaker Identification

URL: http://arxiv.org/abs/2210.16777v1
Date: Sun, 30 Oct 2022 08:54:02 GMT
Title: Symmetric Saliency-based Adversarial Attack To Speaker Identification
Authors: Jiadi Yao, Xing Chen, Xiao-Lei Zhang, Wei-Qiang Zhang and Kunde Yang
Abstract summary: We propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED) First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system. Second, it proposes an angular loss function to push the speaker embedding far away from the source speaker.
Score: 17.087523686496958
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adversarial attack approaches to speaker identification either need high computational cost or are not very effective, to our knowledge. To address this issue, in this paper, we propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED), to generate adversarial voice examples to speaker identification. It contains two novel components. First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system, so as to make the attacker focus on generating artificial noise to the important samples. It also proposes an angular loss function to push the speaker embedding far away from the source speaker. Our experimental results demonstrate that the proposed SSED yields the state-of-the-art performance, i.e. over 97% targeted attack success rate and a signal-to-noise level of over 39 dB on both the open-set and close-set speaker identification tasks, with a low computational cost.

Related papers

Selective Masking Adversarial Attack on Automatic Speech Recognition Systems [14.719709247756178]
We propose a Selective Masking Adversarial attack, namely SMA attack, which ensures that one audio source is selected for recognition. Experiments demonstrate that SMA attack can generate effective and audio adversarial examples in the dual-source scenario.
arXiv Detail & Related papers (2025-04-06T07:30:08Z)
LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism [45.90677498529653]
We present a target speaker localization algorithm with a selective hearing mechanism. Our proposed network LocSelect achieves a mean absolute error (MAE) of 3.55 and an accuracy (ACC) of 87.40%.
arXiv Detail & Related papers (2023-10-16T15:19:05Z)
Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models. The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population. We show that, combined with multiple attempts, this attack opens even more to serious issues on the security of these systems.
arXiv Detail & Related papers (2022-04-24T15:31:41Z)
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings [53.11450530896623]
This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR) model that can recognize "who spoke what" Our model is based on token-level serialized output training (t-SOT) which was recently proposed to transcribe multi-talker speech in a streaming fashion. The proposed model achieves substantially better accuracy than a prior streaming model and shows comparable or sometimes even superior results to the state-of-the-art offline SA-ASR model.
arXiv Detail & Related papers (2022-03-30T21:42:00Z)
Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem. We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems [1.599072005190786]
Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances. Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations. To help test the correctness of ASRS, we propose techniques that automatically generate blackbox.
arXiv Detail & Related papers (2021-12-03T10:21:47Z)
Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels. Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model. Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function. We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
arXiv Detail & Related papers (2020-11-17T07:38:26Z)
Integrated Replay Spoofing-aware Text-independent Speaker Verification [47.41124427552161]
We propose two approaches for building an integrated system of speaker verification and presentation attack detection. The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning. We propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection.
arXiv Detail & Related papers (2020-06-10T01:24:55Z)
Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification [21.582072216282725]
Machine learning systems and, specifically, automatic speech recognition (ASR) systems are vulnerable to adversarial attacks. In this paper, we focus on hybrid ASR systems and compare four acoustic models regarding their ability to indicate uncertainty under attack. We are able to detect adversarial examples with an area under the receiving operator curve score of more than 0.99.
arXiv Detail & Related papers (2020-05-24T19:31:02Z)
Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam [100.95498268200777]
SpeakerBeam exploits an adaptation utterance of the target speaker to extract his/her voice characteristics. SpeakerBeam sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures. We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures.
arXiv Detail & Related papers (2020-01-23T05:36:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.