Symmetric Saliency-based Adversarial Attack To Speaker Identification
- URL: http://arxiv.org/abs/2210.16777v1
- Date: Sun, 30 Oct 2022 08:54:02 GMT
- Title: Symmetric Saliency-based Adversarial Attack To Speaker Identification
- Authors: Jiadi Yao, Xing Chen, Xiao-Lei Zhang, Wei-Qiang Zhang and Kunde Yang
- Abstract summary: We propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED)
First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system.
Second, it proposes an angular loss function to push the speaker embedding far away from the source speaker.
- Score: 17.087523686496958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attack approaches to speaker identification either need high
computational cost or are not very effective, to our knowledge. To address this
issue, in this paper, we propose a novel generation-network-based approach,
called symmetric saliency-based encoder-decoder (SSED), to generate adversarial
voice examples to speaker identification. It contains two novel components.
First, it uses a novel saliency map decoder to learn the importance of speech
samples to the decision of a targeted speaker identification system, so as to
make the attacker focus on generating artificial noise to the important
samples. It also proposes an angular loss function to push the speaker
embedding far away from the source speaker. Our experimental results
demonstrate that the proposed SSED yields the state-of-the-art performance,
i.e. over 97% targeted attack success rate and a signal-to-noise level of over
39 dB on both the open-set and close-set speaker identification tasks, with a
low computational cost.
Related papers
- LocSelect: Target Speaker Localization with an Auditory Selective
Hearing Mechanism [45.90677498529653]
We present a target speaker localization algorithm with a selective hearing mechanism.
Our proposed network LocSelect achieves a mean absolute error (MAE) of 3.55 and an accuracy (ACC) of 87.40%.
arXiv Detail & Related papers (2023-10-16T15:19:05Z) - Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population.
We show that, combined with multiple attempts, this attack opens even more to serious issues on the security of these systems.
arXiv Detail & Related papers (2022-04-24T15:31:41Z) - Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings [53.11450530896623]
This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR) model that can recognize "who spoke what"
Our model is based on token-level serialized output training (t-SOT) which was recently proposed to transcribe multi-talker speech in a streaming fashion.
The proposed model achieves substantially better accuracy than a prior streaming model and shows comparable or sometimes even superior results to the state-of-the-art offline SA-ASR model.
arXiv Detail & Related papers (2022-03-30T21:42:00Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition
Systems [1.599072005190786]
Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances.
Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations.
To help test the correctness of ASRS, we propose techniques that automatically generate blackbox.
arXiv Detail & Related papers (2021-12-03T10:21:47Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - FoolHD: Fooling speaker identification by Highly imperceptible
adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
arXiv Detail & Related papers (2020-11-17T07:38:26Z) - Integrated Replay Spoofing-aware Text-independent Speaker Verification [47.41124427552161]
We propose two approaches for building an integrated system of speaker verification and presentation attack detection.
The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning.
We propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection.
arXiv Detail & Related papers (2020-06-10T01:24:55Z) - Detecting Adversarial Examples for Speech Recognition via Uncertainty
Quantification [21.582072216282725]
Machine learning systems and, specifically, automatic speech recognition (ASR) systems are vulnerable to adversarial attacks.
In this paper, we focus on hybrid ASR systems and compare four acoustic models regarding their ability to indicate uncertainty under attack.
We are able to detect adversarial examples with an area under the receiving operator curve score of more than 0.99.
arXiv Detail & Related papers (2020-05-24T19:31:02Z) - Improving speaker discrimination of target speech extraction with
time-domain SpeakerBeam [100.95498268200777]
SpeakerBeam exploits an adaptation utterance of the target speaker to extract his/her voice characteristics.
SpeakerBeam sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures.
We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures.
arXiv Detail & Related papers (2020-01-23T05:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.