Multi-Discriminator Sobolev Defense-GAN Against Adversarial Attacks for
End-to-End Speech Systems
- URL: http://arxiv.org/abs/2103.08086v1
- Date: Mon, 15 Mar 2021 01:11:13 GMT
- Title: Multi-Discriminator Sobolev Defense-GAN Against Adversarial Attacks for
End-to-End Speech Systems
- Authors: Mohammad Esmaeilpour and Patrick Cardinal and Alessandro Lameiras
Koerich
- Abstract summary: This paper introduces a defense approach against end-to-end adversarial attacks developed for cutting-edge speech-to-text systems.
First, we represent speech signals with 2D spectrograms using the short-time Fourier transform.
Second, we iteratively find a safe vector using a spectrogram subspace projection operation.
Third, we synthesize a spectrogram with such a safe vector using a novel GAN architecture trained with Sobolev integral probability metric.
- Score: 78.5097679815944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a defense approach against end-to-end adversarial
attacks developed for cutting-edge speech-to-text systems. The proposed defense
algorithm has four major steps. First, we represent speech signals with 2D
spectrograms using the short-time Fourier transform. Second, we iteratively
find a safe vector using a spectrogram subspace projection operation. This
operation minimizes the chordal distance adjustment between spectrograms with
an additional regularization term. Third, we synthesize a spectrogram with such
a safe vector using a novel GAN architecture trained with Sobolev integral
probability metric. To improve the model's performance in terms of stability
and the total number of learned modes, we impose an additional constraint on
the generator network. Finally, we reconstruct the signal from the synthesized
spectrogram using the Griffin-Lim phase approximation technique. We evaluate the
proposed defense approach against six strong white-box and black-box adversarial
attacks benchmarked on DeepSpeech, Kaldi, and Lingvo models. Our experimental
results show that our algorithm outperforms other state-of-the-art defense
algorithms both in terms of accuracy and signal quality.
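
The first and last steps named in the abstract (STFT representation and Griffin-Lim phase approximation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Hann window, and the values of `n_fft`, `hop`, and `n_iter` are assumptions chosen for illustration, and the safe-vector search and the Sobolev GAN (steps two and three) are not covered. A clean test tone stands in for the synthesized spectrogram.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform with a periodic Hann window; shape (frames, bins)."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(S, n_fft=256, hop=64):
    """Least-squares inverse STFT: windowed overlap-add normalized by summed win**2."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(S, n=n_fft, axis=1) * win
    length = n_fft + hop * (S.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=50, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)               # project to a real signal
        S = stft(x, n_fft, hop)                          # re-analyze that signal
        phase = S / np.maximum(np.abs(S), 1e-8)          # keep phase, discard magnitude
    return istft(mag * phase, n_fft, hop)

def spectral_error(x, mag):
    """Relative distance between the STFT magnitude of `x` and the target `mag`."""
    return np.linalg.norm(np.abs(stft(x)) - mag) / np.linalg.norm(mag)

# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

mag = np.abs(stft(signal))             # step 1: magnitude spectrogram
baseline = griffin_lim(mag, n_iter=0)  # random phase only, no refinement
recovered = griffin_lim(mag)           # step 4: iterative phase approximation
```

Comparing `spectral_error(recovered, mag)` with `spectral_error(baseline, mag)` shows how far the iterations move the estimate toward a signal consistent with the target magnitudes. A production pipeline would more likely rely on an established audio library than on hand-written transforms.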
Related papers
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using
Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
Non-autoregressive framework enhances controllability, and duration diffusion model enables diversified prosodic expression.
(2023-09-27T09:27:03Z)
- Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified
Spoofing Detection [6.713879688002623]
Existing anti-spoofing methods often simulate specific attack types, such as synthetic or replay attacks.
Current unified solutions struggle to detect spoofing artifacts.
We present a spectra-temporal fusion leveraging frame-level and utterance-level coefficients.
(2023-09-18T14:54:42Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
(2022-03-18T06:40:39Z)
- A Deep-Bayesian Framework for Adaptive Speech Duration Modification [20.99099283004413]
We use a Bayesian framework to define a latent attention map that links frames of the input and target utterances.
We train a masked convolutional encoder-decoder network to produce this attention map via a version of the mean absolute error loss function.
We show that our technique results in a high quality of generated speech that is on par with state-of-the-art vocoders.
(2021-07-11T05:53:07Z)
- Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
(2021-03-15T01:51:41Z)
- Adversarial Robustness by Design through Analog Computing and Synthetic
Gradients [80.60080084042666]
We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor.
In the white-box setting, our defense works by obfuscating the parameters of the random projection.
We find the combination of a random projection and binarization in the optical system also improves robustness against various types of black-box attacks.
(2021-01-06T16:15:29Z)
- Class-Conditional Defense GAN Against End-to-End Speech Attacks [82.21746840893658]
We propose a novel approach against end-to-end adversarial attacks developed to fool advanced speech-to-text systems such as DeepSpeech and Lingvo.
Unlike conventional defense approaches, the proposed approach does not directly employ low-level transformations such as autoencoding a given input signal.
Our defense-GAN considerably outperforms conventional defense algorithms in terms of word error rate and sentence level recognition accuracy.
(2020-10-22T00:02:02Z)
- Unsupervised Cross-Domain Speech-to-Speech Conversion with
Time-Frequency Consistency [14.062850439230111]
We propose a condition encouraging spectrogram consistency during the adversarial training procedure.
Our experimental results on the Librispeech corpus show that the model trained with the TF consistency provides a perceptually better quality of speech-to-speech conversion.
(2020-05-15T22:27:07Z)