Leveraging Domain Features for Detecting Adversarial Attacks Against
Deep Speech Recognition in Noise
- URL: http://arxiv.org/abs/2211.01621v1
- Date: Thu, 3 Nov 2022 07:25:45 GMT
- Title: Leveraging Domain Features for Detecting Adversarial Attacks Against
Deep Speech Recognition in Noise
- Authors: Christian Heider Nielsen and Zheng-Hua Tan
- Abstract summary: Adversarial attacks against deep ASR systems are highly successful.
This work leverages filter bank-based features to better capture the characteristics of attacks for improved detection.
Inverse filter bank features generally perform better in both clean and noisy environments.
- Score: 18.19207291891767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, significant progress has been made in deep model-based
automatic speech recognition (ASR), leading to its widespread deployment in the
real world. At the same time, adversarial attacks against deep ASR systems are
highly successful. Various methods have been proposed to defend ASR systems
from these attacks. However, existing classification-based methods focus on the
design of deep learning models while lacking exploration of domain-specific
features. This work leverages filter bank-based features to better capture the
characteristics of attacks for improved detection. Furthermore, the paper
analyses the potential of using the speech and non-speech parts separately in
detecting adversarial attacks. In the end, considering adverse environments
where ASR systems may be deployed, we study the impact of acoustic noise of
various types and signal-to-noise ratios. Extensive experiments show that the
inverse filter bank features generally perform better in both clean and noisy
environments, the detection is effective using either the speech or the non-speech
part, and acoustic noise can largely degrade the detection performance.
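The abstract studies acoustic noise at various signal-to-noise ratios. A minimal sketch of mixing noise into a speech signal at a target SNR is shown below; the function name and the use of NumPy are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture speech + noise has the requested SNR in dB."""
    # Loop or trim the noise to match the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # Gain such that 10*log10(speech_power / (gain^2 * noise_power)) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Sweeping `snr_db` over a range (e.g. 0 to 20 dB) with different noise types reproduces the kind of evaluation grid the abstract describes.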
Related papers
- Revisiting Acoustic Features for Robust ASR [25.687120601256787]
We revisit the approach of earlier works that developed acoustic features inspired by biological auditory perception.
We propose two new acoustic features called frequency masked spectrogram (FreqMask) and difference of gammatones spectrogram (DoGSpec) to simulate the neuro-psychological phenomena of frequency masking and lateral suppression.
arXiv Detail & Related papers (2024-09-24T18:58:23Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical contexts for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z)
- ConvNext Based Neural Network for Anti-Spoofing [6.047242590232868]
Automatic speaker verification (ASV) has been widely used in real life for identity authentication.
With the rapid development of speech conversion and speech synthesis algorithms and improvements in recording device quality, ASV systems are vulnerable to spoofing attacks.
arXiv Detail & Related papers (2022-09-14T05:53:37Z)
- AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems [15.013763364096638]
Recent work has illuminated the vulnerability of speaker recognition systems (SRSs) against adversarial attacks.
We present AS2T, the first attack in this domain which covers all the settings.
We study the possible distortions that occur in over-the-air transmission, utilize different transformation functions with different parameters to model those distortions, and incorporate them into the generation of adversarial voices.
arXiv Detail & Related papers (2022-06-07T14:38:55Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Robustifying automatic speech recognition by extracting slowly varying features [16.74051650034954]
We propose a defense mechanism against targeted adversarial attacks.
We use hybrid ASR models trained on data pre-processed to extract slowly varying features.
Our model shows a performance on clean data similar to the baseline model, while being more than four times more robust.
arXiv Detail & Related papers (2021-12-14T13:50:23Z)
- Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space.
We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space.
Empirical evaluations demonstrate that our method can significantly enhance adversarial robustness compared to previous state-of-the-art approaches.
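The objective this summary describes, minimizing the distance between adversarial and natural examples in a feature space, can be illustrated with a simple mean-squared distance. This is a hedged sketch only; the paper's actual class-activation feature extractor and training loop are not shown:

```python
import numpy as np

def feature_space_loss(f_denoised: np.ndarray, f_natural: np.ndarray) -> float:
    """Mean squared distance between two batches of feature vectors.

    A denoising model would be trained to drive this quantity toward zero,
    pulling features of denoised adversarial examples onto those of natural
    examples.
    """
    return float(np.mean((f_denoised - f_natural) ** 2))
```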
arXiv Detail & Related papers (2021-04-19T10:42:24Z)
- Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z)
- Capturing scattered discriminative information using a deep architecture in acoustic scene classification [49.86640645460706]
In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
arXiv Detail & Related papers (2020-07-09T08:32:06Z)
- Identifying Audio Adversarial Examples via Anomalous Pattern Detection [4.556497931273283]
We show that two recent state-of-the-art adversarial attacks on audio processing systems lead to higher-than-expected activation at some subset of nodes.
We can detect these attacks with an AUC of up to 0.98 with no degradation in performance on benign samples.
arXiv Detail & Related papers (2020-02-13T12:08:34Z)
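Several of the detection results above are reported as AUC values. The AUC of a detector can be computed directly from its scores on benign and attack samples with the standard rank statistic; a minimal pure-Python sketch (function and argument names are illustrative assumptions):

```python
def auc_from_scores(benign_scores, attack_scores):
    """AUC as the probability that a randomly chosen attack sample receives a
    higher detector score than a randomly chosen benign sample (ties count half)."""
    favorable = 0.0
    for a in attack_scores:
        for b in benign_scores:
            if a > b:
                favorable += 1.0
            elif a == b:
                favorable += 0.5
    return favorable / (len(attack_scores) * len(benign_scores))
```

An AUC of 0.98, as reported above, means an attack sample outscores a benign sample in roughly 98% of such random pairings.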
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.