Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems
- URL: http://arxiv.org/abs/2112.01821v1
- Date: Fri, 3 Dec 2021 10:21:47 GMT
- Title: Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems
- Authors: Xiaoliang Wu, Ajitha Rajan
- Abstract summary: Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances.
Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations.
To help test the correctness of ASRs, we propose techniques that automatically generate blackbox, untargeted adversarial attacks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic speech recognition (ASR) systems are prevalent, particularly in
applications for voice navigation and voice control of domestic appliances. The
computational core of ASRs are deep neural networks (DNNs) that have been shown
to be susceptible to adversarial perturbations, which can easily be exploited by
attackers to generate malicious outputs. To help test the correctness of ASRs,
we propose techniques that automatically generate blackbox (agnostic to the DNN),
untargeted adversarial attacks that are portable across ASRs. Much of the
existing work on adversarial ASR testing focuses on targeted attacks, i.e.,
generating audio samples given an output text. Targeted techniques are not
portable, being customised to the structure of the DNNs (whitebox) within a
specific ASR. In contrast, our method attacks the signal processing stage of the
ASR pipeline, which is shared across most ASRs. Additionally, we ensure the
generated adversarial audio samples have no humanly audible difference by
manipulating the acoustic signal using a psychoacoustic model that keeps the
perturbation below the thresholds of human perception. We evaluate the
portability and effectiveness of our techniques on three popular ASRs and three
input audio datasets, using three metrics: WER of the output text, similarity to
the original audio, and attack success rate on different ASRs. We found our
testing techniques were portable across ASRs, with the adversarial audio samples
producing high success rates, WERs, and similarities to the original audio.
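Two of the metrics named in the abstract can be sketched in a few lines: word error rate (WER) as word-level Levenshtein distance normalised by reference length, and a simple cosine similarity between an original and an adversarial waveform. This is an illustrative sketch of the standard metric definitions, not code from the paper's artifact; the function names are placeholders.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length audio sample sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Two substitutions over a four-word reference -> WER of 0.5.
print(wer("turn on the light", "turn off the night"))  # 0.5
```

An untargeted attack, as evaluated here, succeeds when the WER between the transcript of the original audio and that of the adversarial audio is high, while the waveform similarity stays high enough that a human hears no difference.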
Related papers
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition [23.9811164130045]
We propose a channel-aware data simulation method for robust automatic speech recognition training.
Our method harnesses the synergistic power of channel-extractive techniques and generative adversarial networks (GANs).
We evaluate our method on the challenging Hakka Across Taiwan (HAT) and Taiwanese Across Taiwan (TAT) corpora, achieving relative character error rate (CER) reductions of 20.02% and 9.64%, respectively.
arXiv Detail & Related papers (2024-09-19T01:02:31Z)
- ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features [25.28307679567351]
ALIF is the first black-box adversarial linguistic feature-based attack pipeline.
We present ALIF-OTL and ALIF-OTA schemes for launching attacks in both the digital domain and the physical playback environment.
arXiv Detail & Related papers (2024-08-03T15:30:16Z)
- VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment [101.2489492032816]
VALL-E R is a robust and efficient zero-shot Text-to-Speech system.
This research has the potential to be applied to meaningful projects, including the creation of speech for those affected by aphasia.
arXiv Detail & Related papers (2024-06-12T04:09:44Z)
- Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer [8.948537516293328]
We propose an attack on Automatic Speech Recognition (ASR) systems based on user-customized style transfer.
Our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks.
arXiv Detail & Related papers (2024-05-15T16:05:24Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Robustifying automatic speech recognition by extracting slowly varying features [16.74051650034954]
We propose a defense mechanism against targeted adversarial attacks.
We use hybrid ASR models trained on data pre-processed in such a way.
Our model shows a performance on clean data similar to the baseline model, while being more than four times more robust.
arXiv Detail & Related papers (2021-12-14T13:50:23Z)
- Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition [83.2274907780273]
How to design a black-box watermarking scheme for automatic speech recognition models is still an unsolved problem.
We propose the first black-box model watermarking framework for protecting the IP of ASR models.
Experiments on the state-of-the-art open-source ASR system DeepSpeech demonstrate the feasibility of the proposed watermarking scheme.
arXiv Detail & Related papers (2021-10-19T09:01:41Z)
- Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discrimination between genuine and adversarial samples.
Our codes will be made open-source for future works to do comparison.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
- Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification [21.582072216282725]
Machine learning systems and, specifically, automatic speech recognition (ASR) systems are vulnerable to adversarial attacks.
In this paper, we focus on hybrid ASR systems and compare four acoustic models regarding their ability to indicate uncertainty under attack.
We are able to detect adversarial examples with an area under the receiver operating characteristic (ROC) curve of more than 0.99.
arXiv Detail & Related papers (2020-05-24T19:31:02Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.