SoK: The Faults in our ASRs: An Overview of Attacks against Automatic
Speech Recognition and Speaker Identification Systems
- URL: http://arxiv.org/abs/2007.06622v3
- Date: Tue, 21 Jul 2020 17:42:56 GMT
- Title: SoK: The Faults in our ASRs: An Overview of Attacks against Automatic
Speech Recognition and Speaker Identification Systems
- Authors: Hadi Abdullah, Kevin Warren, Vincent Bindschaedler, Nicolas Papernot,
and Patrick Traynor
- Abstract summary: We show that the end-to-end architecture of speech and speaker systems makes attacks and defenses against them substantially different than those in the image space.
We then demonstrate experimentally that attacks against these models almost universally fail to transfer.
- Score: 28.635467696564703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech and speaker recognition systems are employed in a variety of
applications, from personal assistants to telephony surveillance and biometric
authentication. The wide deployment of these systems has been made possible by
the improved accuracy in neural networks. Like other systems based on neural
networks, recent research has demonstrated that speech and speaker recognition
systems are vulnerable to attacks using manipulated inputs. However, as we
demonstrate in this paper, the end-to-end architecture of speech and speaker
systems and the nature of their inputs make attacks and defenses against them
substantially different than those in the image space. We demonstrate this
first by systematizing existing research in this space and providing a taxonomy
through which the community can evaluate future work. We then demonstrate
experimentally that attacks against these models almost universally fail to
transfer. In so doing, we argue that substantial additional work is required to
provide adequate mitigations in this space.
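The transferability experiment described in the abstract can be illustrated with a minimal, self-contained sketch: an adversarial perturbation is crafted against one model, then replayed against an independently trained second model. The linear "models", FGSM-style perturbation, and all data below are hypothetical stand-ins for illustration, not the paper's actual ASR or speaker systems.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Two independently trained linear classifiers (toy stand-ins for speech models).
w_a = rng.normal(size=d)
w_b = rng.normal(size=d)

x = rng.normal(size=d)  # a clean input feature vector

def predict(w, x):
    return 1 if w @ x >= 0 else 0

# FGSM-style perturbation crafted against model A only: step against the sign
# of A's weights, scaled just enough to flip A's decision.
s = w_a @ x
eps = 1.2 * abs(s) / np.abs(w_a).sum()
x_adv = x - eps * np.sign(s) * np.sign(w_a)

flipped_a = predict(w_a, x) != predict(w_a, x_adv)  # attack succeeds on A
flipped_b = predict(w_b, x) != predict(w_b, x_adv)  # does it transfer to B?
print(f"fools A: {flipped_a}, transfers to B: {flipped_b}")
```

By construction the perturbation always flips model A; whether it also flips model B depends entirely on how similar B's decision boundary happens to be, which is the question the paper's transferability experiments probe at scale.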
Related papers
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- When Authentication Is Not Enough: On the Security of Behavioral-Based Driver Authentication Systems [53.2306792009435]
We develop two lightweight driver authentication systems based on Random Forest and Recurrent Neural Network architectures.
We are the first to propose attacks against these systems by developing two novel evasion attacks, SMARTCAN and GANCAN.
Through our contributions, we aid practitioners in safely adopting these systems, help reduce car thefts, and enhance driver security.
arXiv Detail & Related papers (2023-06-09T14:33:26Z)
- Tubes Among Us: Analog Attack on Automatic Speaker Identification [37.42266692664095]
We show that a human is capable of producing analog adversarial examples directly with little cost and supervision.
Our findings extend to a range of other acoustic-biometric tasks such as liveness detection, bringing into question their use in security-critical settings in real life.
arXiv Detail & Related papers (2022-02-06T10:33:13Z)
- Bias in Automated Speaker Recognition [0.0]
We study bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition.
We show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge.
Most affected are female speakers and non-US nationalities, who experience significant performance degradation.
arXiv Detail & Related papers (2022-01-24T06:48:57Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive deep neural network (DNN) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
- Texture-based Presentation Attack Detection for Automatic Speaker Verification [21.357976330739245]
This paper reports our exploration of texture descriptors applied to the analysis of speech spectrogram images.
In particular, we propose a common Fisher vector feature space based on a generative model.
At most, 16 in 100 bona fide presentations are rejected, whereas only one in 100 attack presentations is accepted.
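The error rates quoted above correspond to the standard presentation-attack-detection metrics: the rate of bona fide presentations rejected in error (BPCER) and the rate of attack presentations accepted in error (APCER). A toy sketch of how such rates fall out of detector scores at a fixed threshold, using synthetic Gaussian scores rather than the paper's actual detector:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical detector scores (higher = more likely bona fide).
bona_fide = rng.normal(loc=1.0, scale=0.5, size=1000)
attacks   = rng.normal(loc=-1.0, scale=0.5, size=1000)

threshold = 0.0  # accept a presentation when its score >= threshold

bpcer = np.mean(bona_fide < threshold)   # bona fide rejected in error
apcer = np.mean(attacks >= threshold)    # attacks accepted in error
print(f"BPCER: {bpcer:.1%}, APCER: {apcer:.1%}")
```

Moving the threshold trades one error rate against the other, which is why PAD results are usually reported as a pair of rates at a chosen operating point.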
arXiv Detail & Related papers (2020-10-08T15:03:29Z)
- Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems [44.305353565981015]
This paper considers several state-of-the-art adversarial attacks to a deep speaker recognition system, employing strong defense methods as countermeasures.
Experiments show that the speaker recognition systems are vulnerable to adversarial attacks, and the strongest attacks can reduce the accuracy of the system from 94% to even 0%.
arXiv Detail & Related papers (2020-08-18T00:58:19Z)
- Integrated Replay Spoofing-aware Text-independent Speaker Verification [47.41124427552161]
We propose two approaches for building an integrated system of speaker verification and presentation attack detection.
The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning.
We propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection.
arXiv Detail & Related papers (2020-06-10T01:24:55Z)
- Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification [21.582072216282725]
Machine learning systems and, specifically, automatic speech recognition (ASR) systems are vulnerable to adversarial attacks.
In this paper, we focus on hybrid ASR systems and compare four acoustic models regarding their ability to indicate uncertainty under attack.
We are able to detect adversarial examples with an area under the receiver operating characteristic (ROC) curve of more than 0.99.
arXiv Detail & Related papers (2020-05-24T19:31:02Z)
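The detection result above reduces to ranking inputs by an uncertainty score and measuring the area under the ROC curve. A minimal sketch with a rank-based AUROC (Mann-Whitney U) computation over hypothetical uncertainty scores; the synthetic separation below is chosen for illustration and is not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical uncertainty scores: adversarial inputs tend to yield higher
# predictive uncertainty than benign ones (synthetic values for illustration).
benign_unc = rng.normal(loc=0.2, scale=0.1, size=500)
adv_unc    = rng.normal(loc=0.8, scale=0.1, size=500)

scores = np.concatenate([benign_unc, adv_unc])
labels = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = adversarial

def auroc(labels, scores):
    # Rank-based AUROC via the Mann-Whitney U statistic (no ties expected
    # for continuous scores).
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(f"AUROC: {auroc(labels, scores):.3f}")
```

An AUROC near 1.0 means almost every adversarial input is scored as more uncertain than almost every benign one, which is the regime the paper's result (> 0.99) describes.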
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.