SoK: The Faults in our ASRs: An Overview of Attacks against Automatic
Speech Recognition and Speaker Identification Systems
- URL: http://arxiv.org/abs/2007.06622v3
- Date: Tue, 21 Jul 2020 17:42:56 GMT
- Title: SoK: The Faults in our ASRs: An Overview of Attacks against Automatic
Speech Recognition and Speaker Identification Systems
- Authors: Hadi Abdullah, Kevin Warren, Vincent Bindschaedler, Nicolas Papernot,
and Patrick Traynor
- Abstract summary: We show that the end-to-end architecture of speech and speaker systems makes attacks and defenses against them substantially different from those in the image space.
We then demonstrate experimentally that attacks against these models almost universally fail to transfer.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech and speaker recognition systems are employed in a variety of
applications, from personal assistants to telephony surveillance and biometric
authentication. The wide deployment of these systems has been made possible by
the improved accuracy of neural networks. Recent research has demonstrated
that, like other systems based on neural networks, speech and speaker
recognition systems are vulnerable to attacks using manipulated inputs.
However, as we demonstrate in this paper, the end-to-end architecture of speech
and speaker systems and the nature of their inputs make attacks and defenses
against them substantially different from those in the image space. We
demonstrate this
first by systematizing existing research in this space and providing a taxonomy
through which the community can evaluate future work. We then demonstrate
experimentally that attacks against these models almost universally fail to
transfer. In so doing, we argue that substantial additional work is required to
provide adequate mitigations in this space.
Related papers
- Vulnerabilities in Machine Learning-Based Voice Disorder Detection Systems [3.4745231630177136]
We explore the possibility of attacks that can reverse classification and compromise the reliability of these systems.
Given the critical nature of personal health information, understanding which types of attacks are effective is a necessary first step toward improving the security of such systems.
Our findings identify the most effective attack strategies, underscoring the need to address these vulnerabilities in machine-learning systems used in the healthcare domain.
arXiv Detail & Related papers (2024-10-21T10:14:44Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- When Authentication Is Not Enough: On the Security of Behavioral-Based Driver Authentication Systems [53.2306792009435]
We develop two lightweight driver authentication systems based on Random Forest and Recurrent Neural Network architectures.
We are the first to propose attacks against these systems by developing two novel evasion attacks, SMARTCAN and GANCAN.
Through our contributions, we aid practitioners in safely adopting these systems, help reduce car thefts, and enhance driver security.
arXiv Detail & Related papers (2023-06-09T14:33:26Z)
- Tubes Among Us: Analog Attack on Automatic Speaker Identification [37.42266692664095]
We show that a human is capable of producing analog adversarial examples directly with little cost and supervision.
Our findings extend to a range of other acoustic-biometric tasks such as liveness detection, bringing into question their use in security-critical settings in real life.
arXiv Detail & Related papers (2022-02-06T10:33:13Z)
- Bias in Automated Speaker Recognition [0.0]
We study bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition.
We show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge.
Most affected are female speakers and non-US nationalities, who experience significant performance degradation.
arXiv Detail & Related papers (2022-01-24T06:48:57Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive, deep neural network (DNN)-based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
- Texture-based Presentation Attack Detection for Automatic Speaker Verification [21.357976330739245]
This paper reports our exploration of texture descriptors applied to the analysis of speech spectrogram images.
In particular, we propose a common Fisher vector feature space based on a generative model.
At most, 16 in 100 bona fide presentations are rejected, whereas only one in 100 attack presentations is accepted.
arXiv Detail & Related papers (2020-10-08T15:03:29Z)
- Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems [44.305353565981015]
This paper considers several state-of-the-art adversarial attacks against a deep speaker recognition system, employing strong defense methods as countermeasures.
Experiments show that speaker recognition systems are vulnerable to adversarial attacks, and the strongest attacks can reduce the accuracy of the system from 94% to as low as 0%.
arXiv Detail & Related papers (2020-08-18T00:58:19Z)
- Integrated Replay Spoofing-aware Text-independent Speaker Verification [47.41124427552161]
We propose two approaches for building an integrated system of speaker verification and presentation attack detection.
The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning.
The second is a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection.
arXiv Detail & Related papers (2020-06-10T01:24:55Z)