SoK: The Faults in our ASRs: An Overview of Attacks against Automatic
Speech Recognition and Speaker Identification Systems
- URL: http://arxiv.org/abs/2007.06622v3
- Date: Tue, 21 Jul 2020 17:42:56 GMT
- Title: SoK: The Faults in our ASRs: An Overview of Attacks against Automatic
Speech Recognition and Speaker Identification Systems
- Authors: Hadi Abdullah, Kevin Warren, Vincent Bindschaedler, Nicolas Papernot,
and Patrick Traynor
- Abstract summary: We show that the end-to-end architecture of speech and speaker systems makes attacks and defenses against them substantially different than those in the image space.
We then demonstrate experimentally that attacks against these models almost universally fail to transfer.
- Score: 28.635467696564703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech and speaker recognition systems are employed in a variety of
applications, from personal assistants to telephony surveillance and biometric
authentication. The wide deployment of these systems has been made possible by
the improved accuracy in neural networks. Like other systems based on neural
networks, recent research has demonstrated that speech and speaker recognition
systems are vulnerable to attacks using manipulated inputs. However, as we
demonstrate in this paper, the end-to-end architecture of speech and speaker
systems and the nature of their inputs make attacks and defenses against them
substantially different than those in the image space. We demonstrate this
first by systematizing existing research in this space and providing a taxonomy
through which the community can evaluate future work. We then demonstrate
experimentally that attacks against these models almost universally fail to
transfer. In so doing, we argue that substantial additional work is required to
provide adequate mitigations in this space.
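The transferability experiment described in the abstract can be illustrated with a minimal, self-contained sketch: an adversarial perturbation is crafted against one model, then replayed against an independently trained second model. The linear "models", FGSM-style perturbation, and all data below are hypothetical stand-ins for illustration, not the paper's actual ASR or speaker systems.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Two independently trained linear classifiers (toy stand-ins for speech models).
w_a = rng.normal(size=d)
w_b = rng.normal(size=d)

x = rng.normal(size=d)  # a clean input feature vector

def predict(w, x):
    return 1 if w @ x >= 0 else 0

# FGSM-style perturbation crafted against model A only: step against the sign
# of A's weights, scaled just enough to flip A's decision.
s = w_a @ x
eps = 1.2 * abs(s) / np.abs(w_a).sum()
x_adv = x - eps * np.sign(s) * np.sign(w_a)

flipped_a = predict(w_a, x) != predict(w_a, x_adv)  # attack succeeds on A
flipped_b = predict(w_b, x) != predict(w_b, x_adv)  # does it transfer to B?
print(f"fools A: {flipped_a}, transfers to B: {flipped_b}")
```

By construction the perturbation always flips model A; whether it also flips model B depends entirely on how similar B's decision boundary happens to be, which is the question the paper's transferability experiments probe at scale.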
Related papers
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- When Authentication Is Not Enough: On the Security of Behavioral-Based Driver Authentication Systems [53.2306792009435]
We develop two lightweight driver authentication systems based on Random Forest and Recurrent Neural Network architectures.
We are the first to propose attacks against these systems by developing two novel evasion attacks, SMARTCAN and GANCAN.
Through our contributions, we aid practitioners in safely adopting these systems, help reduce car thefts, and enhance driver security.
arXiv Detail & Related papers (2023-06-09T14:33:26Z)
- Tubes Among Us: Analog Attack on Automatic Speaker Identification [37.42266692664095]
We show that a human is capable of producing analog adversarial examples directly with little cost and supervision.
Our findings extend to a range of other acoustic-biometric tasks such as liveness detection, bringing into question their use in security-critical settings in real life.
arXiv Detail & Related papers (2022-02-06T10:33:13Z)
- Bias in Automated Speaker Recognition [0.0]
We study bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition.
We show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge.
Most affected are female speakers and non-US nationalities, who experience significant performance degradation.
arXiv Detail & Related papers (2022-01-24T06:48:57Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive deep neural network (DNN) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Instant One-Shot Word-Learning for Context-Specific Neural Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z)
- Texture-based Presentation Attack Detection for Automatic Speaker Verification [21.357976330739245]
This paper reports our exploration of texture descriptors applied to the analysis of speech spectrogram images.
In particular, we propose a common Fisher vector feature space based on a generative model.
At most, 16 in 100 bona fide presentations are rejected, whereas only one in 100 attack presentations is accepted.
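The error rates quoted above correspond to the standard presentation-attack-detection metrics: the rate of bona fide presentations rejected in error (BPCER) and the rate of attack presentations accepted in error (APCER). A toy sketch of how such rates fall out of detector scores at a fixed threshold, using synthetic Gaussian scores rather than the paper's actual detector:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical detector scores (higher = more likely bona fide).
bona_fide = rng.normal(loc=1.0, scale=0.5, size=1000)
attacks   = rng.normal(loc=-1.0, scale=0.5, size=1000)

threshold = 0.0  # accept a presentation when its score >= threshold

bpcer = np.mean(bona_fide < threshold)   # bona fide rejected in error
apcer = np.mean(attacks >= threshold)    # attacks accepted in error
print(f"BPCER: {bpcer:.1%}, APCER: {apcer:.1%}")
```

Moving the threshold trades one error rate against the other, which is why PAD results are usually reported as a pair of rates at a chosen operating point.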
arXiv Detail & Related papers (2020-10-08T15:03:29Z)
- Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems [44.305353565981015]
This paper considers several state-of-the-art adversarial attacks to a deep speaker recognition system, employing strong defense methods as countermeasures.
Experiments show that the speaker recognition systems are vulnerable to adversarial attacks, and the strongest attacks can reduce the accuracy of the system from 94% to even 0%.
arXiv Detail & Related papers (2020-08-18T00:58:19Z)
- Integrated Replay Spoofing-aware Text-independent Speaker Verification [47.41124427552161]
We propose two approaches for building an integrated system of speaker verification and presentation attack detection.
The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning.
We propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection.
arXiv Detail & Related papers (2020-06-10T01:24:55Z)
- Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification [21.582072216282725]
Machine learning systems and, specifically, automatic speech recognition (ASR) systems are vulnerable to adversarial attacks.
In this paper, we focus on hybrid ASR systems and compare four acoustic models regarding their ability to indicate uncertainty under attack.
We are able to detect adversarial examples with an area under the receiver operating characteristic (ROC) curve of more than 0.99.
arXiv Detail & Related papers (2020-05-24T19:31:02Z)
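The detection result above reduces to ranking inputs by an uncertainty score and measuring the area under the ROC curve. A minimal sketch with a rank-based AUROC (Mann-Whitney U) computation over hypothetical uncertainty scores; the synthetic separation below is chosen for illustration and is not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical uncertainty scores: adversarial inputs tend to yield higher
# predictive uncertainty than benign ones (synthetic values for illustration).
benign_unc = rng.normal(loc=0.2, scale=0.1, size=500)
adv_unc    = rng.normal(loc=0.8, scale=0.1, size=500)

scores = np.concatenate([benign_unc, adv_unc])
labels = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = adversarial

def auroc(labels, scores):
    # Rank-based AUROC via the Mann-Whitney U statistic (no ties expected
    # for continuous scores).
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(f"AUROC: {auroc(labels, scores):.3f}")
```

An AUROC near 1.0 means almost every adversarial input is scored as more uncertain than almost every benign one, which is the regime the paper's result (> 0.99) describes.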
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.