SlothSpeech: Denial-of-service Attack Against Speech Recognition Models
- URL: http://arxiv.org/abs/2306.00794v1
- Date: Thu, 1 Jun 2023 15:25:14 GMT
- Title: SlothSpeech: Denial-of-service Attack Against Speech Recognition Models
- Authors: Mirazul Haque, Rutvij Shah, Simin Chen, Berrak Şişman, Cong Liu, Wei Yang
- Abstract summary: In this work, we propose SlothSpeech, a denial-of-service attack against automatic speech recognition models.
We find that SlothSpeech-generated inputs can increase latency by up to 40x compared to the latency induced by benign input.
- Score: 6.984028236389121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Learning (DL) models are widely used today for various
speech-related tasks, including automatic speech recognition (ASR). As ASR is
deployed in real-time scenarios, it is important that the ASR model remains
efficient under minor perturbations to the input. Evaluating the efficiency
robustness of ASR models is therefore pressing. We show that popular ASR models
such as Speech2Text and Whisper perform dynamic, input-dependent computation,
so their efficiency also varies with the input. In this work, we propose
SlothSpeech, a denial-of-service attack against ASR models that exploits this
dynamic behaviour. SlothSpeech uses the probability distribution of the output
text tokens to generate perturbations to the audio that decrease the efficiency
of the ASR model. We find that SlothSpeech-generated inputs can increase
latency by up to 40x compared to the latency induced by benign input.
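
The attack idea described above, perturbing the audio so that the autoregressive decoder's end-of-sequence (EOS) probability drops and decoding runs for more steps, can be illustrated with a toy, self-contained sketch. Everything here is hypothetical: `W` stands in for an EOS scoring head, `decode_steps` is a crude latency proxy, and the FGSM-style loop is only a stand-in for the paper's actual objective over a real ASR model's output-token distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical "EOS head": projects the waveform to an end-of-sequence logit.
W = rng.standard_normal(16000) / np.sqrt(16000)

def decode_steps(audio, max_steps=200):
    """Toy latency proxy: a decoder that emits EOS with probability
    p_eos per step runs roughly 1/p_eos steps before stopping."""
    p_eos = sigmoid(W @ audio)
    return min(max_steps, int(np.ceil(1.0 / p_eos)))

def sloth_perturb(audio, budget=0.1, iters=100, lr=0.05):
    """FGSM-style loop that pushes the EOS probability down, so decoding
    runs for more steps -- a crude stand-in for SlothSpeech's objective."""
    adv = audio.copy()
    for _ in range(iters):
        z = W @ adv
        grad = sigmoid(z) * (1.0 - sigmoid(z)) * W  # d p_eos / d audio
        adv = adv - lr * np.sign(grad)              # step against p_eos
        adv = np.clip(adv, audio - budget, audio + budget)  # stay in budget
    return adv

benign = rng.standard_normal(16000) * 0.1  # stand-in waveform
adv = sloth_perturb(benign)
print(decode_steps(benign), decode_steps(adv))  # adversarial input decodes longer
```

Because the perturbation is clipped to a small budget around the original waveform, the adversarial input stays close to the benign one while the decoder's step count, and hence latency, grows sharply.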
Related papers
- Unified End-to-End Speech Recognition and Endpointing for Fast and
Efficient Speech Systems [17.160006765475988]
We propose a method to jointly train the ASR and EP tasks in a single end-to-end (E2E) model.
We introduce a "switch" connection, which trains the EP to consume either the audio frames directly or low-level latent representations from the ASR model.
This results in a single E2E model that can be used during inference to perform frame filtering at low cost.
arXiv Detail & Related papers (2022-11-01T23:43:15Z)
- Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings [53.11450530896623]
This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR) model that can recognize "who spoke what".
Our model is based on token-level serialized output training (t-SOT) which was recently proposed to transcribe multi-talker speech in a streaming fashion.
The proposed model achieves substantially better accuracy than a prior streaming model and shows comparable or sometimes even superior results to the state-of-the-art offline SA-ASR model.
arXiv Detail & Related papers (2022-03-30T21:42:00Z)
- Robustifying automatic speech recognition by extracting slowly varying features [16.74051650034954]
We propose a defense mechanism against targeted adversarial attacks.
We use hybrid ASR models trained on data pre-processed to retain only slowly varying features.
Our model shows a performance on clean data similar to the baseline model, while being more than four times more robust.
arXiv Detail & Related papers (2021-12-14T13:50:23Z)
- Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition [83.2274907780273]
How to design a black-box watermarking scheme for automatic speech recognition models is still an unsolved problem.
We propose the first black-box model watermarking framework for protecting the IP of ASR models.
Experiments on the state-of-the-art open-source ASR system DeepSpeech demonstrate the feasibility of the proposed watermarking scheme.
arXiv Detail & Related papers (2021-10-19T09:01:41Z)
- Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models [57.20432226304683]
Non-autoregressive (NAR) modeling has gained increasing attention in speech processing.
We propose a novel end-to-end streaming NAR speech recognition system.
We show that the proposed method improves online ASR recognition in low latency conditions.
arXiv Detail & Related papers (2021-07-20T11:42:26Z)
- Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors [5.53506103787497]
Speech-based virtual assistants, such as Amazon Alexa, Google Assistant, and Apple Siri, typically convert users' audio signals to text data through automatic speech recognition (ASR).
The ASR output is error-prone; however, the downstream dialog models are often trained on error-free text data, making them sensitive to ASR errors during inference time.
We leverage an ASR error simulator to inject noise into the error-free text data, and subsequently train the dialog models with the augmented data.
arXiv Detail & Related papers (2020-06-10T03:18:15Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
- Streaming automatic speech recognition with the transformer model [59.58318952000571]
We propose a transformer based end-to-end ASR system for streaming ASR.
We apply time-restricted self-attention for the encoder and triggered attention for the encoder-decoder attention mechanism.
Our proposed streaming transformer architecture achieves 2.8% and 7.2% WER for the "clean" and "other" test data of LibriSpeech.
arXiv Detail & Related papers (2020-01-08T18:58:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.