Evaluating Synthetic Command Attacks on Smart Voice Assistants
- URL: http://arxiv.org/abs/2411.08316v2
- Date: Thu, 14 Nov 2024 19:30:50 GMT
- Title: Evaluating Synthetic Command Attacks on Smart Voice Assistants
- Authors: Zhengxian He, Ashish Kundu, Mustaque Ahamad
- Abstract summary: We show that even simple concatenative speech synthesis can be used by an attacker to command voice assistants to perform sensitive operations.
Our results demonstrate the need for better defenses against synthetic malicious commands that could target voice assistants.
- Score: 2.91784559412979
- Abstract: Recent advances in voice synthesis, coupled with the ease with which speech can be harvested for millions of people, introduce new threats to applications that are enabled by devices such as voice assistants (e.g., Amazon Alexa, Google Home, etc.). We explore whether a limited amount of unrelated speech from a target can be used to synthesize commands for a voice assistant like Amazon Alexa. More specifically, we investigate attacks with synthetic commands on voice assistants that match command sources to authorized users, where applications (e.g., Alexa Skills) process commands only when their source is an authorized user at a chosen confidence level. We demonstrate that even simple concatenative speech synthesis can be used by an attacker to command voice assistants to perform sensitive operations. We also show that such attacks, when launched by exploiting compromised devices in the vicinity of voice assistants, can have a relatively small host and network footprint. Our results demonstrate the need for better defenses against synthetic malicious commands that could target voice assistants.
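The concatenative synthesis mentioned in the abstract splices short clips of the target's harvested speech into a new command. The sketch below is not the paper's pipeline; it is a minimal illustration of the splicing step, with sine bursts standing in for harvested word clips (the inventory, words, and durations are all hypothetical):

```python
import numpy as np

SAMPLE_RATE = 16000  # 16 kHz mono, a common rate for assistant audio


def tone(freq_hz: float, dur_s: float) -> np.ndarray:
    """Stand-in for a harvested word clip: a short sine burst."""
    t = np.arange(int(SAMPLE_RATE * dur_s)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t).astype(np.float32)


# Toy clip inventory: word -> waveform (placeholders, not real speech)
inventory = {
    "alexa": tone(220.0, 0.4),
    "unlock": tone(330.0, 0.3),
    "the": tone(440.0, 0.15),
    "door": tone(550.0, 0.35),
}


def concatenate_command(words, gap_s=0.05):
    """Join word clips with short silences between them (a crude splice)."""
    gap = np.zeros(int(SAMPLE_RATE * gap_s), dtype=np.float32)
    pieces = []
    for w in words:
        pieces.append(inventory[w])
        pieces.append(gap)
    return np.concatenate(pieces[:-1])  # drop the trailing gap


cmd = concatenate_command(["alexa", "unlock", "the", "door"])
print(round(len(cmd) / SAMPLE_RATE, 2))  # total duration in seconds
```

A real attack would additionally need clip boundaries chosen to sound natural; the point here is only that assembling a command requires no model training at all.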
Related papers
- Distilling an End-to-End Voice Assistant Without Instruction Training Data [53.524071162124464]
Distilled Voice Assistant (DiVA) generalizes to Question Answering, Classification, and Translation.
We show that DiVA better meets user preferences, achieving a 72% win rate compared with state-of-the-art models like Qwen 2 Audio.
arXiv Detail & Related papers (2024-10-03T17:04:48Z) - Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System [73.34663391495616]
We propose a pioneering approach to tackle joint multi-talker and target-talker speech recognition tasks.
Specifically, we freeze Whisper and plug a Sidecar separator into its encoder to separate mixed embedding for multiple talkers.
We deliver acceptable zero-shot performance on multi-talker ASR on the AishellMix Mandarin dataset.
arXiv Detail & Related papers (2024-07-13T09:28:24Z) - Follow-on Question Suggestion via Voice Hints for Voice Assistants [29.531005346608215]
We tackle the novel task of suggesting questions with compact and natural voice hints to allow users to ask follow-up questions.
We propose baselines and an approach using sequence-to-sequence Transformers to generate spoken hints from a list of questions.
Results show that a naive approach of concatenating suggested questions creates poor voice hints.
arXiv Detail & Related papers (2023-10-25T22:22:18Z) - Rewriting the Script: Adapting Text Instructions for Voice Interaction [39.54213483588498]
We study the limitations of the dominant approach voice assistants take to complex task guidance.
We propose eight ways in which voice assistants can transform written sources into forms that are readily communicated through spoken conversation.
arXiv Detail & Related papers (2023-06-16T17:43:00Z) - SkillFence: A Systems Approach to Practically Mitigating Voice-Based Confusion Attacks [9.203566746598439]
Recent work has shown that commercial systems like Amazon Alexa and Google Home are vulnerable to voice-based confusion attacks.
We propose a systems-oriented defense against this class of attacks and demonstrate its functionality for Amazon Alexa.
We build SkillFence, a browser extension that existing voice assistant users can install to ensure that only legitimate skills run in response to their commands.
arXiv Detail & Related papers (2022-12-16T22:22:04Z) - Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
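The detection idea in this entry is to score test audio against enrolled recordings of the claimed speaker, as an off-the-shelf speaker verifier would. The sketch below is a generic version of that decision rule, using random vectors in place of real speaker embeddings; the embedding size and threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def is_genuine(test_emb, enrolled_embs, threshold=0.6):
    """Accept audio as the claimed speaker if it is close to any enrollment."""
    score = max(cosine(test_emb, e) for e in enrolled_embs)
    return score >= threshold


rng = np.random.default_rng(0)
speaker = rng.normal(size=192)  # hypothetical 192-dim speaker embedding
enrolled = [speaker + 0.1 * rng.normal(size=192) for _ in range(3)]

genuine = speaker + 0.1 * rng.normal(size=192)  # same voice, new utterance
fake = rng.normal(size=192)  # unrelated voice: near-zero cosine to enrollments

print(is_genuine(genuine, enrolled))
print(is_genuine(fake, enrolled))
```

The appeal of this approach, as the entry notes, is that it needs no knowledge of the specific manipulation: anything whose embedding drifts from the enrolled voice is rejected.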
arXiv Detail & Related papers (2022-09-28T13:46:29Z) - The MIT Voice Name System [53.473846742702854]
We aim to standardize voice interactions with universal reach, similar to that of systems such as telephone numbering.
We focus on voice as a starting point to talk to any IoT object.
Privacy and security are key considerations, given speech-to-text errors and the amount of personal information contained in a voice sample.
arXiv Detail & Related papers (2022-03-28T19:09:26Z) - Robust Sensor Fusion Algorithms Against Voice Command Attacks in Autonomous Vehicles [8.35945218644081]
We propose a novel multimodal deep learning classification system to defend against inaudible command attacks.
Our experimental results confirm the feasibility of the proposed defense methods and the best classification accuracy reaches 89.2%.
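The multimodal defense in this entry rests on a simple premise: an inaudible command fools the audio channel, but a second sensor modality need not agree. The paper's exact architecture is not reproduced here; the sketch below shows generic late fusion of per-modality class probabilities, with the class labels, weights, and probability values all invented for illustration:

```python
import numpy as np


def late_fusion(prob_audio, prob_other, w=0.5):
    """Weighted average of per-modality class probabilities; argmax decides."""
    fused = w * np.asarray(prob_audio) + (1 - w) * np.asarray(prob_other)
    return int(np.argmax(fused))


# Classes: index 0 = legitimate speech, index 1 = injected attack command.
# The audio classifier alone is fooled: an inaudible command demodulates
# into normal-sounding speech, so it votes "legitimate".
p_audio = [0.8, 0.2]

# A hypothetical second modality (e.g., an ultrasound-band energy detector)
# sees the out-of-band carrier and votes "attack".
p_sensor = [0.1, 0.9]

print(late_fusion(p_audio, p_sensor))  # fused decision: 1 (attack)
```

Averaging is the simplest fusion rule; a learned fusion layer, as in the entry's deep classifier, plays the same role of letting one modality veto another.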
arXiv Detail & Related papers (2021-04-20T10:08:46Z) - Audio Adversarial Examples: Attacks Using Vocal Masks [0.0]
We construct audio adversarial examples on automatic Speech-To-Text systems.
We produce an adversarial example by overlaying an audio vocal mask generated from the original audio.
We apply our audio adversarial attack to five SOTA STT systems: DeepSpeech, Julius, Kaldi, wav2letter@anywhere and CMUSphinx.
arXiv Detail & Related papers (2021-02-04T05:21:10Z) - Cortical Features for Defense Against Adversarial Audio Attacks [55.61885805423492]
We propose using a computational model of the auditory cortex as a defense against adversarial attacks on audio.
We show that the cortical features help defend against universal adversarial examples.
arXiv Detail & Related papers (2021-01-30T21:21:46Z) - Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.