SkillFence: A Systems Approach to Practically Mitigating Voice-Based
Confusion Attacks
- URL: http://arxiv.org/abs/2212.08738v1
- Date: Fri, 16 Dec 2022 22:22:04 GMT
- Authors: Ashish Hooda, Matthew Wallace, Kushal Jhunjhunwalla, Earlence
Fernandes, Kassem Fawaz
- Abstract summary: Recent work has shown that commercial systems like Amazon Alexa and Google Home are vulnerable to voice-based confusion attacks.
We propose a systems-oriented defense against this class of attacks and demonstrate its functionality for Amazon Alexa.
We build SkillFence, a browser extension that existing voice assistant users can install to ensure that only legitimate skills run in response to their commands.
- Score: 9.203566746598439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Voice assistants are deployed widely and provide useful functionality.
However, recent work has shown that commercial systems like Amazon Alexa and
Google Home are vulnerable to voice-based confusion attacks that exploit design
issues. We propose a systems-oriented defense against this class of attacks and
demonstrate its functionality for Amazon Alexa. We ensure that only the skills
a user intends to invoke execute in response to voice commands. Our key insight is that
we can interpret a user's intentions by analyzing their activity on counterpart
systems of the web and smartphones. For example, the Lyft ride-sharing Alexa
skill has an Android app and a website. Our work shows how information from
counterpart apps can help resolve ambiguities in the skill invocation
process. We build SkillFence, a browser extension that existing voice assistant
users can install to ensure that only legitimate skills run in response to
their commands. Using real user data from MTurk (N = 116) and experimental
trials involving synthetic and organic speech, we show that SkillFence provides
a balance between usability and security by securing 90.83% of skills that a
user will need with a false acceptance rate of 19.83%.
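The abstract's core idea, allowing a skill to run only when the user's activity on counterpart systems (apps and websites) indicates they actually use the corresponding service, can be sketched as follows. This is a minimal, hypothetical illustration only; the names `build_allowlist` and `allow_invocation` are assumptions for this sketch, not the paper's actual implementation, which operates as a browser extension integrated with Alexa.

```python
def normalize(name: str) -> str:
    """Canonicalize a skill/app name for comparison (lowercase, no spaces)."""
    return "".join(name.lower().split())

def build_allowlist(counterpart_apps: list[str]) -> set[str]:
    """Collect the skills whose counterpart app or website the user uses."""
    return {normalize(app) for app in counterpart_apps}

def allow_invocation(invoked_skill: str, allowlist: set[str]) -> bool:
    """Permit a skill to run only if it matches a counterpart the user has."""
    return normalize(invoked_skill) in allowlist

# Example: the user has the Lyft app installed; "Lift" stands in here for a
# confusable look-alike skill that a confusion attack might register.
allowlist = build_allowlist(["Lyft", "Spotify"])
print(allow_invocation("Lyft", allowlist))  # → True  (legitimate skill)
print(allow_invocation("Lift", allowlist))  # → False (blocked look-alike)
```

In practice the matching would need to tolerate speech-to-text variation rather than require exact name equality, which is where the paper's reported trade-off between skills secured and false acceptance rate arises.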
Related papers
- Distilling an End-to-End Voice Assistant Without Instruction Training Data (2024-10-03)
  Distilled Voice Assistant (DiVA) generalizes to Question Answering, Classification, and Translation. DiVA better meets user preferences, achieving a 72% win rate compared with state-of-the-art models like Qwen 2 Audio.
- SkillScanner: Detecting Policy-Violating Voice Applications Through Static Analysis at the Development Phase (2023-09-11)
  Amazon Alexa has implemented a set of policy requirements to be adhered to by third-party skill developers. Recent works reveal the prevalence of policy-violating skills in the current skills store.
- Defend Data Poisoning Attacks on Voice Authentication (2022-09-09)
  Machine learning attacks are putting voice authentication systems at risk. The authors propose a more robust defense method, called Guardian, a convolutional neural network-based discriminator. It distinguishes about 95% of attacked accounts from normal accounts, much more effective than existing approaches, which reach only 60% accuracy.
- Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning (2022-05-30)
  A set of challenging partially observed manipulation tasks is learned from visual and audio inputs by combining offline imitation learning from tele-operated demonstrations with online finetuning. In simulated tasks, the system benefits from using audio, and online interventions improve the success rate of offline imitation learning by 20%.
- The MIT Voice Name System (2022-03-28)
  The authors aim to standardize voice interactions to a universal reach similar to that of other systems such as phone numbering, focusing on voice as a starting point to talk to any IoT object. Privacy and security are key considerations because of speech-to-text errors and the amount of personal information contained in a voice sample.
- ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning (2022-02-05)
  Reinforcement learning from online user feedback on the system's performance is a natural solution to this problem, but it tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse. The authors propose a hierarchical solution that learns efficiently from sparse user feedback.
- Two-stage Voice Application Recommender System for Unhandled Utterances in Intelligent Personal Assistant (2021-10-19)
  A two-stage shortlister-reranker recommender system matches third-party voice applications to unhandled utterances. The system is built using observed data collected from a baseline rule-based system, and online A/B testing results show a significant boost in user-experience satisfaction.
- "Alexa, what do you do for fun?" Characterizing playful requests with virtual assistants (2021-05-12)
  A taxonomy of playful requests is introduced, rooted in theories of humor and refined by analyzing real-world traffic from Alexa, on the conjecture that understanding such utterances will improve user experience with virtual assistants.
- Speaker De-identification System using Autoencoders and Adversarial Training (2020-11-09)
  A speaker de-identification system based on adversarial training and autoencoders is proposed. Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
- Detecting Distrust Towards the Skills of a Virtual Assistant Using Speech (2020-07-30)
  The authors study the feasibility of automatically detecting the level of trust a user has in a virtual assistant (VA) based on their speech, and find that the subject's speech can reveal which type of VA they were using, a possible proxy for the user's trust in the VA's abilities.
- Learning to Rank Intents in Voice Assistants (2020-04-30)
  A novel energy-based model for the intent ranking task outperforms existing state-of-the-art methods, reducing the error rate by 3.8% and improving robustness on the intent ranking task by 33.3%.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.