Stop Bugging Me! Evading Modern-Day Wiretapping Using Adversarial
Perturbations
- URL: http://arxiv.org/abs/2010.12809v2
- Date: Thu, 2 Sep 2021 07:54:52 GMT
- Title: Stop Bugging Me! Evading Modern-Day Wiretapping Using Adversarial
Perturbations
- Authors: Yael Mathov and Tal Ben Senior and Asaf Shabtai and Yuval Elovici
- Abstract summary: Mass surveillance systems for voice over IP (VoIP) conversations pose a great risk to privacy.
We present an adversarial-learning-based framework for privacy protection for VoIP conversations.
- Score: 47.32228513808444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mass surveillance systems for voice over IP (VoIP) conversations pose a great
risk to privacy. These automated systems use learning models to analyze
conversations, and calls that involve specific topics are routed to a human
agent for further examination. In this study, we present an
adversarial-learning-based framework for privacy protection for VoIP
conversations. We present a novel method that finds a universal adversarial
perturbation (UAP), which, when added to the audio stream, prevents an
eavesdropper from automatically detecting the conversation's topic. As shown in
our experiments, the UAP is agnostic to the speaker or audio length, and its
volume can be changed in real time, as needed. Our real-world solution uses a
Teensy microcontroller that acts as an external microphone and adds the UAP to
the audio in real time. We examine different speakers, VoIP applications
(Skype, Zoom, Slack, and Google Meet), and audio lengths. Our results in the
real world suggest that our approach is a feasible solution for privacy
protection.
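The core mechanism described above — mixing a precomputed universal perturbation into the outgoing audio at an adjustable volume — can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the tiling strategy, the `gain` parameter, and the synthetic signals are assumptions for the sake of a runnable example.

```python
import math
import random

def apply_uap(audio, uap, gain=0.1):
    """Add a universal adversarial perturbation (UAP) to an audio signal.

    The UAP is tiled to cover the full signal (the perturbation is agnostic
    to audio length), scaled by `gain` so its volume can be changed in real
    time, and clipped to the valid sample range [-1, 1].
    """
    out = []
    for i, sample in enumerate(audio):
        noisy = sample + gain * uap[i % len(uap)]
        out.append(max(-1.0, min(1.0, noisy)))
    return out

# Example: one second of a 440 Hz tone at 16 kHz, plus a 0.5 s perturbation.
speech = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
random.seed(0)
uap = [random.uniform(-1, 1) for _ in range(8000)]  # stand-in for a learned UAP
protected = apply_uap(speech, uap, gain=0.1)
```

In the paper's setup this mixing happens on a Teensy microcontroller acting as an external microphone, so the perturbation is added before the audio ever reaches the VoIP application.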
Related papers
- Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300 ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z)
- Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques [1.2691047660244337]
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.
This thesis proposes solutions for anonymizing speech and for evaluating the degree of anonymization.
arXiv Detail & Related papers (2023-08-05T16:14:17Z)
- CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments [41.21509045214965]
CAVEN is a framework in which the agent may interact with a human/oracle to solve the task of navigating to an audio goal.
Our results show that our fully conversational approach leads to nearly an order-of-magnitude improvement in success rate.
arXiv Detail & Related papers (2023-06-06T22:32:49Z)
- AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head [82.69233563811487]
Large language models (LLMs) have exhibited remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition.
We propose a multi-modal AI system named AudioGPT, which complements LLMs with foundation models to process complex audio information.
arXiv Detail & Related papers (2023-04-25T17:05:38Z)
- Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning [7.387631194438338]
DARE-GP is a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech.
Unlike existing works, DARE-GP provides (a) real-time protection of previously unheard utterances, (b) protection against previously unseen black-box SER classifiers, (c) preservation of speech transcription, and (d) operation in a realistic acoustic environment.
arXiv Detail & Related papers (2022-11-17T00:25:05Z)
- Real-time Caller Intent Detection In Human-Human Customer Support Spoken Conversations [10.312382727352823]
Agent assistance during human-human customer support spoken interactions requires triggering based on the caller's intent (reason for the call).
The goal is for the system to detect the caller's intent at the time the agent would have been able to detect it (the Intent Boundary).
Recent work on voice assistants has used incremental real-time predictions at the word level to detect intent before the end of a command.
arXiv Detail & Related papers (2022-08-14T07:50:23Z)
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming to enable systems to model complex dialogue flows.
Our main objective is to build a system that handles conversational questions based on audio recordings and to explore the plausibility of providing richer cues from different modalities during information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
- Configurable Privacy-Preserving Automatic Speech Recognition [5.730142956540673]
We investigate whether modular automatic speech recognition can improve privacy in voice assistive systems.
We examine privacy concerns and the effects of applying various state-of-the-art techniques to each stage of the system.
We argue this presents new opportunities for privacy-preserving applications incorporating ASR.
arXiv Detail & Related papers (2021-04-01T21:03:49Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- Self-Supervised Learning of Audio-Visual Objects from Video [108.77341357556668]
We introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate information over time.
We demonstrate the effectiveness of the audio-visual object embeddings that our model learns by using them for four downstream speech-oriented tasks.
arXiv Detail & Related papers (2020-08-10T16:18:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.