Paralinguistic Privacy Protection at the Edge
- URL: http://arxiv.org/abs/2011.02930v2
- Date: Sat, 29 May 2021 20:32:21 GMT
- Title: Paralinguistic Privacy Protection at the Edge
- Authors: Ranya Aloufi, Hamed Haddadi, David Boyle
- Abstract summary: We introduce EDGY, a representation learning framework that transforms and filters high-dimensional voice data to identify and contain sensitive attributes at the edge prior to offloading to the cloud.
Our results show that EDGY runs in tens of milliseconds with 0.2% relative improvement in ABX score or minimal performance penalties in learning linguistic representations from raw voice signals.
- Score: 5.349852254138085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice user interfaces and digital assistants are rapidly entering our lives
and becoming singular touch points spanning our devices. These always-on
services capture and transmit our audio data to powerful cloud services for
further processing and subsequent actions. Our voices and raw audio signals
collected through these devices contain a host of sensitive paralinguistic
information that is transmitted to service providers regardless of deliberate
or false triggers. As our emotional patterns and sensitive attributes like our
identity, gender, mental well-being, are easily inferred using deep acoustic
models, we encounter a new generation of privacy risks by using these services.
One approach to mitigate the risk of paralinguistic-based privacy breaches is
to exploit a combination of cloud-based processing with privacy-preserving,
on-device paralinguistic information learning and filtering before transmitting
voice data. In this paper we introduce EDGY, a configurable, lightweight,
disentangled representation learning framework that transforms and filters
high-dimensional voice data to identify and contain sensitive attributes at the
edge prior to offloading to the cloud. We evaluate EDGY's on-device performance
and explore optimization techniques, including model quantization and knowledge
distillation, to enable private, accurate and efficient representation learning
on resource-constrained devices. Our results show that EDGY runs in tens of
milliseconds with 0.2% relative improvement in ABX score or minimal performance
penalties in learning linguistic representations from raw voice signals, using
a CPU and a single-core ARM processor without specialized hardware.
Related papers
- Collaborative Inference over Wireless Channels with Feature Differential Privacy [57.68286389879283]
Collaborative inference among multiple wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications.
transmitting extracted features poses a significant privacy risk, as sensitive personal data can be exposed during the process.
We propose a novel privacy-preserving collaborative inference mechanism, wherein each edge device in the network secures the privacy of extracted features before transmitting them to a central server for inference.
arXiv Detail & Related papers (2024-10-25T18:11:02Z) - Speech Emotion Recognition under Resource Constraints with Data Distillation [64.36799373890916]
Speech emotion recognition (SER) plays a crucial role in human-computer interaction.
The emergence of edge devices in the Internet of Things presents challenges in constructing intricate deep learning models.
We propose a data distillation framework to facilitate efficient development of SER models in IoT applications.
arXiv Detail & Related papers (2024-06-21T13:10:46Z) - Using External Off-Policy Speech-To-Text Mappings in Contextual
End-To-End Automated Speech Recognition [19.489794740679024]
We investigate the potential of leveraging external knowledge, particularly through off-policy key-value stores generated with text-to-speech methods.
In our approach, audio embeddings captured from text-to-speech, along with semantic text embeddings, are used to bias ASR.
Experiments on LibiriSpeech and in-house voice assistant/search datasets show that the proposed approach can reduce domain adaptation time by up to 1K GPU-hours.
arXiv Detail & Related papers (2023-01-06T22:32:50Z) - Task-Oriented Communication for Edge Video Analytics [11.03999024164301]
This paper proposes a task-oriented communication framework for edge video analytics.
Multiple devices collect visual sensory data and transmit the informative features to an edge server for processing.
We show that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
arXiv Detail & Related papers (2022-11-25T12:09:12Z) - Privacy against Real-Time Speech Emotion Detection via Acoustic
Adversarial Evasion of Machine Learning [7.387631194438338]
DARE-GP is a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech.
Unlike existing works, DARE-GP provides: a) real-time protection of previously unheard utterances, b) against previously unseen black-box SER classifiers, c) while protecting speech transcription, and d) does so in a realistic, acoustic environment.
arXiv Detail & Related papers (2022-11-17T00:25:05Z) - Attribute Inference Attack of Speech Emotion Recognition in Federated
Learning Settings [56.93025161787725]
Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing local data.
We propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters.
We show that the attribute inference attack is achievable for SER systems trained using FL.
arXiv Detail & Related papers (2021-12-26T16:50:42Z) - An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representation that can flexibly address these issues by attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performances on identity-free SER and a better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z) - Configurable Privacy-Preserving Automatic Speech Recognition [5.730142956540673]
We investigate whether modular automatic speech recognition can improve privacy in voice assistive systems.
We show privacy concerns and the effects of applying various state-of-the-art techniques to each stage of the system.
We argue this presents new opportunities for privacy-preserving applications incorporating ASR.
arXiv Detail & Related papers (2021-04-01T21:03:49Z) - Speech Enhancement for Wake-Up-Word detection in Voice Assistants [60.103753056973815]
Keywords spotting and in particular Wake-Up-Word (WUW) detection is a very important task for voice assistants.
This paper proposes a Speech Enhancement model adapted to the task of WUW detection.
It aims at increasing the recognition rate and reducing the false alarms in the presence of these types of noises.
arXiv Detail & Related papers (2021-01-29T18:44:05Z) - Speaker De-identification System using Autoencoders and Adversarial
Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increase the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.