Private Speech Classification with Secure Multiparty Computation
- URL: http://arxiv.org/abs/2007.00253v2
- Date: Thu, 28 Jan 2021 20:18:43 GMT
- Title: Private Speech Classification with Secure Multiparty Computation
- Authors: Kyle Bittner, Martine De Cock, Rafael Dowsley
- Abstract summary: We propose the first privacy-preserving solution for deep learning-based audio classification that is provably secure.
Our approach allows a speech signal of one party (Alice) to be classified with a deep neural network of another party (Bob) without Bob ever seeing Alice's speech signal in unencrypted form.
We evaluate the efficiency-security-accuracy trade-off of the proposed solution in a use case for privacy-preserving emotion detection from speech with a convolutional neural network.
- Score: 15.065527713259542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning in audio signal processing, such as human voice audio signal
classification, is a rich application area of machine learning. Legitimate use
cases include voice authentication, gunfire detection, and emotion recognition.
While there are clear advantages to automated human speech classification,
application developers can gain knowledge beyond the professed scope from
unprotected audio signal processing. In this paper we propose the first
privacy-preserving solution for deep learning-based audio classification that
is provably secure. Our approach, which is based on Secure Multiparty
Computation, allows a speech signal of one party (Alice) to be classified with
a deep neural network of another party (Bob) without Bob ever seeing Alice's
speech signal in an unencrypted manner. As threat models, we consider both
passive security, i.e. with semi-honest parties who follow the instructions of
the cryptographic protocols, and active security, i.e. with malicious parties
who deviate from the protocols. We evaluate the
efficiency-security-accuracy trade-off of the proposed solution in a use case
for privacy-preserving emotion detection from speech with a convolutional
neural network. In the semi-honest case we can classify a speech signal in
under 0.3 sec; in the malicious case it takes $\sim$1.6 sec. In both cases
there is no leakage of information, and we achieve classification accuracies
that are the same as when computations are done on unencrypted data.
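At the core of such MPC-based inference is additive secret sharing: Alice's speech features and Bob's model parameters exist only as random-looking shares, and the parties jointly compute on those shares. The following is a minimal, single-process Python sketch of that building block, using a trusted-dealer Beaver triple to multiply two secret-shared values; the 64-bit ring and the local dealer are assumptions for illustration, not the paper's actual protocol or framework.

```python
"""
Single-process sketch of two-party additive secret sharing with a Beaver-triple
multiplication -- the basic building block behind MPC-based private inference.
Illustrative only: the 64-bit ring and local trusted dealer are assumptions.
"""
import secrets

MOD = 2 ** 64  # all arithmetic happens in the ring Z_{2^64}


def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares: x = s0 + s1 (mod MOD)."""
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD


def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % MOD


def beaver_triple():
    """Dealer generates (a, b, c) with c = a*b and hands each party one share."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share((a * b) % MOD)


def mul_shared(x_sh, y_sh):
    """Multiply two secret-shared values without revealing either of them."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # Each party masks its shares locally; only the masked values are opened.
    d = reconstruct((x_sh[0] - a0) % MOD, (x_sh[1] - a1) % MOD)  # d = x - a
    e = reconstruct((y_sh[0] - b0) % MOD, (y_sh[1] - b1) % MOD)  # e = y - b
    # z = c + d*b + e*a + d*e, computed share-wise (d*e added by party 0 only).
    z0 = (c0 + d * b0 + e * a0 + d * e) % MOD
    z1 = (c1 + d * b1 + e * a1) % MOD
    return z0, z1


# One speech feature of "Alice" and one model weight of "Bob", both kept hidden.
x_sh, w_sh = share(42), share(17)
z_sh = mul_shared(x_sh, w_sh)
assert reconstruct(*z_sh) == 42 * 17
```

In a passively secure run this multiplication is repeated for every weight-activation product in the network; actively secure variants add checks against deviating parties, which is consistent with the longer runtime reported in the abstract.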
Related papers
- SafeEar: Content Privacy-Preserving Audio Deepfake Detection [17.859275594843965]
We propose SafeEar, a novel framework that detects deepfake audio without requiring access to the speech content within.
Our key idea is to devise a neural audio codec as a decoupling model that separates the semantic and acoustic information in audio samples.
In this way, no semantic content is exposed to the detector.
arXiv Detail & Related papers (2024-09-14T02:45:09Z)
- Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition [4.164975438207411]
In recent years, typical backdoor attacks on speech recognition systems have been studied.
The attacker incorporates changes into benign speech spectrograms or alters speech components such as pitch and timbre.
To improve the stealthiness of data poisoning, we propose a non-neural and fast algorithm called Random Spectrogram Rhythm Transformation.
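The summary names the transformation without spelling out its mechanics; as a hypothetical illustration of the general idea of a rhythm perturbation, the Python sketch below stretches and compresses random time segments of a spectrogram while leaving the frequency axis (and hence pitch) untouched. The segment count, stretch range, and interpolation scheme are assumptions, not the paper's Random Spectrogram Rhythm Transformation.

```python
"""
Hypothetical rhythm perturbation: random time segments of a spectrogram are
stretched or compressed by linear interpolation along the time axis, altering
speaking rhythm but not pitch. Parameters and method are assumptions only.
"""
import numpy as np


def random_rhythm_transform(spec, n_segments=4, max_stretch=0.2, seed=None):
    """spec: array of shape (freq_bins, time_frames). Returns a warped copy."""
    rng = np.random.default_rng(seed)
    warped = []
    for seg in np.array_split(spec, n_segments, axis=1):
        factor = 1.0 + rng.uniform(-max_stretch, max_stretch)
        new_len = max(1, int(round(seg.shape[1] * factor)))
        # linearly interpolate each frequency bin onto the new time grid
        old_idx = np.linspace(0.0, seg.shape[1] - 1, new_len)
        lo = np.floor(old_idx).astype(int)
        hi = np.minimum(lo + 1, seg.shape[1] - 1)
        frac = old_idx - lo
        warped.append(seg[:, lo] * (1.0 - frac) + seg[:, hi] * frac)
    return np.concatenate(warped, axis=1)


poisoned = random_rhythm_transform(np.random.rand(80, 300), seed=0)
```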
arXiv Detail & Related papers (2024-06-16T13:29:21Z)
- Faked Speech Detection with Zero Prior Knowledge [2.407976495888858]
We introduce a neural network method to develop a classifier that blindly classifies an input audio signal as real or mimicked.
We propose a deep neural network following a sequential model that comprises three hidden layers, with alternating dense and dropout layers.
We achieved at least 94% correct classification on the test cases, compared with 85% accuracy for human observers.
arXiv Detail & Related papers (2022-09-26T10:38:39Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
- Self-supervised Learning with Random-projection Quantizer for Speech Recognition [51.24368930992091]
We present a simple and effective self-supervised learning approach for speech recognition.
The approach learns a model to predict masked speech signals, in the form of discrete labels.
It achieves word error rates similar to previous work using self-supervised learning with non-streaming models.
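A minimal sketch of such a random-projection quantizer is given below: frames are projected with a frozen random matrix and mapped to the index of the nearest entry in a frozen random codebook, and those indices become the discrete labels predicted for masked frames. The dimensions and normalisation are assumptions rather than the paper's exact configuration.

```python
"""
Sketch of a random-projection quantizer: frames are projected with a frozen
random matrix and assigned the index of the nearest entry in a frozen random
codebook; the indices serve as discrete targets for masked prediction.
"""
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, PROJ_DIM, CODEBOOK_SIZE = 80, 16, 8192

projection = rng.standard_normal((FEAT_DIM, PROJ_DIM))     # frozen, never trained
codebook = rng.standard_normal((CODEBOOK_SIZE, PROJ_DIM))  # frozen, never trained
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)


def quantize(frames):
    """Map speech frames of shape (T, FEAT_DIM) to (T,) discrete labels."""
    z = frames @ projection
    z /= np.linalg.norm(z, axis=1, keepdims=True) + 1e-9
    # nearest codebook entry; with unit vectors this matches cosine similarity
    dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)


labels = quantize(rng.standard_normal((100, FEAT_DIM)))  # targets for masked frames
```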
arXiv Detail & Related papers (2022-02-03T21:29:04Z)
- Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition [52.71604809100364]
We propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech.
Specifically, we feed original-noisy speech pairs simultaneously into the wav2vec 2.0 network.
In addition to the existing contrastive learning task, we switch the quantized representations of the original and noisy speech as additional prediction targets.
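The sketch below illustrates that target-switching idea with a toy InfoNCE loss: besides each view predicting its own quantized targets, the clean context also predicts the noisy targets and vice versa, encouraging agreement across the original-noisy pair. The loss, shapes, and random tensors are simplified assumptions, not the actual wav2vec-Switch implementation.

```python
"""
Toy illustration of switching quantized targets across an original-noisy pair:
each context also predicts the other view's targets, on top of the usual
contrastive terms. Simplified assumptions, not the wav2vec-Switch code.
"""
import torch
import torch.nn.functional as F


def info_nce(context, targets, temperature=0.1):
    """Each context frame should match its own target frame among all frames."""
    c = F.normalize(context, dim=-1)
    t = F.normalize(targets, dim=-1)
    logits = c @ t.T / temperature          # (T, T) frame-to-frame similarities
    return F.cross_entropy(logits, torch.arange(c.size(0)))


# Stand-ins for encoder outputs of one utterance: T frames, D-dim vectors.
T, D = 50, 256
ctx_clean, q_clean = torch.randn(T, D), torch.randn(T, D)
ctx_noisy, q_noisy = torch.randn(T, D), torch.randn(T, D)

# Standard terms: each view predicts its own quantized representations.
loss = info_nce(ctx_clean, q_clean) + info_nce(ctx_noisy, q_noisy)
# Switched terms: clean context predicts noisy targets and vice versa,
# pushing the two views toward noise-robust, consistent representations.
loss = loss + info_nce(ctx_clean, q_noisy) + info_nce(ctx_noisy, q_clean)
```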
arXiv Detail & Related papers (2021-10-11T00:08:48Z)
- Protecting gender and identity with disentangled speech representations [49.00162808063399]
We show that protecting gender information in speech is more effective than modelling speaker-identity information.
We present a novel way to encode gender information and disentangle two sensitive biometric identifiers.
arXiv Detail & Related papers (2021-04-22T13:31:41Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.