Representation Learning for Audio Privacy Preservation using Source
Separation and Robust Adversarial Learning
- URL: http://arxiv.org/abs/2308.04960v1
- Date: Wed, 9 Aug 2023 13:50:00 GMT
- Title: Representation Learning for Audio Privacy Preservation using Source
Separation and Robust Adversarial Learning
- Authors: Diep Luong, Minh Tran, Shayan Gharib, Konstantinos Drossos, Tuomas
Virtanen
- Abstract summary: We propose the integration of two commonly used approaches in privacy preservation: source separation and adversarial representation learning.
The proposed system learns a latent representation of audio recordings from which speech and non-speech recordings cannot be differentiated.
- Score: 16.1694012177079
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Privacy preservation has long been a concern in smart acoustic monitoring
systems, where speech can be passively recorded along with a target signal in
the system's operating environment. In this study, we propose the integration
of two commonly used approaches in privacy preservation: source separation and
adversarial representation learning. The proposed system learns a latent
representation of audio recordings from which speech and non-speech
recordings cannot be differentiated. First, the source separation network
filters out some of the privacy-sensitive data; then, during adversarial
learning, the system learns a privacy-preserving representation of the
filtered signal. We demonstrate the effectiveness of the proposed method by
comparing it against systems without source separation, without adversarial
learning, and without both. Overall, our results suggest that the proposed
system significantly improves speech privacy preservation compared to using
source separation or adversarial learning alone, while maintaining good
performance in the acoustic monitoring task.
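The adversarial branch of such a system is typically trained with gradient reversal: the speech/non-speech adversary descends its own loss, while the encoder ascends it. The following is a minimal NumPy sketch of that idea on toy 4-D features with linear models; all names and hyperparameters here are hypothetical illustrations, not the paper's actual architecture, which operates on audio with neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "recordings": 4-D feature vectors. Dimension 0 carries the
# monitoring target; dimension 1 carries speech presence (to be hidden).
n = 256
x = rng.normal(size=(n, 4))
y_task = (x[:, 0] > 0).astype(float)   # target sound present?
y_priv = (x[:, 1] > 0).astype(float)   # speech present? (privacy-sensitive)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W_enc = np.eye(4)                      # linear encoder
w_task = np.zeros(4)                   # monitoring-task head
w_adv = np.zeros(4)                    # adversary (speech detector)

lr, lam, steps = 0.2, 2.0, 3000        # lam weights the reversed gradient
for _ in range(steps):
    h = x @ W_enc                      # latent representation
    g_task = (sigmoid(h @ w_task) - y_task) / n  # BCE gradient, task
    g_adv = (sigmoid(h @ w_adv) - y_priv) / n    # BCE gradient, adversary

    # Both heads descend their own losses.
    w_task_new = w_task - lr * (h.T @ g_task)
    w_adv_new = w_adv - lr * (h.T @ g_adv)

    # Gradient reversal: the encoder descends the task loss but ASCENDS
    # the adversary's loss, pushing speech cues out of h.
    grad_h = np.outer(g_task, w_task) - lam * np.outer(g_adv, w_adv)
    W_enc -= lr * (x.T @ grad_h)
    w_task, w_adv = w_task_new, w_adv_new

h = x @ W_enc
task_acc = np.mean((h @ w_task > 0) == (y_task > 0.5))
adv_acc = np.mean((h @ w_adv > 0) == (y_priv > 0.5))
```

On this toy data the task head stays accurate while the adversary's accuracy typically drifts toward chance, which is the qualitative behavior the abstract describes.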
Related papers
- Adversarial Representation Learning for Robust Privacy Preservation in
Audio [11.409577482625053]
Sound event detection systems may inadvertently reveal sensitive information about users or their surroundings.
We propose a novel adversarial training method for learning representations of audio recordings.
The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method.
arXiv Detail & Related papers (2023-04-29T08:39:55Z)
- Language-Guided Audio-Visual Source Separation via Trimodal Consistency [64.0580750128049]
A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform.
We adapt off-the-shelf vision-language foundation models to provide pseudo-target supervision via two novel loss functions.
We demonstrate the effectiveness of our self-supervised approach on three audio-visual separation datasets.
arXiv Detail & Related papers (2023-03-28T22:45:40Z)
- Speech Privacy Leakage from Shared Gradients in Distributed Learning [7.8470002970302195]
We explore methods for recovering private speech/speaker information from the shared gradients in distributed learning settings.
We demonstrate the feasibility of inferring various levels of side-channel information, including speech content and speaker identity, under the distributed learning framework.
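The core of such leakage is that a gradient can be a near-invertible function of the private input. A minimal sketch, assuming a hypothetical single-sample squared-loss gradient on a linear model (the paper studies far richer settings): the gradient is a scalar multiple of the input, so an eavesdropper recovers the input direction exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# A client computes one gradient on a private sample (x, y) for a
# linear model with squared loss: grad_w = (w @ x - y) * x.
d = 16
x_private = rng.normal(size=d)
y_private = 1.0
w = rng.normal(size=d)               # current shared model weights
shared_grad = (w @ x_private - y_private) * x_private

# An eavesdropper who sees w and the gradient recovers the input up to
# scale, because the gradient is a scalar multiple of x.
x_hat = shared_grad / np.linalg.norm(shared_grad)
cosine = abs(x_hat @ x_private) / np.linalg.norm(x_private)   # ~1.0
```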
arXiv Detail & Related papers (2023-02-21T04:48:29Z)
- SPADE: Self-supervised Pretraining for Acoustic DisEntanglement [2.294014185517203]
We introduce a self-supervised approach to disentangle room acoustics from speech.
Our results demonstrate that our proposed approach significantly improves performance over a baseline when labeled training data is scarce.
arXiv Detail & Related papers (2023-02-03T01:36:38Z)
- An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition [51.20130423303659]
We propose an ensemble learning framework with Poisson sub-sampling that trains a collection of teacher models to provide a differential privacy (DP) guarantee for the training data.
Through boosting under DP, a student model derived from the training data suffers little degradation relative to models trained with no privacy protection.
Our proposed solution leverages two mechanisms: (i) privacy budget amplification via Poisson sub-sampling, so the target prediction model requires less noise to achieve the same privacy budget, and (ii) the combination of the sub-sampling technique with an ensemble teacher-student learning framework.
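Poisson sub-sampling itself is simple: each record enters a given batch independently with probability q, so batch sizes vary. A minimal sketch of the sampler (function and parameter names are hypothetical):

```python
import random

random.seed(0)

def poisson_subsample(records, q):
    """Include each record independently with probability q."""
    return [r for r in records if random.random() < q]

data = list(range(10_000))
q = 0.05
batches = [poisson_subsample(data, q) for _ in range(100)]
avg_frac = sum(len(b) for b in batches) / (100 * len(data))
# Because each record appears in a batch only with probability q, a
# mechanism that is eps-DP per batch costs roughly q * (e**eps - 1) in
# privacy per step on the full dataset -- the amplification that lets
# the teachers add less noise for the same overall budget.
```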
arXiv Detail & Related papers (2022-10-12T16:34:08Z)
- Audio-visual multi-channel speech separation, dereverberation and recognition [70.34433820322323]
This paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach.
The advantage of the additional visual modality over using audio only is demonstrated on two neural dereverberation approaches.
Experiments conducted on the LRS2 dataset suggest that the proposed audio-visual multi-channel speech separation, dereverberation and recognition system outperforms the baseline.
arXiv Detail & Related papers (2022-04-05T04:16:03Z)
- Configurable Privacy-Preserving Automatic Speech Recognition [5.730142956540673]
We investigate whether modular automatic speech recognition can improve privacy in voice assistive systems.
We analyze privacy concerns and the effects of applying various state-of-the-art techniques at each stage of the system.
We argue this presents new opportunities for privacy-preserving applications incorporating ASR.
arXiv Detail & Related papers (2021-04-01T21:03:49Z)
- Robust Audio-Visual Instance Discrimination [79.74625434659443]
We present a self-supervised learning method to learn audio and video representations.
We address the problems of audio-visual instance discrimination and improve transfer learning performance.
arXiv Detail & Related papers (2021-03-29T19:52:29Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks were tackled with signal processing and machine learning techniques; more recently, deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Self-Supervised Learning of Audio-Visual Objects from Video [108.77341357556668]
We introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate information over time.
We demonstrate the effectiveness of the audio-visual object embeddings that our model learns by using them for four downstream speech-oriented tasks.
arXiv Detail & Related papers (2020-08-10T16:18:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.