Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density
Estimation with Non-speech Audio
- URL: http://arxiv.org/abs/2309.10280v2
- Date: Wed, 20 Sep 2023 23:45:05 GMT
- Authors: Forsad Al Hossain, Tanjid Hasan Tonmoy, Andrew A. Lover, George A.
Corey, Mohammad Arif Ul Alam, Tauhidur Rahman
- Abstract summary: We propose a non-speech audio-based approach for crowd analytics.
Non-speech audio alone can be used to conduct such analysis with remarkable accuracy.
- Score: 4.149485024539117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy-preserving crowd density analysis finds application across a wide
range of scenarios, substantially enhancing smart building operation and
management while upholding privacy expectations in various spaces. We propose a
non-speech audio-based approach for crowd analytics, leveraging a
transformer-based model. Our results demonstrate that non-speech audio alone
can be used to conduct such analysis with remarkable accuracy. To the best of
our knowledge, this is the first work to propose non-speech audio signals for
predicting occupancy. To accomplish this, we deployed our
sensor-based platform in the waiting room of a large hospital with IRB approval
over a period of several months to capture non-speech audio and thermal images
for the training and evaluation of our models. The proposed non-speech-based
approach outperformed the thermal camera-based model and all other baselines.
In addition to demonstrating superior performance without utilizing speech
audio, we conduct further analysis using differential privacy techniques to
provide additional privacy guarantees. Overall, our work demonstrates the
viability of employing non-speech audio data for accurate occupancy estimation,
while also ensuring the exclusion of speech-related content and providing
robust privacy protections through differential privacy guarantees.
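The two ingredients described above — attention over non-speech audio frames and a differentially private release of the estimate — can be illustrated with a minimal sketch. The feature values, projections, and readout vector below are hypothetical stand-ins, not the paper's trained transformer:

```python
import math
import random
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention over audio frames.

    x: (frames, dim) non-speech feature sequence (e.g. log-mel energies).
    Identity projections are used here purely for illustration.
    """
    d = x.shape[1]
    scores = x @ x.T / math.sqrt(d)           # (frames, frames) similarities
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)         # softmax over frames
    return w @ x                              # contextualized frames

def estimate_occupancy(features, readout):
    """Pool attended frames over time and map to a scalar occupancy estimate."""
    attended = self_attention(features)
    pooled = attended.mean(axis=0)
    return float(pooled @ readout)

def dp_release(count, epsilon, rng):
    """Laplace mechanism: sensitivity 1, since one person changes the count by at most 1."""
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return count + noise

features = np.random.default_rng(0).normal(size=(50, 16))  # 50 frames, 16-dim
readout = np.ones(16) / 16                                 # hypothetical readout head
raw = estimate_occupancy(features, readout)
private = dp_release(raw, epsilon=1.0, rng=random.Random(0))
```

A smaller epsilon yields stronger privacy at the cost of noisier released counts.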
Related papers
- SafeEar: Content Privacy-Preserving Audio Deepfake Detection [17.859275594843965]
We propose SafeEar, a novel framework that aims to detect deepfake audios without relying on accessing the speech content within.
Our key idea is to devise a neural audio codec into a novel decoupling model that separates the semantic from the acoustic information in audio samples.
In this way, no semantic content will be exposed to the detector.
arXiv Detail & Related papers (2024-09-14T02:45:09Z)
- REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild [14.5263556841263]
We present the first publicly available multimodal dataset with high-quality individual speech recordings of 33 subjects in a professional networking event.
In all cases we predict a 20Hz binary speaking status signal extracted from the audio, a time resolution not available in previous datasets.
arXiv Detail & Related papers (2024-03-02T15:14:58Z)
- AdVerb: Visually Guided Audio Dereverberation [49.958724234969445]
We present AdVerb, a novel audio-visual dereverberation framework.
It uses visual cues in addition to the reverberant sound to estimate clean audio.
arXiv Detail & Related papers (2023-08-23T18:20:59Z)
- Long-term Conversation Analysis: Exploring Utility and Privacy [12.380029887841175]
We explore a privacy-preserving feature extraction method based on input feature dimension reduction, spectral smoothing and the low-cost speaker anonymization technique based on McAdams coefficient.
We show that the combination of McAdams coefficient and spectral smoothing maintains the utility while improving privacy.
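The McAdams-coefficient anonymization mentioned above warps the angles of the complex LPC poles of each speech frame. A minimal NumPy sketch of that pole-shifting idea follows; the LPC coefficients are synthetic and this is not the authors' exact pipeline:

```python
import numpy as np

def mcadams_transform(lpc_coeffs, alpha):
    """Warp the angles of complex LPC poles: phi -> phi**alpha (McAdams coefficient).

    Shifting pole angles moves the formant positions, which masks speaker
    identity while leaving the speech broadly intelligible.
    """
    poles = np.roots(lpc_coeffs)
    shifted = []
    for p in poles:
        if abs(p.imag) < 1e-12:
            shifted.append(p)  # leave real poles untouched
        else:
            r, phi = abs(p), np.angle(p)
            shifted.append(r * np.exp(1j * np.sign(phi) * abs(phi) ** alpha))
    return np.real(np.poly(shifted))

# Synthetic 4th-order LPC frame built from two conjugate pole pairs
pole_pairs = [0.9 * np.exp(1j * 0.5), 0.8 * np.exp(1j * 1.2)]
a = np.real(np.poly(pole_pairs + [np.conj(p) for p in pole_pairs]))
a_anon = mcadams_transform(a, alpha=0.8)
```

With alpha = 1 the filter is unchanged; values away from 1 trade more anonymity for more spectral distortion.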
arXiv Detail & Related papers (2023-06-28T10:10:57Z)
- Adversarial Representation Learning for Robust Privacy Preservation in Audio [11.409577482625053]
Sound event detection systems may inadvertently reveal sensitive information about users or their surroundings.
We propose a novel adversarial training method for learning representations of audio recordings.
The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method.
arXiv Detail & Related papers (2023-04-29T08:39:55Z)
- LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders [53.30016986953206]
We propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture.
We train and evaluate our framework on thousands of speakers and 11+ different languages, and study our model's ability to adapt to different levels of background noise and speech interference.
arXiv Detail & Related papers (2022-11-20T15:27:55Z)
- Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy [22.84840887071428]
Speaker anonymization aims for hiding the identity of a speaker by changing the voice in speech recordings.
This typically comes with a privacy-utility trade-off between protection of individuals and usability of the data for downstream applications.
We propose to tackle this issue by generating speaker embeddings using a generative adversarial network with Wasserstein distance as cost function.
arXiv Detail & Related papers (2022-10-13T13:12:42Z)
- An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition [51.232523987916636]
Differential privacy (DP) is one data protection avenue to safeguard user information used for training deep models by imposing noisy distortion on privacy data.
In this work, we extend PATE learning to work with dynamic patterns, namely speech, and perform one very first experimental study on ASR to avoid acoustic data leakage.
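The PATE aggregation step referenced above can be sketched as a noisy-max vote over teacher predictions. This is a minimal illustration with hypothetical votes and Laplace noise, not the paper's ASR pipeline:

```python
import math
import random

def noisy_max_vote(teacher_votes, num_classes, epsilon, rng):
    """PATE-style aggregation: count each teacher's predicted class, add
    Laplace(1/epsilon) noise to every count, and return the noisy argmax.
    The noise hides any single teacher's (and hence any single private
    training example's) influence on the released label.
    """
    counts = [0] * num_classes
    for v in teacher_votes:
        counts[v] += 1

    def laplace():
        u = rng.random() - 0.5
        return -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    noisy = [c + laplace() for c in counts]
    return max(range(num_classes), key=noisy.__getitem__)

# 10 hypothetical teachers classifying one acoustic frame into 3 classes
votes = [2] * 9 + [0]
label = noisy_max_vote(votes, num_classes=3, epsilon=100.0, rng=random.Random(1))
```

When the teachers strongly agree, the noise rarely flips the winning class, so utility is preserved alongside the privacy guarantee.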
arXiv Detail & Related papers (2022-10-11T16:55:54Z)
- Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis [67.73554826428762]
We propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR.
Our approach leverages audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals.
arXiv Detail & Related papers (2022-03-31T17:57:10Z)
- Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain [103.3388198420822]
Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions.
A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
arXiv Detail & Related papers (2021-02-23T09:59:31Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.