Speaker and Posture Classification using Instantaneous Intraspeech
Breathing Features
- URL: http://arxiv.org/abs/2005.12230v1
- Date: Mon, 25 May 2020 17:00:26 GMT
- Title: Speaker and Posture Classification using Instantaneous Intraspeech
Breathing Features
- Authors: Atıl İlerialkan, Alptekin Temizel, Hüseyin Hacıhabiboğlu
- Abstract summary: We propose a method for speaker and posture classification using intraspeech breathing sounds.
Using intraspeech breathing sounds, 87% speaker classification and 98% posture classification accuracies were obtained.
- Score: 2.578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic features extracted from speech are widely used in problems such as
biometric speaker identification and first-person activity detection. However,
the use of speech for such purposes raises privacy issues as the content is
accessible to the processing party. In this work, we propose a method for
speaker and posture classification using intraspeech breathing sounds.
Instantaneous magnitude features are extracted using the Hilbert-Huang
transform (HHT) and fed into a CNN-GRU network for classification of recordings
from BreathBase, an open intraspeech breathing sound dataset that we
collected for this study. Using intraspeech breathing sounds, 87% speaker
classification and 98% posture classification accuracies were obtained.
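As a rough illustration of the pipeline described in the abstract, the following is a minimal sketch: instantaneous magnitude features via the Hilbert-Huang transform (empirical mode decomposition followed by the Hilbert transform), fed into a small CNN-GRU classifier. The library choices (PyEMD, SciPy, PyTorch) and all hyperparameters are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import hilbert
from PyEMD import EMD            # pip install EMD-signal (assumed library)
import torch
import torch.nn as nn

def instantaneous_magnitude(x: np.ndarray, n_imfs: int = 4) -> np.ndarray:
    """Decompose x into IMFs with EMD, then take the envelope (instantaneous
    magnitude) of each IMF via the analytic signal. Returns (n_imfs, T)."""
    imfs = EMD()(x)[:n_imfs]
    return np.abs(hilbert(imfs, axis=-1))

class CNNGRUClassifier(nn.Module):
    def __init__(self, n_imfs: int = 4, n_classes: int = 20):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_imfs, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4), nn.ReLU(),
        )
        self.gru = nn.GRU(64, 128, batch_first=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                 # x: (batch, n_imfs, T)
        h = self.cnn(x).transpose(1, 2)   # (batch, T', 64)
        _, last = self.gru(h)             # final hidden state: (1, batch, 128)
        return self.fc(last.squeeze(0))   # class logits

# Usage on a dummy one-second breath segment at 16 kHz:
feats = instantaneous_magnitude(np.random.randn(16000))
logits = CNNGRUClassifier()(torch.tensor(feats[None], dtype=torch.float32))
```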
Related papers
- RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification [8.90841350214225]
We introduce a dual-encoder architecture that facilitates the estimation of room parameters directly from speech utterances.
A contrastive loss function is employed to embed the speech and the acoustic response jointly.
In the test phase, only the reverberant utterance is available, and its embedding is used for the task of room shape classification.
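A minimal sketch of such a dual-encoder contrastive objective, in the spirit of the summary above: one encoder embeds reverberant speech, the other the room impulse response, and a symmetric InfoNCE-style loss pulls matched pairs together. The encoder architectures, feature sizes, and temperature are assumptions, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

speech_enc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
rir_enc = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 128))

def contrastive_loss(speech_feats, rir_feats, tau=0.07):
    # Row i of each batch is a matched (speech, RIR) pair.
    s = F.normalize(speech_enc(speech_feats), dim=-1)
    r = F.normalize(rir_enc(rir_feats), dim=-1)
    logits = s @ r.t() / tau                     # (B, B) similarity matrix
    targets = torch.arange(len(s))               # diagonal entries are positives
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 256))
# At test time only the reverberant speech embedding is computed and fed to a
# room-shape classifier trained on top of it.
```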
arXiv Detail & Related papers (2024-06-05T10:13:55Z)
- Careful Whisper -- leveraging advances in automatic speech recognition
for robust and interpretable aphasia subtype classification [0.0]
This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments.
By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts.
We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech.
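A hedged sketch of the feature idea behind that summary: compare a "rich" acoustic transcript (e.g. from a CTC model that retains hesitations) against a "clean" transcript (e.g. from an encoder-decoder model) and derive simple fluency features. The feature set below is illustrative, not the paper's exact feature list.

```python
import difflib

FILLERS = {"uh", "um", "er", "ehm"}

def fluency_features(rich: str, clean: str, duration_s: float) -> dict:
    rich_toks, clean_toks = rich.lower().split(), clean.lower().split()
    overlap = difflib.SequenceMatcher(None, rich_toks, clean_toks).ratio()
    return {
        "words_per_second": len(clean_toks) / duration_s,
        "filler_rate": sum(t in FILLERS for t in rich_toks) / max(len(rich_toks), 1),
        "transcript_agreement": overlap,   # low agreement hints at anomalies
    }

print(fluency_features("uh the um cat sat", "the cat sat", duration_s=3.0))
```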
arXiv Detail & Related papers (2023-08-02T15:53:59Z)
- Adversarial Representation Learning for Robust Privacy Preservation in
Audio [11.409577482625053]
Sound event detection systems may inadvertently reveal sensitive information about users or their surroundings.
We propose a novel adversarial training method for learning representations of audio recordings.
The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method.
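A minimal sketch of adversarial training for privacy-preserving audio representations, one common formulation of the idea summarized above: the encoder serves a utility task (sound event detection) while a gradient-reversed adversary tries to recover a sensitive attribute such as speaker identity. The layer sizes and reversal weight are assumptions, and the paper's exact method may differ.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x
    @staticmethod
    def backward(ctx, g):
        return -ctx.lam * g, None     # flip the gradient into the encoder

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
event_head = nn.Linear(64, 10)        # utility: sound event classes
speaker_head = nn.Linear(64, 50)      # adversary: sensitive speaker label
ce = nn.CrossEntropyLoss()

x = torch.randn(16, 128)
event_y, speaker_y = torch.randint(0, 10, (16,)), torch.randint(0, 50, (16,))
z = encoder(x)
# The utility loss flows normally; the adversary's loss is reversed into the
# encoder, pushing z to be uninformative about the speaker.
loss = ce(event_head(z), event_y) + \
       ce(speaker_head(GradReverse.apply(z, 1.0)), speaker_y)
loss.backward()
```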
arXiv Detail & Related papers (2023-04-29T08:39:55Z)
- Spectro-Temporal Deep Features for Disordered Speech Assessment and
Recognition [65.25325641528701]
Motivated by the spectro-temporal level differences between disordered and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by SVD decomposition of speech spectrum are proposed.
Experiments conducted on the UASpeech corpus suggest the proposed spectro-temporal deep feature adapted systems consistently outperformed baseline i-vector adaptation by up to 2.63% absolute (8.6% relative) reduction in word error rate (WER) with or without data augmentation.
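A minimal sketch of the subspace-embedding idea: SVD-decompose a speech spectrogram and keep the leading spectral basis vectors as a compact embedding. The mel-spectrogram input and the choice of k are assumptions; the paper's full features also exploit temporal bases and deep embedding.

```python
import numpy as np

def subspace_embedding(spec: np.ndarray, k: int = 4) -> np.ndarray:
    """spec: (n_mels, n_frames). The top-k left singular vectors span the
    dominant spectral subspace; flattening them gives a fixed-size embedding
    (temporal bases V would need pooling, since n_frames varies)."""
    U, s, Vt = np.linalg.svd(spec, full_matrices=False)
    return U[:, :k].ravel()

spec = np.abs(np.random.randn(80, 200))   # stand-in for a mel-spectrogram
print(subspace_embedding(spec).shape)     # (320,)
```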
arXiv Detail & Related papers (2022-01-14T16:56:43Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC
systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces in the training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptive evaluations.
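A hedged sketch of a vector-quantized latent layer of the kind used to model linguistic embeddings in TTS/VC: each latent frame snaps to its nearest codebook entry, with a straight-through gradient and the usual codebook/commitment losses. Codebook size, dimension, and beta are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQLayer(nn.Module):
    def __init__(self, codebook_size=256, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)
        self.beta = beta

    def forward(self, z):                         # z: (batch, T, dim)
        flat = z.reshape(-1, z.size(-1))          # (batch*T, dim)
        idx = torch.cdist(flat, self.codebook.weight).argmin(-1)
        q = self.codebook(idx).view_as(z)         # nearest codebook vectors
        # Codebook loss + commitment loss, then straight-through estimator:
        loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        return z + (q - z).detach(), loss         # gradients bypass the argmin

zq, vq_loss = VQLayer()(torch.randn(2, 50, 64))
```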
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
- Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and
language Models for Intent Classification [81.80311855996584]
We propose a novel intent classification framework that employs acoustic features extracted from a pretrained speech recognition system and linguistic features learned from a pretrained language model.
We achieve 90.86% and 99.07% accuracy on ATIS and Fluent speech corpus, respectively.
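A minimal sketch of the fusion idea: concatenate an utterance-level acoustic embedding (from a pretrained ASR encoder) with a linguistic embedding (from a pretrained language model) and classify intent. The pretrained models are stubbed with random tensors, and the embedding sizes, classifier, and intent count are assumptions.

```python
import torch
import torch.nn as nn

acoustic = torch.randn(4, 512)    # stand-in for pooled ASR-encoder features
linguistic = torch.randn(4, 768)  # stand-in for pooled LM features (e.g. BERT)
n_intents = 26                    # placeholder; depends on the ATIS split

classifier = nn.Sequential(
    nn.Linear(512 + 768, 256), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(256, n_intents),
)
logits = classifier(torch.cat([acoustic, linguistic], dim=-1))
```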
arXiv Detail & Related papers (2021-02-15T07:20:06Z)
- Respiratory Distress Detection from Telephone Speech using Acoustic and
Prosodic Features [27.77184655808592]
This work summarizes our preliminary findings on automatic detection of respiratory distress using well-known acoustic and prosodic features.
Speech samples are collected from de-identified telemedicine phone calls from a healthcare provider in Bangladesh.
We hypothesize that respiratory distress may alter speech features such as voice quality, speaking pattern, loudness, and speech-pause duration.
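A hedged sketch of the kind of acoustic/prosodic features hypothesized above: pitch statistics, loudness, and speech-pause proportion. The silence threshold and settings are assumptions, and this is not the paper's exact feature set.

```python
import numpy as np
import librosa

def prosodic_features(y: np.ndarray, sr: int) -> dict:
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # frame-wise pitch (Hz)
    rms = librosa.feature.rms(y=y)[0]               # frame-wise loudness
    pause = rms < 0.1 * rms.max()                   # crude silence mask
    return {
        "pitch_mean": float(np.mean(f0)),
        "pitch_std": float(np.std(f0)),
        "loudness_mean": float(np.mean(rms)),
        "pause_ratio": float(np.mean(pause)),       # fraction of silent frames
    }

y, sr = librosa.load(librosa.ex("libri1"), duration=5.0)
print(prosodic_features(y, sr))
```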
arXiv Detail & Related papers (2020-11-15T13:32:45Z)
- Active Speakers in Context [88.22935329360618]
Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker.
This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons.
Our experiments show that a structured feature ensemble already benefits the active speaker detection performance.
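A very rough sketch of the high-level idea only: rather than scoring each candidate from a short clip alone, stack per-speaker audiovisual features over a long window and let self-attention model cross-speaker, long-term context before the active/inactive decision. The dimensions and the transformer stand-in are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

B, speakers, T, d = 2, 3, 40, 128        # batch, candidates, timesteps, feat dim
feats = torch.randn(B, speakers, T, d)   # short-term audiovisual features

attn = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
scorer = nn.Linear(d, 1)

ctx = attn(feats.view(B, speakers * T, d))      # context over all speakers/times
scores = scorer(ctx.view(B, speakers, T, d)).squeeze(-1)  # per-speaker logits
```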
arXiv Detail & Related papers (2020-05-20T01:14:23Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
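A minimal sketch of the self-adaptation idea: a speaker embedding is pooled from the noisy test utterance itself (no enrollment audio) and injected into an attention-based enhancement network as an auxiliary feature. All sizes and the fusion-by-concatenation choice are assumptions.

```python
import torch
import torch.nn as nn

d = 257                                    # e.g. magnitude-spectrogram bins
spk_pool = nn.Sequential(nn.Linear(d, 128), nn.ReLU())
mha = nn.MultiheadAttention(embed_dim=d + 128, num_heads=5, batch_first=True)
mask_head = nn.Sequential(nn.Linear(d + 128, d), nn.Sigmoid())

noisy = torch.randn(1, 100, d)             # (batch, frames, bins)
spk = spk_pool(noisy).mean(dim=1, keepdim=True)   # utterance-level embedding
h = torch.cat([noisy, spk.expand(-1, noisy.size(1), -1)], dim=-1)
h, _ = mha(h, h, h)                        # self-attention over frames
enhanced = mask_head(h) * noisy            # predicted mask applied to input
```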
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
- Improving speaker discrimination of target speech extraction with
time-domain SpeakerBeam [100.95498268200777]
SpeakerBeam exploits an adaptation utterance of the target speaker to extract his/her voice characteristics.
SpeakerBeam sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures.
We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures.
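A hedged sketch of the SpeakerBeam mechanism as summarized above: an embedding computed from the target speaker's adaptation utterance multiplicatively modulates the intermediate features of a time-domain extraction network, steering it toward that speaker's voice. The layer sizes and the single stand-in separator block are illustrative assumptions.

```python
import torch
import torch.nn as nn

enc = nn.Conv1d(1, 64, kernel_size=16, stride=8)    # waveform encoder
aux = nn.Sequential(nn.Conv1d(1, 64, 16, 8),        # adaptation-utterance net
                    nn.AdaptiveAvgPool1d(1))
sep = nn.Conv1d(64, 64, kernel_size=3, padding=1)   # stand-in separator block
dec = nn.ConvTranspose1d(64, 1, kernel_size=16, stride=8)

mixture = torch.randn(1, 1, 16000)                  # mixed speech
adaptation = torch.randn(1, 1, 16000)               # clean target-speaker audio

spk = aux(adaptation)                               # (1, 64, 1) speaker embedding
h = enc(mixture) * spk                              # multiplicative adaptation
target = dec(sep(h))                                # estimated target speech
```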
arXiv Detail & Related papers (2020-01-23T05:36:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.