A Novel Labeled Human Voice Signal Dataset for Misbehavior Detection
- URL: http://arxiv.org/abs/2407.00188v1
- Date: Fri, 28 Jun 2024 18:55:07 GMT
- Title: A Novel Labeled Human Voice Signal Dataset for Misbehavior Detection
- Authors: Ali Raza, Faizan Younas,
- Abstract summary: This research highlights the significance of voice tone and delivery in automated machine-learning systems for voice analysis and recognition.
It contributes to the broader field of voice signal analysis by elucidating the impact of human behaviour on the perception and categorization of voice signals.
- Score: 0.7223352886780369
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Voice signal classification based on human behaviours involves analyzing various aspects of speech patterns and delivery styles. In this study, a real-time dataset collection is performed where participants are instructed to speak twelve psychology questions in two distinct manners: first, in a harsh voice, which is categorized as "misbehaved"; and second, in a polite manner, categorized as "normal". These classifications are crucial in understanding how different vocal behaviours affect the interpretation and classification of voice signals. This research highlights the significance of voice tone and delivery in automated machine-learning systems for voice analysis and recognition. This research contributes to the broader field of voice signal analysis by elucidating the impact of human behaviour on the perception and categorization of voice signals, thereby enhancing the development of more accurate and context-aware voice recognition technologies.
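To make the two-class setup concrete, the sketch below shows one way a labeled voice dataset of this kind could be turned into a "misbehaved" vs. "normal" classifier: per-recording MFCC features followed by a standard classifier. This is a minimal illustration under stated assumptions; the directory layout, the librosa MFCC features, and the random-forest model are chosen for the example and are not the authors' actual pipeline.

```python
# Minimal sketch of a two-class ("misbehaved" vs. "normal") voice classifier.
# Paths, labels, and feature choices are illustrative assumptions, not the
# dataset's released structure or the authors' method.
import glob
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def mfcc_features(path, n_mfcc=13):
    """Load a recording and return mean MFCCs as a fixed-length feature vector."""
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

X, y = [], []
for label in ("misbehaved", "normal"):               # hypothetical folder-per-class layout
    for path in glob.glob(f"voice_dataset/{label}/*.wav"):
        X.append(mfcc_features(path))
        y.append(label)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

Averaging MFCC frames gives one fixed-length vector per recording, which keeps the example simple; a real system would likely use richer temporal features or learned representations.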
Related papers
- Feature Representations for Automatic Meerkat Vocalization Classification [15.642602544201308]
This paper investigates feature representations for automatic meerkat vocalization analysis.
Call type classification studies conducted on two data sets reveal that feature extraction methods developed for human speech processing can be effectively employed for automatic meerkat call analysis.
arXiv Detail & Related papers (2024-08-27T10:51:51Z)
- Evaluating Speaker Identity Coding in Self-supervised Models and Humans [0.42303492200814446]
Speaker identity plays a significant role in human communication and is being increasingly used in societal applications.
We show that self-supervised representations from different families are significantly better for speaker identification over acoustic representations.
We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in different layers of these powerful networks.
arXiv Detail & Related papers (2024-06-14T20:07:21Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- Integration of Text and Graph-based Features for Detecting Mental Health Disorders from Voice [1.5469452301122175]
Two methods are used to enrich voice analysis for depression detection.
Results suggest that integrating text-based voice classification with learning from low-level and graph-based voice signal features can improve the detection of mental disorders such as depression.
arXiv Detail & Related papers (2022-05-14T08:37:19Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training [60.825471653739555]
We show that phonetic domain factors play an important role during pre-training while grammatical and syntactic factors are far less important.
This is the first study to examine the domain characteristics of pre-training datasets in self-supervised pre-training for speech.
arXiv Detail & Related papers (2022-03-01T17:40:51Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Data-driven Detection and Analysis of the Patterns of Creaky Voice [13.829936505895692]
Creaky voice is a quality frequently used as a phrase-boundary marker.
The automatic detection and modelling of creaky voice may have implications for speech technology applications.
arXiv Detail & Related papers (2020-05-31T13:34:30Z)
- On the Mutual Information between Source and Filter Contributions for Voice Pathology Detection [11.481208551940998]
This paper addresses the problem of automatic detection of voice pathologies directly from the speech signal.
Three sets of features are proposed, depending on whether they are related to the speech or the glottal signal, or to prosody.
arXiv Detail & Related papers (2020-01-02T10:04:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.