NeuraGen-A Low-Resource Neural Network based approach for Gender
Classification
- URL: http://arxiv.org/abs/2203.15253v1
- Date: Tue, 29 Mar 2022 05:57:24 GMT
- Title: NeuraGen-A Low-Resource Neural Network based approach for Gender
Classification
- Authors: Shankhanil Ghosh (1), Chhanda Saha (1) and Naagamani Molakathaala (1)
((1) School of Computer and Information Sciences, University of Hyderabad,
Hyderabad, India)
- Abstract summary: We have used speech recordings collected from the ELSDSR and limited TIMIT datasets.
We extracted 8 speech features, which were pre-processed and then fed into NeuraGen to identify the gender.
NeuraGen achieved an accuracy of 90.7407% and an F1 score of 91.227% on the training and 20-fold cross-validation sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The human voice carries several important pieces of information in the
form of features, and these features help characterise both the speaker and the
speech. Speaker-dependent research targets speaker identification, speaker
verification, speaker biometrics, feature-based forensics, and cross-modal
matching between speech and face images. In this context it is very difficult to
come across clean, well-annotated, publicly available speech corpora, and
acquiring volunteers to generate such datasets is very expensive, not to mention
the enormous amount of effort and time researchers spend gathering the data. The
present paper proposes NeuraGen, a low-resource ANN architecture used to classify
the gender of the speaker from speech recordings. We used speech recordings
collected from the ELSDSR and limited TIMIT datasets, from which we extracted
8 speech features; these were pre-processed and then fed into NeuraGen to
identify the gender. NeuraGen achieved an accuracy of 90.7407% and an F1 score of
91.227% on the training and 20-fold cross-validation sets.
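As a rough illustration of this kind of pipeline, the sketch below extracts a small fixed-length feature vector per recording and evaluates a small feed-forward classifier with 20-fold cross-validation. The eight features shown (pitch statistics, zero-crossing rate, spectral centroid, MFCC means), the layer sizes, and the library choices are illustrative assumptions; the abstract does not list the exact features or architecture.

```python
# Hedged sketch of a NeuraGen-style pipeline: hand-crafted speech features
# feeding a small fully connected network, scored with 20-fold cross-validation.
# The eight features below are illustrative stand-ins, not the paper's exact set.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

def extract_features(wav_path, sr=16000):
    """Return a small fixed-length feature vector for one recording."""
    y, sr = librosa.load(wav_path, sr=sr)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)            # pitch track
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=4)        # 4 MFCC coefficients
    return np.array([
        np.nanmean(f0), np.nanstd(f0),                        # pitch statistics
        np.mean(librosa.feature.zero_crossing_rate(y)),
        np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)),
        *np.mean(mfcc, axis=1),                                # 4 MFCC means
    ])                                                         # 8 features total

def evaluate(wav_paths, genders):
    """20-fold cross-validation of a small ANN gender classifier."""
    X = np.vstack([extract_features(p) for p in wav_paths])
    y = np.array(genders)                                      # e.g. 0 = female, 1 = male
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
    )
    cv = StratifiedKFold(n_splits=20, shuffle=True, random_state=0)
    scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "f1"))
    print("accuracy: %.4f  F1: %.4f"
          % (scores["test_accuracy"].mean(), scores["test_f1"].mean()))
```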
Related papers
- Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network [0.0]
This paper presents a lightweight 1D-Convolutional Neural Network (1D-CNN) designed to perform speaker identification on minimal datasets.
Our approach achieves a validation accuracy of 97.87%, leveraging data augmentation techniques to handle background noise and limited training samples.
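For a sense of what such a lightweight 1D-CNN can look like, here is a minimal PyTorch sketch operating on MFCC sequences; the layer widths, kernel sizes, and number of speakers are assumptions, not details taken from that paper.

```python
# Minimal sketch (not the paper's exact architecture) of a lightweight 1D-CNN
# speaker classifier over MFCC sequences; all hyperparameters are assumptions.
import torch
import torch.nn as nn

class Light1DCNN(nn.Module):
    def __init__(self, n_mfcc=40, n_speakers=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mfcc, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                      # pool over time
        )
        self.head = nn.Linear(64, n_speakers)

    def forward(self, x):                                 # x: (batch, n_mfcc, frames)
        return self.head(self.net(x).squeeze(-1))

logits = Light1DCNN()(torch.randn(8, 40, 200))            # 8 clips, 200 frames each
```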
arXiv Detail & Related papers (2024-11-22T17:18:08Z)
- Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives.
We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources.
We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z)
- Artificial Neural Networks to Recognize Speakers Division from Continuous Bengali Speech [0.5330251011543498]
We used our dataset of more than 45 hours of audio data from 633 individual male and female speakers.
We recorded the highest accuracy of 85.44%.
arXiv Detail & Related papers (2024-04-18T10:17:20Z)
- Deepfake audio as a data augmentation technique for training automatic speech to text transcription models [55.2480439325792]
We propose a framework for data augmentation based on deepfake audio.
A dataset of English speech produced by Indian speakers was selected, ensuring the presence of a single accent.
arXiv Detail & Related papers (2023-09-22T11:33:03Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- Faked Speech Detection with Zero Prior Knowledge [2.407976495888858]
We introduce a neural network method to develop a classifier that will blindly classify an input audio as real or mimicked.
We propose a deep neural network following a sequential model that comprises three hidden layers, with alternating dense and dropout layers.
We were able to get at least 94% correct classification of the test cases, as against the 85% accuracy in the case of human observers.
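A rough Keras sketch of that described sequential model, with three hidden layers alternating dense and dropout, follows; the unit counts, dropout rate, and input feature dimension are assumptions not stated in the summary.

```python
# Illustrative binary real-vs-mimicked speech classifier: three hidden layers
# with alternating Dense and Dropout; sizes and dropout rate are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

def build_fake_speech_classifier(n_features=128):
    model = keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(256, activation="relu"), layers.Dropout(0.3),
        layers.Dense(128, activation="relu"), layers.Dropout(0.3),
        layers.Dense(64, activation="relu"), layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),      # real (0) vs. mimicked (1)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```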
arXiv Detail & Related papers (2022-09-26T10:38:39Z)
- Overlapped speech and gender detection with WavLM pre-trained features [6.054285771277486]
This article focuses on overlapped speech and gender detection in order to study interactions between women and men in French audiovisual media.
We propose to use the WavLM model, which has the advantage of being pre-trained on a huge amount of speech data.
A neural gender detection (GD) system is trained with WavLM inputs on a gender-balanced subset of the French broadcast news ALLIES data, and obtains an accuracy of 97.9%.
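As an illustrative sketch (not that paper's training recipe), pooled WavLM features can feed a small gender-classification head; the checkpoint name and head shape below are assumptions.

```python
# Mean-pooled WavLM hidden states feeding a linear gender head (untrained here).
import torch
import torch.nn as nn
from transformers import AutoFeatureExtractor, WavLMModel

extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base")
wavlm = WavLMModel.from_pretrained("microsoft/wavlm-base").eval()
head = nn.Linear(wavlm.config.hidden_size, 2)         # female / male logits

def gender_logits(waveform_16k):
    """waveform_16k: 1-D float array sampled at 16 kHz."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = wavlm(**inputs).last_hidden_state     # (1, frames, hidden)
    return head(hidden.mean(dim=1))                    # mean-pool over time
```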
arXiv Detail & Related papers (2022-09-09T08:00:47Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Automatic Dialect Density Estimation for African American English [74.44807604000967]
We explore automatic prediction of dialect density of the African American English (AAE) dialect.
Dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database.
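The density measure itself reduces to a simple ratio; here is a toy illustration, where the word-level dialect detector is a placeholder list of flags rather than anything from the paper.

```python
# Fraction of words in an utterance flagged as carrying dialect features.
def dialect_density(word_has_dialect_feature):
    """word_has_dialect_feature: list of booleans, one per word."""
    if not word_has_dialect_feature:
        return 0.0
    return sum(word_has_dialect_feature) / len(word_has_dialect_feature)

print(dialect_density([True, False, False, True]))   # 0.5 -> 50% dialect density
```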
arXiv Detail & Related papers (2022-04-03T01:34:48Z)
- Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition [5.1229352884025845]
We show that it is possible to retrieve the gender of the speaker, and even their identity, by exploiting only the weight-matrix changes of a neural acoustic model locally adapted to that speaker.
arXiv Detail & Related papers (2021-11-07T22:17:52Z)
- Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling.
In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context from these responses and feed it as additional speaker-specific context to our network when scoring a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z)