Speech Emotion Recognition Using Deep Sparse Auto-Encoder Extreme
Learning Machine with a New Weighting Scheme and Spectro-Temporal Features
Along with Classical Feature Selection and A New Quantum-Inspired Dimension
Reduction Method
- URL: http://arxiv.org/abs/2111.07094v1
- Date: Sat, 13 Nov 2021 11:09:38 GMT
- Authors: Fatemeh Daneshfar, Seyed Jahanshah Kabudian
- Abstract summary: A system for speech emotion recognition (SER) based on the speech signal is proposed.
The system consists of three stages: feature extraction, feature selection, and finally feature classification.
A new weighting method has also been proposed to deal with class imbalance, which is more efficient than existing weighting methods.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Affective computing plays an important role in human-machine interaction. In this paper, a system for speech emotion recognition (SER) based on the speech signal is proposed, which uses new techniques at different stages of processing. The system consists of three stages: feature extraction, feature selection, and finally feature classification.

In the first stage, a rich set of long-term statistical features is extracted from both the speech signal and the glottal-waveform signal, combining new and diverse descriptors such as prosodic, spectral, and spectro-temporal features. One of the challenges of SER systems is distinguishing correlated emotions; these features are good discriminators of speech emotions and improve the system's ability to recognize both similar and distinct emotions. A feature vector of this size naturally contains redundancy.
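As a rough illustration of the first stage, the sketch below computes long-term statistics (mean, standard deviation, skewness) over a few frame-level prosodic and spectral descriptors. The specific descriptors, the choice of statistics, and the use of librosa are assumptions made for illustration; the paper's feature set is richer and also includes a glottal-waveform branch, which is omitted here.

```python
# Minimal sketch of long-term statistical feature extraction for SER.
# Assumed: librosa descriptors and mean/std/skewness statistics; the
# paper's actual feature set (including glottal-waveform features) is
# considerably richer than what is shown here.
import librosa
import numpy as np
from scipy.stats import skew

def long_term_features(wav_path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # spectral
    delta = librosa.feature.delta(mfcc)                       # spectro-temporal
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # spectral
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)[None, :]    # prosodic (pitch)
    # Align frame counts, then stack all frame-level descriptors.
    T = min(mfcc.shape[1], centroid.shape[1], f0.shape[1])
    frames = np.vstack([mfcc[:, :T], delta[:, :T], centroid[:, :T], f0[:, :T]])
    # Collapse the time axis into per-descriptor long-term statistics,
    # yielding one fixed-length vector per utterance.
    return np.concatenate([frames.mean(axis=1),
                           frames.std(axis=1),
                           skew(frames, axis=1)])
```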
In the second stage, the dimensionality of this feature vector is reduced using classical feature selection techniques together with a new quantum-inspired dimension reduction technique.
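The abstract does not describe the quantum-inspired method itself, so the sketch below illustrates only the classical half of this stage with a standard filter-style selector (mutual-information ranking via scikit-learn); it is a stand-in, not the paper's algorithm.

```python
# Classical filter-based feature selection as a stand-in for stage two.
# The paper's new quantum-inspired dimension reduction method is not
# reproduced here.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def select_features(X, y, k=100):
    """Keep the k features with the highest mutual information with y."""
    selector = SelectKBest(score_func=mutual_info_classif, k=k)
    X_reduced = selector.fit_transform(X, y)             # (n_samples, k)
    return X_reduced, selector.get_support(indices=True)
```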
In the third stage, the optimized feature vector is classified by a weighted deep sparse extreme learning machine (ELM) classifier. The classifier operates in three steps: sparse random feature learning, orthogonal random projection using the singular value decomposition (SVD) technique, and, in the last step, discriminative classification using the generalized Tikhonov regularization technique.
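A minimal, single-hidden-layer reading of these three steps is sketched below: sparse random input weights, an SVD-derived orthogonal projection of the hidden activations, and a ridge-style (generalized Tikhonov) output solve. The depth, sample weighting, and all hyperparameters of the actual classifier are unspecified in the abstract, so every concrete choice here is an assumption.

```python
# Single-layer sketch of the three classifier steps; the paper's model
# is deeper and weighted, so treat this as an approximation only.
import numpy as np

rng = np.random.default_rng(0)

def fit_sparse_elm(X, Y, n_hidden=512, sparsity=0.9, lam=1e-2):
    # Step 1: sparse random feature learning -- Gaussian input weights
    # with most entries zeroed out, followed by a nonlinearity.
    W = rng.standard_normal((X.shape[1], n_hidden))
    W *= rng.random(W.shape) > sparsity
    H = np.tanh(X @ W)
    # Step 2: orthogonal projection via SVD. With H = U S V^T, the map
    # P = V S^{-1} carries hidden activations onto the orthonormal basis U.
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    P = Vt.T / s
    # Step 3: output weights by generalized Tikhonov (ridge) regression,
    # beta = (U^T U + lam*I)^{-1} U^T Y, with Y one-hot label targets.
    beta = np.linalg.solve(U.T @ U + lam * np.eye(U.shape[1]), U.T @ Y)
    return W, P, beta

def predict_sparse_elm(X, W, P, beta):
    return (np.tanh(X @ W) @ P @ beta).argmax(axis=1)
```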
Also, many existing emotional datasets suffer from imbalanced class distributions, which in turn increases the classification error and degrades system performance. In this paper, a new weighting method is therefore also proposed to deal with class imbalance, and it is more efficient than existing weighting methods. The proposed method is evaluated on three standard emotional databases.
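The weighting scheme itself is not given in the abstract. For reference, the sketch below shows the standard inverse-class-frequency baseline that such a method would be compared against, folded into a weighted version of the Tikhonov solve from the previous sketch.

```python
# Baseline inverse-class-frequency weighting for imbalanced data; the
# paper's proposed weighting scheme is not described in the abstract,
# so this shows only the conventional alternative.
import numpy as np

def class_weights(y):
    """One weight per sample, inversely proportional to class frequency."""
    classes, counts = np.unique(y, return_counts=True)
    per_class = len(y) / (len(classes) * counts)
    return per_class[np.searchsorted(classes, y)]

def weighted_tikhonov(U, Y, w, lam=1e-2):
    """Weighted ridge solve: beta = (U^T D U + lam*I)^{-1} U^T D Y, D = diag(w)."""
    DU = U * w[:, None]
    return np.linalg.solve(U.T @ DU + lam * np.eye(U.shape[1]), DU.T @ Y)
```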
Related papers
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model trained jointly with sequential reconstruction, variational, and performance-evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks, including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Training speech emotion classifier without categorical annotations [1.5609988622100528]
The main aim of this study is to investigate the relation between categorical and dimensional emotion representations.
The proposed approach contains a regressor model which is trained to predict a vector of continuous values in the dimensional representation for given speech audio.
The output of this model can be interpreted as an emotional category using a mapping algorithm.
arXiv Detail & Related papers (2022-10-14T08:47:41Z)
- Selecting and combining complementary feature representations and classifiers for hate speech detection [6.745479230590518]
Hate speech is a major issue in social networks due to the high volume of data generated daily.
Recent works demonstrate the usefulness of machine learning (ML) in dealing with the nuances required to distinguish hateful posts from mere sarcasm or offensive language.
This work argues that a combination of multiple feature extraction techniques and different classification models is needed.
arXiv Detail & Related papers (2022-01-18T03:46:49Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces during training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization shows only a small degradation in perceptual evaluations.
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
- Unsupervised low-rank representations for speech emotion recognition [78.38221758430244]
We examine the use of linear and non-linear dimensionality reduction algorithms for extracting low-rank feature representations for speech emotion recognition.
We report speech emotion recognition (SER) results for learned representations on two databases using different classification methods.
arXiv Detail & Related papers (2021-04-14T18:30:58Z)
- Does a Hybrid Neural Network based Feature Selection Model Improve Text Classification? [9.23545668304066]
We propose a hybrid feature selection method for obtaining relevant features.
We then present three ways of implementing a feature selection and neural network pipeline.
We also observed a slight increase in accuracy on some datasets.
arXiv Detail & Related papers (2021-01-22T09:12:19Z)
- Optimizing Speech Emotion Recognition using Manta-Ray Based Feature Selection [1.4502611532302039]
We show that concatenating features extracted by different existing feature extraction methods can boost classification accuracy.
We also present a novel application of Manta Ray optimization to speech emotion recognition, which yields a state-of-the-art result.
arXiv Detail & Related papers (2020-09-18T16:09:34Z)
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling [61.351967629600594]
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach.
In this approach, we combine a bottle-neck feature extractor (BNE) with a seq2seq synthesis module.
Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity.
arXiv Detail & Related papers (2020-09-06T13:01:06Z)
- A Novel Community Detection Based Genetic Algorithm for Feature Selection [3.8848561367220276]
The authors propose a genetic algorithm based on community detection, which functions in three steps.
Nine benchmark classification problems were analyzed in terms of the performance of the presented approach.
arXiv Detail & Related papers (2020-08-08T15:39:30Z)
- Learning Class Regularized Features for Action Recognition [68.90994813947405]
We introduce a novel method named Class Regularization that performs class-based regularization of layer activations.
We show that using Class Regularization blocks in state-of-the-art CNN architectures for action recognition leads to systematic improvement gains of 1.8%, 1.2% and 1.4% on the Kinetics, UCF-101 and HMDB-51 datasets, respectively.
arXiv Detail & Related papers (2020-02-07T07:27:49Z)