Dialog speech sentiment classification for imbalanced datasets
- URL: http://arxiv.org/abs/2109.07228v1
- Date: Wed, 15 Sep 2021 11:43:04 GMT
- Title: Dialog speech sentiment classification for imbalanced datasets
- Authors: Sergis Nicolaou, Lambros Mavrides, Georgina Tryfou, Kyriakos Tolias,
Konstantinos Panousis, Sotirios Chatzis, Sergios Theodoridis
- Abstract summary: In this paper, we use single and bi-modal analysis of short dialog utterances and gain insights on the main factors that aid in sentiment detection.
We propose an architecture which uses a learning rate scheduler and different monitoring criteria and provides state-of-the-art results for the SWITCHBOARD imbalanced sentiment dataset.
- Score: 7.84604505907019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech is the most common way humans express their feelings, and sentiment
analysis is the use of tools such as natural language processing and
computational algorithms to identify the polarity of these feelings. Even
though this field has seen tremendous advancements in the last two decades,
effectively detecting underrepresented sentiments in different kinds of
datasets remains a challenging task. In this paper, we use single and
bi-modal analysis of short dialog utterances and gain insights on the main
factors that aid in sentiment detection, particularly in the underrepresented
classes, in datasets with and without an inherent sentiment component.
Furthermore, we propose an architecture which uses a learning rate scheduler
and different monitoring criteria and provides state-of-the-art results for the
SWITCHBOARD imbalanced sentiment dataset.
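The abstract names a learning rate scheduler with different monitoring criteria but gives no implementation details. As a hypothetical sketch (not the authors' code), the following minimal reduce-on-plateau scheduler shows the general idea: decay the learning rate when a monitored validation metric, here assumed to be something like minority-class F1 or accuracy, stops improving.

```python
class ReduceLROnPlateau:
    """Minimal reduce-on-plateau scheduler: cut the learning rate when a
    monitored validation metric stops improving for `patience` epochs."""

    def __init__(self, lr=1e-3, factor=0.5, patience=2, mode="max", min_lr=1e-6):
        self.lr = lr
        self.factor = factor      # multiplicative decay applied on plateau
        self.patience = patience  # epochs with no improvement before decaying
        self.mode = mode          # "max" for accuracy/F1, "min" for loss
        self.min_lr = min_lr
        self.best = float("-inf") if mode == "max" else float("inf")
        self.bad_epochs = 0

    def step(self, metric):
        """Report this epoch's monitored metric; returns the current lr."""
        improved = metric > self.best if self.mode == "max" else metric < self.best
        if improved:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```

Swapping the `mode` and the monitored quantity is one way to realize the "different monitoring criteria" mentioned above; which criterion the paper actually monitors is not stated in this summary.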
Related papers
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z)
- Speech and Text-Based Emotion Recognizer [0.9168634432094885]
We build a balanced corpus from publicly available datasets for speech emotion recognition.
Our best system, a multi-modal speech- and text-based model, achieves a UA (Unweighted Accuracy) + WA (Weighted Accuracy) of 157.57, compared to the baseline performance of 119.66.
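UA and WA here are the standard speech-emotion-recognition metrics: unweighted accuracy averages per-class recall, so minority classes count equally, while weighted accuracy is the plain overall fraction correct. A short sketch of both (an illustration, not code from the paper):

```python
from collections import defaultdict

def ua_wa(y_true, y_pred):
    """Return (unweighted accuracy, weighted accuracy).

    UA = mean of per-class recall; WA = overall fraction correct.
    """
    correct = defaultdict(int)  # per-class count of correct predictions
    total = defaultdict(int)    # per-class count of true examples
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    ua = sum(correct[c] / total[c] for c in total) / len(total)
    wa = sum(correct.values()) / len(y_true)
    return ua, wa
```

On an imbalanced test set the two diverge sharply: a classifier that predicts only the majority class scores a high WA but a low UA, which is why both are typically reported together.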
arXiv Detail & Related papers (2023-12-10T05:17:39Z)
- SER_AMPEL: a multi-source dataset for speech emotion recognition of Italian older adults [58.49386651361823]
SER_AMPEL is a multi-source dataset for speech emotion recognition (SER)
It is collected with the aim of providing a reference for speech emotion recognition in case of Italian older adults.
The evidence of the need for such a dataset emerges from the analysis of the state of the art.
arXiv Detail & Related papers (2023-11-24T13:47:25Z)
- Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks [3.570593982494095]
We look at speech emotion understanding as a perception task which is a more realistic setting.
We leverage ComParE rich dataset of multilingual speakers and multi-label regression target of 'emotion share' or perception of that emotion.
Our results show that HuBERT-Large with a self-attention-based light-weight sequence model provides 4.6% improvement over the reported baseline.
arXiv Detail & Related papers (2023-08-28T07:11:27Z)
- Emotion Embeddings - Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets [4.720033725720261]
We propose a training procedure that learns a shared latent representation for emotions.
Experiments on a wide range of heterogeneous affective datasets indicate that this approach yields the desired interoperability.
arXiv Detail & Related papers (2023-08-15T16:39:10Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- M2R2: Missing-Modality Robust emotion Recognition framework with iterative data augmentation [6.962213869946514]
We propose Missing-Modality Robust emotion Recognition (M2R2), which trains emotion recognition model with iterative data augmentation by learned common representation.
Party Attentive Network (PANet) is designed to classify emotions, which tracks all the speakers' states and context.
arXiv Detail & Related papers (2022-05-05T09:16:31Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- BiERU: Bidirectional Emotional Recurrent Unit for Conversational Sentiment Analysis [18.1320976106637]
The main difference between conversational sentiment analysis and single sentence sentiment analysis is the existence of context information.
Existing approaches employ complicated deep learning structures to distinguish different parties in a conversation and then model the context information.
We propose a fast, compact and parameter-efficient party-ignorant framework named bidirectional emotional recurrent unit for conversational sentiment analysis.
arXiv Detail & Related papers (2020-05-31T11:13:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.