Disentangled Variational Autoencoder for Emotion Recognition in
Conversations
- URL: http://arxiv.org/abs/2305.14071v1
- Date: Tue, 23 May 2023 13:50:06 GMT
- Authors: Kailai Yang, Tianlin Zhang, Sophia Ananiadou
- Abstract summary: We propose a VAD-disentangled Variational AutoEncoder (VAD-VAE) for Emotion Recognition in Conversations (ERC).
VAD-VAE disentangles three affect representations, Valence-Arousal-Dominance (VAD), from the latent space.
Experiments show that VAD-VAE outperforms the state-of-the-art model on two datasets.
- Score: 14.92924920489251
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Emotion Recognition in Conversations (ERC), the emotions of target
utterances are closely dependent on their context. Therefore, existing works
train the model to generate the response of the target utterance, which aims to
recognise emotions leveraging contextual information. However, adjacent
response generation ignores long-range dependencies and provides limited
affective information in many cases. In addition, most ERC models learn a
unified distributed representation for each utterance, which lacks
interpretability and robustness. To address these issues, we propose a
VAD-disentangled Variational AutoEncoder (VAD-VAE), which first introduces a
target utterance reconstruction task based on Variational Autoencoder, then
disentangles three affect representations Valence-Arousal-Dominance (VAD) from
the latent space. We also enhance the disentangled representations by
introducing VAD supervision signals from a sentiment lexicon and minimising the
mutual information between VAD distributions. Experiments show that VAD-VAE
outperforms the state-of-the-art model on two datasets. Further analysis proves
the effectiveness of each proposed module and the quality of disentangled VAD
representations. The code is available at
https://github.com/SteveKGYang/VAD-VAE.
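The abstract describes a VAE whose latent space is partitioned into Valence, Arousal and Dominance parts. As a rough illustration only, the sketch below shows the generic VAE ingredients that such a model builds on: the reparameterisation trick, the closed-form KL term against a standard normal prior, and a split of the latent vector into named blocks. It is not the authors' implementation. The block sizes in `LATENT_LAYOUT`, the extra "content" block, and all function names are assumptions made for this example. The lexicon supervision and the mutual-information penalty mentioned in the abstract are omitted because this page does not specify them.

```python
import math
import random

# Hypothetical layout: one dimension each for valence, arousal and dominance,
# plus a "content" block for the remaining information. Not from the paper.
LATENT_LAYOUT = {"valence": 1, "arousal": 1, "dominance": 1, "content": 4}


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def split_latent(z, layout):
    """Partition a latent vector into named blocks according to `layout`."""
    if sum(layout.values()) != len(z):
        raise ValueError("layout sizes must add up to the latent dimension")
    parts, start = {}, 0
    for name, size in layout.items():
        parts[name] = z[start:start + size]
        start += size
    return parts


def negative_elbo(recon_nll, mu, logvar):
    """Plain VAE objective: reconstruction NLL plus the KL regulariser."""
    return recon_nll + kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = sum(LATENT_LAYOUT.values())
    mu, logvar = [0.0] * dim, [0.0] * dim
    z = reparameterize(mu, logvar, rng)
    print(split_latent(z, LATENT_LAYOUT))
    # recon_nll=2.0 is a placeholder value, not a measured result.
    print(negative_elbo(2.0, mu, logvar))
```

In a full model, `mu` and `logvar` would come from an utterance encoder and `recon_nll` from a decoder that reconstructs the target utterance. Here they are placeholders.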
Related papers
- CR-VAE: Contrastive Regularization on Variational Autoencoders for
Preventing Posterior Collapse [1.0044057719679085]
The Variational Autoencoder (VAE) is known to suffer from the phenomenon of posterior collapse.
We propose a novel solution, the Contrastive Regularization for Variational Autoencoders (CR-VAE).
arXiv Detail & Related papers (2023-09-06T13:05:42Z)
- Interpretable Sentence Representation with Variational Autoencoders and
Attention [0.685316573653194]
We develop methods to enhance the interpretability of recent representation learning techniques in natural language processing (NLP).
We leverage Variational Autoencoders (VAEs) due to their efficiency in relating observations to latent generative factors.
We build two models with inductive bias to separate information in latent representations into understandable concepts without annotated data.
arXiv Detail & Related papers (2023-05-04T13:16:15Z)
- Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot
Learning [74.48337375174297]
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge transferred from the seen domain.
We deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between prototypes and visual features.
DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes to adapt to different images, enabling the recast of the unmatched semantic-visual pair into the matched one.
arXiv Detail & Related papers (2023-03-27T15:21:43Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Conditional Deep Hierarchical Variational Autoencoder for Voice
Conversion [5.538544897623972]
Variational autoencoder-based voice conversion (VAE-VC) has the advantage of requiring only pairs of speeches and speaker labels for training.
This paper investigates how increasing model expressiveness benefits and otherwise affects VAE-VC.
arXiv Detail & Related papers (2021-12-06T05:54:11Z)
- Unsupervised Speech Enhancement using Dynamical Variational
Auto-Encoders [29.796695365217893]
Dynamical variational auto-encoders (DVAEs) are a class of deep generative models with latent variables.
We propose an unsupervised speech enhancement algorithm based on the most general form of DVAEs.
We derive a variational expectation-maximization algorithm to perform speech enhancement.
arXiv Detail & Related papers (2021-06-23T09:48:38Z)
- Discrete Auto-regressive Variational Attention Models for Text Modeling [53.38382932162732]
Variational autoencoders (VAEs) have been widely applied for text modeling.
They are troubled by two challenges: information underrepresentation and posterior collapse.
We propose Discrete Auto-regressive Variational Attention Model (DAVAM) to address the challenges.
arXiv Detail & Related papers (2021-06-16T06:36:26Z)
- Hierarchical Variational Autoencoder for Visual Counterfactuals [79.86967775454316]
Conditional Variational Autoencoders (VAE) are gathering significant attention as an Explainable Artificial Intelligence (XAI) tool.
In this paper we show how relaxing the effect of the posterior leads to successful counterfactuals.
We introduce VAEX, a Hierarchical VAE designed for this approach that can visually audit a classifier in applications.
arXiv Detail & Related papers (2021-02-01T14:07:11Z)
- Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition
with Source Localization [73.62550438861942]
This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR).
In D-ASR, the azimuth angle of the sources with respect to the microphone array is defined as a latent variable. This angle controls the quality of separation, which in turn determines the ASR performance.
arXiv Detail & Related papers (2020-10-30T20:26:28Z)
- DAM: Deliberation, Abandon and Memory Networks for Generating Detailed
and Non-repetitive Responses in Visual Dialogue [29.330198609132207]
We propose a novel generative decoding architecture to generate high-quality responses.
In this architecture, word generation is decomposed into a series of attention-based information selection steps.
The responses contain more detailed and non-repetitive descriptions while maintaining the semantic accuracy.
arXiv Detail & Related papers (2020-07-07T09:49:47Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent
Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables with a strong auto-regressive decoder.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)