It's LeVAsa not LevioSA! Latent Encodings for Valence-Arousal Structure Alignment
- URL: http://arxiv.org/abs/2007.10058v3
- Date: Mon, 30 Nov 2020 18:24:01 GMT
- Title: It's LeVAsa not LevioSA! Latent Encodings for Valence-Arousal Structure Alignment
- Authors: Surabhi S. Nath, Vishaal Udandarao, Jainendra Shukla
- Abstract summary: We present "LeVAsa", a VAE model that learns implicit structure by aligning the latent space with the VA space.
Our results reveal that LeVAsa achieves high latent-circumplex alignment which leads to improved downstream categorical emotion prediction.
- Score: 3.6513059119482154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, great strides have been made in the field of affective
computing. Several models have been developed to represent and quantify
emotions. Two popular ones include (i) categorical models which represent
emotions as discrete labels, and (ii) dimensional models which represent
emotions in a Valence-Arousal (VA) circumplex domain. However, there is no
standard for annotation mapping between the two labelling methods. We build a
novel algorithm for mapping categorical and dimensional model labels using
annotation transfer across affective facial image datasets. Further, we utilize
the transferred annotations to learn rich and interpretable data
representations using a variational autoencoder (VAE). We present "LeVAsa", a
VAE model that learns implicit structure by aligning the latent space with the
VA space. We evaluate the efficacy of LeVAsa by comparing performance with the
Vanilla VAE using quantitative and qualitative analysis on two benchmark
affective image datasets. Our results reveal that LeVAsa achieves high
latent-circumplex alignment which leads to improved downstream categorical
emotion prediction. The work also demonstrates the trade-off between degree of
alignment and quality of reconstructions.
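As a rough illustration of the idea in the abstract, the sketch below adds an alignment penalty to a standard VAE objective. It is not the authors' implementation: the abstract does not specify the loss, so tying the first two latent means to the (valence, arousal) label with a squared-error term, and the names `aligned_vae_loss` and `align_weight`, are assumptions made purely for illustration.

```python
import math


def kl_diag_gaussian(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def aligned_vae_loss(x, x_recon, mu, logvar, va, align_weight=1.0):
    """Vanilla VAE loss plus a hypothetical latent-to-VA alignment penalty.

    ASSUMPTION: the first two latent means are pulled toward the
    (valence, arousal) label; the source abstract does not give the
    actual form of the alignment term.
    """
    recon = squared_error(x, x_recon)   # reconstruction term
    kl = kl_diag_gaussian(mu, logvar)   # standard VAE regularizer
    align = squared_error(mu[:2], va)   # hypothetical alignment term
    return recon + kl + align_weight * align


if __name__ == "__main__":
    loss = aligned_vae_loss(
        x=[1.0, 2.0],
        x_recon=[1.0, 2.0],
        mu=[0.5, -0.5, 0.0],
        logvar=[0.0, 0.0, 0.0],
        va=(0.0, 0.0),
        align_weight=2.0,
    )
    print(loss)
```

Raising `align_weight` pulls the latent means harder toward the VA labels at the expense of the other terms, which is one simple way to picture the alignment-versus-reconstruction trade-off the abstract mentions.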
Related papers
- VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection [50.57849622045192]
We propose VAEmo, an efficient framework for emotion-centric joint VA representation learning with external knowledge injection.
VAEmo achieves state-of-the-art performance with a compact design, highlighting the benefit of unified cross-modal encoding and emotion-aware semantic guidance.
arXiv Detail & Related papers (2025-05-05T03:00:51Z)
- Generalized Visual Relation Detection with Diffusion Models [94.62313788626128]
Visual relation detection (VRD) aims to identify relationships (or interactions) between object pairs in an image.
We propose to model visual relations as continuous embeddings, and design diffusion models to achieve generalized VRD in a conditional generative manner.
Our Diff-VRD is able to generate visual relations beyond the pre-defined category labels of datasets.
arXiv Detail & Related papers (2025-04-16T14:03:24Z)
- eMotions: A Large-Scale Dataset for Emotion Recognition in Short Videos [7.011656298079659]
The prevailing use of short videos (SVs) leads to the necessity of emotion recognition in SVs.
Considering the lack of SVs emotion data, we introduce a large-scale dataset named eMotions, comprising 27,996 videos.
We present an end-to-end baseline method AV-CPNet that employs the video transformer to better learn semantically relevant representations.
arXiv Detail & Related papers (2023-11-29T03:24:30Z)
- Disentangled Variational Autoencoder for Emotion Recognition in Conversations [14.92924920489251]
We propose a VAD-disentangled Variational AutoEncoder (VAD-VAE) for Emotion Recognition in Conversations (ERC).
VAD-VAE disentangles three affect representations Valence-Arousal-Dominance (VAD) from the latent space.
Experiments show that VAD-VAE outperforms the state-of-the-art model on two datasets.
arXiv Detail & Related papers (2023-05-23T13:50:06Z)
- Interpretable Sentence Representation with Variational Autoencoders and Attention [0.685316573653194]
We develop methods to enhance the interpretability of recent representation learning techniques in natural language processing (NLP).
We leverage Variational Autoencoders (VAEs) due to their efficiency in relating observations to latent generative factors.
We build two models with inductive bias to separate information in latent representations into understandable concepts without annotated data.
arXiv Detail & Related papers (2023-05-04T13:16:15Z)
- Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning [74.48337375174297]
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge transferred from the seen domain.
We deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between prototypes and visual features.
DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes to adapt to different images, enabling the recast of the unmatched semantic-visual pair into the matched one.
arXiv Detail & Related papers (2023-03-27T15:21:43Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Trading Information between Latents in Hierarchical Variational Autoencoders [8.122270502556374]
Variational Autoencoders (VAEs) were originally motivated as probabilistic generative models in which one performs approximate Bayesian inference.
The proposal of $\beta$-VAEs breaks this interpretation and generalizes VAEs to application domains beyond generative modeling.
We identify a general class of inference models for which one can split the rate into contributions from each layer, which can then be tuned independently.
arXiv Detail & Related papers (2023-02-09T18:56:11Z)
- Improving VAE-based Representation Learning [26.47244578124654]
We study what properties are required for good representations and how different VAE structure choices could affect the learned properties.
We show that by using a decoder that prefers to learn local features, the remaining global features can be well captured by the latent.
arXiv Detail & Related papers (2022-05-28T23:00:18Z)
- A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes [58.633364000258645]
The dataset, called RIVAL10, consists of roughly $26k$ instances over $10$ classes.
We evaluate the sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds and attributes.
In our analysis, we consider diverse state-of-the-art architectures (ResNets, Transformers) and training procedures (CLIP, SimCLR, DeiT, Adversarial Training).
arXiv Detail & Related papers (2022-01-26T06:31:28Z)
- Multivariate Data Explanation by Jumping Emerging Patterns Visualization [78.6363825307044]
We present VAX (multiVariate dAta eXplanation), a new VA method to support the identification and visual interpretation of patterns in multivariate data sets.
Unlike the existing similar approaches, VAX uses the concept of Jumping Emerging Patterns to identify and aggregate several diversified patterns, producing explanations through logic combinations of data variables.
arXiv Detail & Related papers (2021-06-21T13:49:44Z)
- Discrete Auto-regressive Variational Attention Models for Text Modeling [53.38382932162732]
Variational autoencoders (VAEs) have been widely applied for text modeling.
They are troubled by two challenges: information underrepresentation and posterior collapse.
We propose Discrete Auto-regressive Variational Attention Model (DAVAM) to address the challenges.
arXiv Detail & Related papers (2021-06-16T06:36:26Z)
- i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning [117.63815437385321]
We propose i-Mix, a simple yet effective domain-agnostic regularization strategy for improving contrastive representation learning.
In experiments, we demonstrate that i-Mix consistently improves the quality of learned representations across domains.
arXiv Detail & Related papers (2020-10-17T23:32:26Z)
- Depthwise Discrete Representation Learning [2.728575246952532]
Recent advancements in learning discrete representations have led to state-of-the-art results in tasks that involve language, audio, and vision.
Some latent factors such as words, phonemes and shapes are better represented by discrete latent variables as opposed to continuous.
Vector Quantized Variational Autoencoders (VQVAE) have produced remarkable results in multiple domains.
arXiv Detail & Related papers (2020-04-11T18:57:13Z)