Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional
Emotion Recognition
- URL: http://arxiv.org/abs/2004.13236v1
- Date: Tue, 28 Apr 2020 01:25:00 GMT
- Title: Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional
Emotion Recognition
- Authors: Dung Nguyen, Duc Thanh Nguyen, Rui Zeng, Thanh Thi Nguyen, Son N.
Tran, Thin Nguyen, Sridha Sridharan, and Clinton Fookes
- Abstract summary: We propose a novel deep neural network architecture consisting of a two-stream auto-encoder and a long short-term memory (LSTM) network for emotion recognition.
We carry out extensive experiments on the multimodal in-the-wild emotion dataset RECOLA.
Experimental results show that the proposed method achieves state-of-the-art recognition performance and surpasses existing schemes by a significant margin.
- Score: 38.350188118975616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal dimensional emotion recognition has drawn great attention from
the affective computing community, and numerous schemes have been extensively
investigated, making significant progress in this area. However, several
questions remain unanswered for most existing approaches, including:
(i) how to simultaneously learn compact yet representative features from
multimodal data, (ii) how to effectively capture complementary features from
multimodal streams, and (iii) how to perform all the tasks in an end-to-end
manner. To address these challenges, in this paper we propose a novel deep
neural network architecture consisting of a two-stream auto-encoder and a long
short-term memory (LSTM) network that effectively integrates visual and audio signal streams
for emotion recognition. To validate the robustness of our proposed
architecture, we carry out extensive experiments on the multimodal in-the-wild
emotion dataset RECOLA. Experimental results show that the proposed method
achieves state-of-the-art recognition performance and surpasses existing
schemes by a significant margin.
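To make the described architecture concrete, below is a minimal sketch (in PyTorch) of how a two-stream auto-encoder feeding an LSTM could be wired for audio-visual valence/arousal regression. The feature dimensions, layer sizes, concatenation-based fusion, and the two-dimensional regression head are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (not the authors' code): a two-stream auto-encoder whose
# latent codes are fused and fed to an LSTM that regresses valence/arousal.
# All sizes and the concatenation-based fusion are illustrative assumptions.
import torch
import torch.nn as nn


class StreamAutoEncoder(nn.Module):
    """Per-modality auto-encoder producing a compact latent code."""

    def __init__(self, in_dim: int, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # compact representation
        x_hat = self.decoder(z)  # reconstruction for the auto-encoder loss
        return z, x_hat


class TwoStreamAELSTM(nn.Module):
    """Two-stream auto-encoder + LSTM for dimensional emotion regression."""

    def __init__(self, visual_dim=512, audio_dim=128, latent_dim=64, hidden=128):
        super().__init__()
        self.visual_ae = StreamAutoEncoder(visual_dim, latent_dim)
        self.audio_ae = StreamAutoEncoder(audio_dim, latent_dim)
        self.lstm = nn.LSTM(2 * latent_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # valence and arousal

    def forward(self, visual_seq, audio_seq):
        # visual_seq: (B, T, visual_dim), audio_seq: (B, T, audio_dim)
        z_v, v_hat = self.visual_ae(visual_seq)
        z_a, a_hat = self.audio_ae(audio_seq)
        fused = torch.cat([z_v, z_a], dim=-1)  # complementary features
        out, _ = self.lstm(fused)              # temporal modelling
        return self.head(out), (v_hat, a_hat)  # predictions + reconstructions


if __name__ == "__main__":
    model = TwoStreamAELSTM()
    vis = torch.randn(4, 100, 512)  # e.g. per-frame visual features
    aud = torch.randn(4, 100, 128)  # e.g. per-frame acoustic features
    preds, _ = model(vis, aud)
    print(preds.shape)              # torch.Size([4, 100, 2])
```

In such a design, the reconstruction losses of the two auto-encoders can be combined with the regression loss so the whole pipeline trains end-to-end; the exact loss weighting and feature extractors used in the paper are not specified here.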
Related papers
- EEG-based Multimodal Representation Learning for Emotion Recognition [26.257531037300325]
We introduce a novel multimodal framework that accommodates not only conventional modalities such as video, images, and audio, but also EEG data.
Our framework is designed to flexibly handle varying input sizes, while dynamically adjusting attention to account for feature importance across modalities.
arXiv Detail & Related papers (2024-10-29T01:35:17Z)
- Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition [53.359383163184425]
We introduce a novel multimodality synergistic knowledge distillation scheme tailored for efficient single-eye emotion recognition tasks.
This method allows a lightweight, unimodal student spiking neural network (SNN) to extract rich knowledge from an event-frame multimodal teacher network.
arXiv Detail & Related papers (2024-06-20T07:24:47Z)
- A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions [0.0]
We propose a multi-task, multi-modal system that predicts categorical and dimensional emotions.
Results emphasise the importance of cross-regularisation between the two types of emotions.
arXiv Detail & Related papers (2023-12-31T16:48:03Z)
- Multimodal deep representation learning for quantum cross-platform verification [60.01590250213637]
Cross-platform verification, a critical undertaking in the realm of early-stage quantum computing, endeavors to characterize the similarity of two imperfect quantum devices executing identical algorithms.
We introduce an innovative multimodal learning approach, recognizing that the formalism of data in this task embodies two distinct modalities.
We devise a multimodal neural network to independently extract knowledge from these modalities, followed by a fusion operation to create a comprehensive data representation.
arXiv Detail & Related papers (2023-11-07T04:35:03Z)
- EMERSK -- Explainable Multimodal Emotion Recognition with Situational Knowledge [0.0]
We present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK)
EMERSK is a general system for human emotion recognition and explanation using visual information.
Our system can handle multiple modalities, including facial expressions, posture, and gait in a flexible and modular manner.
arXiv Detail & Related papers (2023-06-14T17:52:37Z)
- FV2ES: A Fully End2End Multimodal System for Fast Yet Effective Video Emotion Recognition Inference [6.279057784373124]
In this paper, we design a fully multimodal video-to-emotion system (FV2ES) for fast yet effective recognition inference.
Adopting a hierarchical attention method on the sound spectra overcomes the otherwise limited contribution of the acoustic modality.
The further integration of data pre-processing into the aligned multimodal learning model allows the significant reduction of computational costs and storage space.
arXiv Detail & Related papers (2022-09-21T08:05:26Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction task.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
- Deep Multimodal Neural Architecture Search [178.35131768344246]
We devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks.
Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone.
On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks.
arXiv Detail & Related papers (2020-04-25T07:00:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.