Multimodal End-to-End Sparse Model for Emotion Recognition
- URL: http://arxiv.org/abs/2103.09666v1
- Date: Wed, 17 Mar 2021 14:05:05 GMT
- Title: Multimodal End-to-End Sparse Model for Emotion Recognition
- Authors: Wenliang Dai, Samuel Cahyawijaya, Zihan Liu, Pascale Fung
- Abstract summary: We develop a fully end-to-end model that connects the two phases and optimizes them jointly.
We also restructure the current datasets to enable the fully end-to-end training.
Experimental results show that our fully end-to-end model significantly surpasses the current state-of-the-art models.
- Score: 40.71488291980002
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing works on multimodal affective computing tasks, such as emotion
recognition, generally adopt a two-phase pipeline, first extracting feature
representations for each single modality with hand-crafted algorithms and then
performing end-to-end learning with the extracted features. However, the
extracted features are fixed and cannot be further fine-tuned on different
target tasks, and manually finding feature extraction algorithms does not
generalize or scale well to different tasks, which can lead to sub-optimal
performance. In this paper, we develop a fully end-to-end model that connects
the two phases and optimizes them jointly. In addition, we restructure the
current datasets to enable the fully end-to-end training. Furthermore, to
reduce the computational overhead brought by the end-to-end model, we introduce
a sparse cross-modal attention mechanism for the feature extraction.
Experimental results show that our fully end-to-end model significantly
surpasses the current state-of-the-art models based on the two-phase pipeline.
Moreover, by adding the sparse cross-modal attention, our model can maintain
performance with around half the computation in the feature extraction part.
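To make the sparse cross-modal attention idea more concrete, below is a minimal PyTorch sketch of one way such a layer could look. The top-k score selection, the projection layout, and all names and dimensions (SparseCrossModalAttention, top_k, etc.) are illustrative assumptions, not the authors' exact mechanism, which the abstract places inside the feature-extraction stage.

```python
# Minimal, illustrative sketch of a sparse cross-modal attention layer.
# Assumption: sparsity is realized by keeping only the top-k context
# positions per query; this is NOT necessarily the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseCrossModalAttention(nn.Module):
    """Queries come from one modality (e.g., text), keys/values from another
    (e.g., acoustic or visual features). Only the top-k keys per query are
    kept, pruning attention spent on low-relevance positions."""

    def __init__(self, dim: int, top_k: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.top_k = top_k
        self.scale = dim ** -0.5

    def forward(self, query_feats: torch.Tensor, context_feats: torch.Tensor) -> torch.Tensor:
        # query_feats: (batch, n_q, dim); context_feats: (batch, n_kv, dim)
        q = self.q_proj(query_feats)
        k = self.k_proj(context_feats)
        v = self.v_proj(context_feats)

        scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale  # (batch, n_q, n_kv)

        # Keep only the top-k scores per query; mask the rest before softmax.
        k_eff = min(self.top_k, scores.size(-1))
        topk_vals, _ = scores.topk(k_eff, dim=-1)
        threshold = topk_vals[..., -1:]  # k-th largest score per query
        scores = scores.masked_fill(scores < threshold, float("-inf"))

        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)  # (batch, n_q, dim)


if __name__ == "__main__":
    layer = SparseCrossModalAttention(dim=64, top_k=8)
    text = torch.randn(2, 20, 64)    # hypothetical text token features
    audio = torch.randn(2, 100, 64)  # hypothetical acoustic frame features
    print(layer(text, audio).shape)  # torch.Size([2, 20, 64])
```

Note that this sketch still computes the full score matrix and only sparsifies the attention weights; the savings reported in the paper come from applying sparsity to the feature-extraction stage, so a faithful implementation would skip computation for the unselected positions rather than merely masking them.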
Related papers
- Few-Shot Medical Image Segmentation with Large Kernel Attention [5.630842216128902]
We propose a few-shot medical segmentation model that acquires comprehensive feature representation capabilities.
Our model comprises four key modules: a dual-path feature extractor, an attention module, an adaptive prototype prediction module, and a multi-scale prediction fusion module.
The results demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-07-27T02:28:30Z) - Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization [14.606035444283984]
Current approaches focus on developing models that handle modality-incomplete inputs during inference.
We propose a robust universal model with modality reconstruction and model personalization.
Our method has been extensively validated on two brain tumor segmentation benchmarks.
arXiv Detail & Related papers (2024-06-04T06:07:24Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in the semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z) - Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of
Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantics.
arXiv Detail & Related papers (2022-06-21T17:40:55Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - Multimodal End-to-End Group Emotion Recognition using Cross-Modal
Attention [0.0]
Classifying group-level emotions is a challenging task due to the complexity of video.
Our model achieves a best validation accuracy of 60.37%, which is approximately 8.5% higher than the VGAF dataset baseline.
arXiv Detail & Related papers (2021-11-10T19:19:26Z) - Parameter Decoupling Strategy for Semi-supervised 3D Left Atrium
Segmentation [0.0]
We present a novel semi-supervised segmentation model based on parameter decoupling strategy to encourage consistent predictions from diverse views.
Our method achieves competitive results compared with state-of-the-art semi-supervised methods on the Atrial Challenge dataset.
arXiv Detail & Related papers (2021-09-20T14:51:42Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)