Multimodal End-to-End Group Emotion Recognition using Cross-Modal
Attention
- URL: http://arxiv.org/abs/2111.05890v1
- Date: Wed, 10 Nov 2021 19:19:26 GMT
- Title: Multimodal End-to-End Group Emotion Recognition using Cross-Modal
Attention
- Authors: Lev Evtodienko
- Abstract summary: Classifying group-level emotions is a challenging task due to the complexity of video.
Our model achieves a best validation accuracy of 60.37%, approximately 8.5% higher than the VGAF dataset baseline.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifying group-level emotions is a challenging task due to the
complexity of video, in which not only visual but also audio information should
be taken into consideration. Existing works on multimodal emotion recognition
use a cumbersome approach in which pretrained neural networks serve as feature
extractors and the extracted features are then fused. However, this approach
does not account for the properties of multimodal data, and the feature
extractors cannot be fine-tuned for the specific task, which can hurt overall
model accuracy. To this end, our contribution is twofold: (i) we train the
model end-to-end, which allows the early layers of the neural network to adapt
to the later fusion layers of the two modalities; (ii) all layers of our model
are fine-tuned for the downstream task of emotion recognition, so there is no
need to train the neural networks from scratch. Our model achieves a best
validation accuracy of 60.37%, approximately 8.5% higher than the VGAF dataset
baseline, and is competitive with existing audio- and video-modality works.
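The paper's title names cross-modal attention as the fusion mechanism. As a rough illustration only (the actual architecture is not detailed in this abstract), the core operation can be sketched as scaled dot-product attention in which queries come from one modality and keys/values from the other; all shapes and names below are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    (e.g. audio frames) and keys/values from the other (e.g. video frames)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Tq, Tk) cross-modal similarity
    weights = softmax(scores, axis=-1)      # each query row sums to 1
    return weights @ values                 # (Tq, d) audio enriched with video

rng = np.random.default_rng(0)
audio = rng.standard_normal((4, 8))   # 4 audio frames, feature dim 8
video = rng.standard_normal((6, 8))   # 6 video frames, feature dim 8
fused = cross_modal_attention(audio, video, video)
print(fused.shape)  # same length as the audio sequence
```

In an end-to-end setup as advocated in the abstract, the gradients of the task loss flow through this fusion step back into both modality encoders, so their early layers can adapt to what the fusion layer needs.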
Related papers
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single model tracker that can remain blind to any modality X during inference time.
Our training process is extremely simple, integrating multi-label classification loss with a routing function.
Our generalist and blind tracker can achieve competitive performance compared to well-established modal-specific models.
arXiv Detail & Related papers (2024-05-28T03:00:58Z) - Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks [19.639533220155965]
This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection.
We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic.
arXiv Detail & Related papers (2023-11-18T02:44:33Z) - Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for combining the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single layers, or groups of layers.
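The parameter averaging that this snippet and federated learning refer to can be sketched in a few lines; the dict-of-arrays representation and layer names below are assumptions for illustration, not the paper's code:

```python
import numpy as np

def average_models(state_dicts, weights=None):
    """Layer-wise (parameter-wise) weighted average of several models,
    each given as a dict mapping layer names to weight arrays
    (the FedAvg-style aggregation step)."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts))
            for k in state_dicts[0]}

model_a = {"fc.weight": np.ones((2, 2)), "fc.bias": np.zeros(2)}
model_b = {"fc.weight": 3 * np.ones((2, 2)), "fc.bias": np.ones(2)}
avg = average_models([model_a, model_b])
print(avg["fc.weight"])  # every entry is 2.0, the elementwise mean
```

Averaging subsets of layers, as the paper studies, would amount to applying this update only to the chosen keys while keeping the rest from one of the models.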
arXiv Detail & Related papers (2023-07-13T09:39:10Z) - Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z) - Sparse Interaction Additive Networks via Feature Interaction Detection
and Sparse Selection [10.191597755296163]
We develop a tractable selection algorithm to efficiently identify the necessary feature combinations.
Our proposed Sparse Interaction Additive Networks (SIAN) construct a bridge from simple and interpretable models to fully connected neural networks.
arXiv Detail & Related papers (2022-09-19T19:57:17Z) - Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion
Recognition? [36.67937514793215]
Cross-modal attention is seen as an effective mechanism for multi-modal fusion.
We implement and compare a cross-attention and a self-attention model.
We compare the models using different modality combinations for a 7-class emotion classification task.
arXiv Detail & Related papers (2022-02-18T15:44:14Z) - CBIR using Pre-Trained Neural Networks [1.2130044156459308]
We use a pretrained Inception V3 model and extract the activation of its last fully connected layer, which forms a low-dimensional representation of the image.
This feature matrix is then divided into branches, and separate feature extraction is done for each branch to obtain multiple features flattened into a vector.
We achieved a training accuracy of 99.46% and a validation accuracy of 84.56%.
arXiv Detail & Related papers (2021-10-27T14:19:48Z) - Multimodal End-to-End Sparse Model for Emotion Recognition [40.71488291980002]
We develop a fully end-to-end model that connects the two phases and optimizes them jointly.
We also restructure the current datasets to enable the fully end-to-end training.
Experimental results show that our fully end-to-end model significantly surpasses the current state-of-the-art models.
arXiv Detail & Related papers (2021-03-17T14:05:05Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
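One intuition behind fusion with optimal transport is that two independently trained networks may encode the same features in differently ordered neurons, so naive averaging mixes unrelated units. A toy sketch of the simplest special case (a hard permutation found by brute force, rather than the paper's full transport-based alignment) is below; the matrices and the brute-force search are illustrative assumptions:

```python
import numpy as np
from itertools import permutations

def align_then_average(w_a, w_b):
    """Toy one-layer alignment-then-average: find the row (neuron)
    permutation of w_b that best matches w_a in squared error
    (a hard-assignment special case of optimal transport),
    then average the aligned weight matrices."""
    n = w_a.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in permutations(range(n)):  # feasible only for tiny layers
        cost = np.sum((w_a - w_b[list(perm)]) ** 2)
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return 0.5 * (w_a + w_b[list(best_perm)])

w_a = np.array([[1.0, 0.0], [0.0, 1.0]])
w_b = np.array([[0.0, 1.0], [1.0, 0.0]])  # same neurons, permuted order
print(align_then_average(w_a, w_b))  # alignment recovers the shared pattern
```

Naive averaging of these two matrices would give a uniform 0.5 everywhere and destroy the structure; aligning first preserves it, which is the motivation for transport-based fusion.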
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.