Multimodal Personality Recognition using Cross-Attention Transformer and
Behaviour Encoding
- URL: http://arxiv.org/abs/2112.12180v1
- Date: Wed, 22 Dec 2021 19:14:55 GMT
- Title: Multimodal Personality Recognition using Cross-Attention Transformer and
Behaviour Encoding
- Authors: Tanay Agrawal, Dhruv Agarwal, Michal Balazia, Neelabh Sinha, Francois
Bremond
- Abstract summary: We propose a flexible model for the task which exploits all available data. The task involves complex relations; to avoid using a large model for video processing specifically, we propose the use of behaviour encoding.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personality computing and affective computing have gained recent interest in
many research areas. Datasets for the task generally contain multiple
modalities such as video, audio, language and bio-signals. In this paper, we
propose a flexible model for the task which exploits all available data. The
task involves complex relations; to avoid using a large model for video
processing specifically, we propose the use of behaviour encoding, which boosts
performance with minimal change to the model. Cross-attention using
transformers has become popular in recent times and is utilised for fusion of
different modalities. Since long-term relations may exist, breaking the input
into chunks is not desirable; thus, the proposed model processes the entire
input together. Our experiments show the importance of each of the above
contributions.
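The cross-attention fusion the abstract describes can be sketched as one modality's features attending to another's. This is a minimal illustration of the general technique, not the authors' exact architecture; all names, dimensions, and the choice of query/key modalities are assumptions.

```python
# Minimal sketch of cross-attention fusion between two modality streams
# (e.g. video features attending to audio features). Illustrative only;
# dimensions and modality roles are assumptions, not the paper's model.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # Queries come from one modality, keys/values from the other.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x_a, x_b):
        # x_a: (batch, seq_a, dim) query modality (e.g. video)
        # x_b: (batch, seq_b, dim) key/value modality (e.g. audio)
        fused, _ = self.attn(query=x_a, key=x_b, value=x_b)
        return self.norm(x_a + fused)  # residual connection + layer norm

# Whole sequences are passed at once, echoing the paper's point about
# not chunking the input when long-term relations may exist.
video = torch.randn(2, 50, 64)
audio = torch.randn(2, 80, 64)
out = CrossAttentionFusion()(video, audio)
print(out.shape)  # torch.Size([2, 50, 64])
```

The output keeps the query modality's sequence length, so the fused stream can be fed to further transformer layers unchanged.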
Related papers
- Language Supervised Human Action Recognition with Salient Fusion: Construction Worker Action Recognition as a Use Case [8.26451988845854]
We introduce a novel approach to Human Action Recognition (HAR) based on skeleton and visual cues.
We employ learnable prompts for the language model conditioned on the skeleton modality to optimize feature representation.
We introduce a new dataset tailored for real-world robotic applications in construction sites, featuring visual, skeleton, and depth data modalities.
arXiv Detail & Related papers (2024-10-02T19:10:23Z) - Sampling Foundational Transformer: A Theoretical Perspective [12.7600763629179]
We propose the Sampling Foundational Transformer (SFT), which can work on multiple data modalities.
SFT has achieved competitive results on many benchmarks, while being faster in inference, compared to other very specialized models.
arXiv Detail & Related papers (2024-08-11T16:53:09Z) - Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities [56.666806962214565]
We propose to improve transformers of a specific modality with irrelevant data from other modalities.
We use an auxiliary transformer trained with data of another modality and construct pathways to connect components of the two models.
We observe significant and consistent performance improvements with irrelevant data from other modalities.
arXiv Detail & Related papers (2024-01-25T18:59:58Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion [54.33764537135906]
VideoQA Transformer models demonstrate competitive performance on standard benchmarks.
Do these models capture the rich multimodal structures and dynamics from video and text jointly?
Are they achieving high scores by exploiting biases and spurious features?
arXiv Detail & Related papers (2023-06-15T06:45:46Z) - Multimodal Vision Transformers with Forced Attention for Behavior
Analysis [0.0]
We introduce the Forced Attention (FAt) Transformer, which utilises forced attention with a modified backbone for input encoding and additional inputs.
FAt Transformers are applied to two downstream tasks: personality recognition and body language recognition.
We achieve state-of-the-art results for Udiva v0.5, First Impressions v2 and MPII Group Interaction datasets.
arXiv Detail & Related papers (2022-12-07T21:56:50Z) - Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge
Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
arXiv Detail & Related papers (2022-05-04T23:40:04Z) - Audio-Visual Fusion Layers for Event Type Aware Video Recognition [86.22811405685681]
We propose a new model to address the multisensory integration problem with individual event-specific layers in a multi-task learning scheme.
We show that our network is trained with single labels, yet it can output additional true multi-labels to represent the given videos.
arXiv Detail & Related papers (2022-02-12T02:56:22Z) - Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval [36.50847375135979]
Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation.
We present a multi-modal, modality fusion transformer approach that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a joint multi-modal representation.
arXiv Detail & Related papers (2021-12-08T18:14:57Z) - Attention Bottlenecks for Multimodal Fusion [90.75885715478054]
Machine perception models are typically modality-specific and optimised for unimodal benchmarks.
We introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers.
We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks.
arXiv Detail & Related papers (2021-06-30T22:44:12Z)
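The fusion-bottleneck idea in the entry above can be sketched as modalities exchanging information only through a small set of shared bottleneck tokens rather than full pairwise attention. This is an illustrative simplification under assumed names and dimensions, not the paper's exact architecture.

```python
# Sketch of bottleneck-token fusion: each modality attends over its own
# tokens plus a few shared bottleneck tokens, which carry information
# across modalities. Illustrative assumptions throughout.
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    def __init__(self, dim=64, heads=4, n_bottleneck=4):
        super().__init__()
        # Learned bottleneck tokens shared across modalities.
        self.bottleneck = nn.Parameter(torch.randn(1, n_bottleneck, dim))
        self.enc_a = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.enc_b = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.n = n_bottleneck

    def forward(self, x_a, x_b):
        batch = x_a.size(0)
        z = self.bottleneck.expand(batch, -1, -1)
        # Modality A attends over [its tokens ; bottleneck tokens].
        y_a = self.enc_a(torch.cat([x_a, z], dim=1))
        x_a, z_a = y_a[:, :-self.n], y_a[:, -self.n:]
        # Modality B sees only the updated bottleneck tokens from A.
        y_b = self.enc_b(torch.cat([x_b, z_a], dim=1))
        x_b, z_b = y_b[:, :-self.n], y_b[:, -self.n:]
        return x_a, x_b, z_b

audio = torch.randn(2, 30, 64)
video = torch.randn(2, 50, 64)
a, v, z = BottleneckFusionLayer()(audio, video)
```

Because cross-modal information must pass through a handful of tokens, the attention cost stays close to that of two unimodal encoders.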
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.