Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings
- URL: http://arxiv.org/abs/2405.03846v1
- Date: Mon, 6 May 2024 20:51:28 GMT
- Title: Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings
- Authors: Ádám Fodor, Rachid R. Saboundji, András Lőrincz
- Abstract summary: We present a multimodal deep neural network with a Siamese extension for apparent personality trait prediction trained on short video recordings.
Because the target distribution of the analyzed dataset is highly concentrated, improvements in the third decimal place are meaningful.
- Score: 0.5461938536945723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic personality trait assessment is essential for high-quality human-machine interaction. Systems capable of human behavior analysis could be used in self-driving cars, medical research, and surveillance, among many other applications. We present a multimodal deep neural network with a Siamese extension for apparent personality trait prediction, trained on short video recordings and exploiting modality-invariant embeddings. Acoustic, visual, and textual information are combined to reach a high-performance solution for this task. Because the target distribution of the analyzed dataset is highly concentrated, improvements in the third decimal place are meaningful. Our proposed method addresses the challenge of under-represented extreme values, achieves an average MAE improvement of 0.0033, and shows a clear advantage over a baseline multimodal DNN without the introduced module.
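The core mechanism, projecting each modality into a shared space and penalizing disagreement between a clip's acoustic, visual, and textual embeddings alongside the MAE regression objective, can be sketched in a few lines of PyTorch. This is a minimal illustration only; all dimensions, module names, and the 0.1 invariance weight are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Projects one modality's features into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, embed_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

audio_enc = ModalityEncoder(in_dim=68)    # assumed acoustic feature size
visual_enc = ModalityEncoder(in_dim=512)  # assumed face-feature size
text_enc = ModalityEncoder(in_dim=300)    # assumed word-vector size
trait_head = nn.Linear(3 * 128, 5)        # Big Five trait scores in [0, 1]

def training_loss(a_feat, v_feat, t_feat, traits):
    a, v, t = audio_enc(a_feat), visual_enc(v_feat), text_enc(t_feat)
    # Siamese term: a clip's embeddings should agree across modalities.
    invariance = ((a - v).pow(2) + (a - t).pow(2) + (v - t).pow(2)).sum(-1).mean()
    pred = torch.sigmoid(trait_head(torch.cat([a, v, t], dim=-1)))
    return F.l1_loss(pred, traits) + 0.1 * invariance  # MAE + assumed weight
```

Pulling the three embeddings toward each other is what makes them modality-invariant: each modality's view of a clip lands in the same region of the shared space.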
Related papers
- Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
The capacity to generalize effectively on smaller datasets remains a persistent challenge.
We combine a wide range of data sources to improve performance and generalization to unseen data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z)
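The multi-source fusion step behind the reported generalization can be illustrated by simply pooling heterogeneous datasets for encoder pre-training. A minimal sketch, with random tensors standing in for the real OCT sources:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for several retinal OCT sources with differing sizes.
source_a = TensorDataset(torch.randn(500, 3, 224, 224))
source_b = TensorDataset(torch.randn(1200, 3, 224, 224))
source_c = TensorDataset(torch.randn(300, 3, 224, 224))

# Pool all sources so the encoder sees a wider input distribution,
# which is the generalization idea the abstract describes.
pooled = ConcatDataset([source_a, source_b, source_c])
loader = DataLoader(pooled, batch_size=32, shuffle=True)
```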
- Enhancing the analysis of murine neonatal ultrasonic vocalizations: Development, evaluation, and application of different mathematical models [0.8431877864777444]
Rodents employ a broad spectrum of ultrasonic vocalizations (USVs) for social communication.
Here, we present the first systematic evaluation of different types of neural networks for USV classification.
arXiv Detail & Related papers (2024-05-17T07:46:05Z)
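One plausible instance of the network types compared in the USV study is a small CNN over call spectrograms; the class count and input shape below are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class USVClassifier(nn.Module):
    def __init__(self, n_call_types: int = 10):  # class count is assumed
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(32, n_call_types)

    def forward(self, spectrogram):        # (batch, 1, freq_bins, time_frames)
        h = self.features(spectrogram).flatten(1)
        return self.classifier(h)          # per-class logits

logits = USVClassifier()(torch.randn(4, 1, 128, 256))  # dummy spectrograms
```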
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
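The adaptation strategy MMA-DFER describes, frozen SSL-pre-trained unimodal encoders with a small trainable fusion stage, might look roughly like the sketch below, where placeholder linear layers stand in for real SSL backbones.

```python
import torch
import torch.nn as nn

audio_encoder = nn.Linear(1024, 1024)   # placeholder for an SSL audio model
video_encoder = nn.Linear(768, 768)     # placeholder for an SSL face model
for p in list(audio_encoder.parameters()) + list(video_encoder.parameters()):
    p.requires_grad = False             # disjoint encoders stay frozen

adapt_a = nn.Linear(1024, 256)          # trainable modality adapters
adapt_v = nn.Linear(768, 256)
fusion_head = nn.Sequential(nn.ReLU(), nn.Linear(512, 7))  # 7 basic emotions

def forward(audio_feat, video_feat):
    with torch.no_grad():               # frozen unimodal feature extraction
        a, v = audio_encoder(audio_feat), video_encoder(video_feat)
    return fusion_head(torch.cat([adapt_a(a), adapt_v(v)], dim=-1))

print(forward(torch.randn(2, 1024), torch.randn(2, 768)).shape)  # (2, 7)
```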
- MONAL: Model Autophagy Analysis for Modeling Human-AI Interactions [11.972017738888825]
We propose Model Autophagy Analysis (MONAL) to explain the self-consumption behavior of large models.
MONAL employs two distinct autophagous loops to elucidate the suppression of human-generated information in the exchange between human and AI systems.
We evaluate the capacities of generated models as both creators and disseminators of information.
arXiv Detail & Related papers (2024-02-17T13:02:54Z)
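A loose toy illustration, not MONAL's actual analysis: in an autophagous loop each generation trains on a mix of human data and the previous model's outputs, so the human-generated share shrinks geometrically.

```python
human_fraction = 1.0
mix_ratio = 0.5  # assumed fraction of synthetic data fed back per generation

for generation in range(1, 6):
    # Each round replaces mix_ratio of the corpus with model-generated text.
    human_fraction *= (1.0 - mix_ratio)
    print(f"generation {generation}: human share = {human_fraction:.3f}")
```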
- Improving Neural Additive Models with Bayesian Principles [54.29602161803093]
Neural additive models (NAMs) enhance the transparency of deep neural networks by handling calibrated input features in separate additive sub-networks.
We develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.
arXiv Detail & Related papers (2023-05-26T13:19:15Z)
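The additive structure NAMs build on, one inspectable sub-network per input feature summed into the prediction, is easy to sketch; the Laplace approximation that gives LA-NAMs their Bayesian treatment is omitted here.

```python
import torch
import torch.nn as nn

class NAM(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
            for _ in range(n_features))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):                       # x: (batch, n_features)
        contribs = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        # Per-feature contributions stay inspectable, which is the
        # transparency property the abstract refers to.
        return self.bias + torch.stack(contribs, dim=-1).sum(-1)

print(NAM(n_features=4)(torch.randn(8, 4)).shape)  # (8, 1)
```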
- MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high- and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
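The top-down idea can be caricatured in a few lines: a first bottom-up pass produces a high-level summary, which then gates the raw inputs for a second pass. The shapes and gating choice below are illustrative assumptions, not MMLatch's exact design.

```python
import torch
import torch.nn as nn

low_dim, high_dim = 64, 32
bottom_up = nn.Linear(low_dim, high_dim)
top_down_gate = nn.Linear(high_dim, low_dim)   # maps summary back to input size

def forward(x):                                # x: (batch, low_dim)
    summary = torch.relu(bottom_up(x))         # pass 1: bottom-up
    gate = torch.sigmoid(top_down_gate(summary))
    x_modulated = x * gate                     # high level reshapes the input
    return torch.relu(bottom_up(x_modulated))  # pass 2: feedback-aware

print(forward(torch.randn(2, low_dim)).shape)  # (2, 32)
```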
- In-Bed Human Pose Estimation from Unseen and Privacy-Preserving Image Domains [22.92165116962952]
In-bed human posture estimation provides important health-related metrics with potential value in medical condition assessments.
We propose a multi-modal conditional variational autoencoder (MC-VAE) capable of reconstructing features from missing modalities seen during training.
We demonstrate that body positions can be effectively recognized from the available modality, achieving results on par with baseline models.
arXiv Detail & Related papers (2021-11-30T04:56:16Z)
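The missing-modality mechanism can be sketched as a conditional VAE-style mapping from the available modality's features to a reconstruction of the absent one; dimensions and names below are assumptions, not the MC-VAE's actual design.

```python
import torch
import torch.nn as nn

class CrossModalVAE(nn.Module):
    def __init__(self, in_dim=128, latent=16, out_dim=128):
        super().__init__()
        self.to_stats = nn.Linear(in_dim, 2 * latent)  # mean and log-variance
        self.decode = nn.Linear(latent, out_dim)

    def forward(self, available_modality):
        mu, logvar = self.to_stats(available_modality).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decode(z), mu, logvar  # reconstructs the missing modality

recon, mu, logvar = CrossModalVAE()(torch.randn(4, 128))
```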
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for trajectory or pose forecasting alone.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
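The visibility indicator TRiPOD learns can be used to mask the forecasting loss so invisible joints do not contribute; a minimal sketch with assumed tensor shapes and a given rather than learned indicator:

```python
import torch

pred = torch.randn(2, 14, 2)           # (batch, joints, xy) forecast positions
target = torch.randn(2, 14, 2)
visible = torch.randint(0, 2, (2, 14)).float()  # 1 = joint visible this frame

per_joint = (pred - target).pow(2).sum(-1)      # squared error per joint
loss = (per_joint * visible).sum() / visible.sum().clamp(min=1.0)
print(loss)
```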
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Introducing Representations of Facial Affect in Automated Multimodal Deception Detection [18.16596562087374]
Automated deception detection systems can enhance health, justice, and security in society.
This paper presents a novel analysis of the power of dimensional representations of facial affect for automated deception detection.
We used a video dataset of people communicating truthfully or deceptively in real-world, high-stakes courtroom situations.
arXiv Detail & Related papers (2020-08-31T05:12:57Z)
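A hypothetical minimal pipeline in the spirit of this analysis: per-frame valence/arousal estimates aggregated into clip-level statistics and fed to a linear classifier. The features below are synthetic stand-ins, not the paper's data or its affect extractor.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Pretend per-clip features: mean/std of valence and arousal over frames.
features = rng.normal(size=(40, 4))
labels = rng.integers(0, 2, size=40)    # 1 = deceptive, 0 = truthful

clf = LogisticRegression().fit(features, labels)
print(clf.predict_proba(features[:3]))  # deception probabilities
```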
- An Uncertainty-based Human-in-the-loop System for Industrial Tool Wear Analysis [68.8204255655161]
We show that uncertainty measures based on Monte-Carlo dropout in the context of a human-in-the-loop system increase the system's transparency and performance.
A simulation study demonstrates that the uncertainty-based human-in-the-loop system increases performance for different levels of human involvement.
arXiv Detail & Related papers (2020-07-14T15:47:37Z)
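Monte-Carlo dropout itself is standard and easy to sketch: keep dropout active at inference, average several stochastic passes, and refer high-variance cases to a human. The model, pass count, and threshold below are assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(64, 1))
model.train()  # keep dropout stochastic at inference time

x = torch.randn(8, 10)
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(30)])  # 30 MC passes
mean, std = samples.mean(0), samples.std(0)

needs_human = std.squeeze(-1) > 0.5  # assumed uncertainty threshold
print(f"{int(needs_human.sum())} of {len(x)} samples referred to the expert")
```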