Explainable Human-centered Traits from Head Motion and Facial Expression
Dynamics
- URL: http://arxiv.org/abs/2302.09817v2
- Date: Thu, 23 Feb 2023 15:46:35 GMT
- Title: Explainable Human-centered Traits from Head Motion and Facial Expression
Dynamics
- Authors: Surbhi Madan, Monika Gahalawat, Tanaya Guha, Roland Goecke and
Ramanathan Subramanian
- Abstract summary: We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits.
We utilize elementary head-motion units named kinemes, atomic facial movements termed action units, and speech features to estimate these human-centered traits.
For fusing cues, we explore decision and feature-level fusion, and an additive attention-based fusion strategy.
- Score: 13.050530440884934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the efficacy of multimodal behavioral cues for explainable
prediction of personality and interview-specific traits. We utilize elementary
head-motion units named kinemes, atomic facial movements termed action units,
and speech features to estimate these human-centered traits. Empirical results
confirm that kinemes and action units enable discovery of multiple
trait-specific behaviors while also enabling explainability in support of the
predictions. For fusing cues, we explore decision and feature-level fusion, and
an additive attention-based fusion strategy which quantifies the relative
importance of the three modalities for trait prediction. Examining various
long short-term memory (LSTM) architectures for classification and regression
on the MIT Interview and First Impressions Candidate Screening (FICS) datasets,
we note that: (1) Multimodal approaches outperform unimodal counterparts; (2)
Efficient trait predictions and plausible explanations are achieved with both
unimodal and multimodal approaches, and (3) Following the thin-slice approach,
effective trait prediction is achieved even from two-second behavioral
snippets.
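As a concrete illustration of the additive attention-based fusion described in the abstract, here is a minimal sketch (not the authors' released code, written in PyTorch purely for illustration): each modality (kinemes, action units, speech) is encoded by its own LSTM, and an additive attention layer produces softmax weights that can be read as the relative importance of the three modalities. All layer sizes, feature dimensions and the use of the final LSTM hidden state are assumptions made for the example.

```python
# Minimal sketch of additive attention-based fusion over three behavioural
# modalities (kinemes, action units, speech). Layer sizes, feature dimensions
# and the use of the final LSTM hidden state are illustrative assumptions.
import torch
import torch.nn as nn


class AdditiveAttentionFusion(nn.Module):
    """Per-modality LSTM encoders fused by additive (Bahdanau-style) attention;
    the softmax weights act as relative modality-importance scores."""

    def __init__(self, kineme_dim, au_dim, speech_dim, hidden_dim=64, n_classes=2):
        super().__init__()
        self.enc_kineme = nn.LSTM(kineme_dim, hidden_dim, batch_first=True)
        self.enc_au = nn.LSTM(au_dim, hidden_dim, batch_first=True)
        self.enc_speech = nn.LSTM(speech_dim, hidden_dim, batch_first=True)
        # Additive attention: score_m = v^T tanh(W h_m) for each modality m.
        self.attn_proj = nn.Linear(hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1, bias=False)
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, kinemes, aus, speech):
        # Final hidden state of each modality encoder: (batch, hidden_dim).
        h_k = self.enc_kineme(kinemes)[1][0][-1]
        h_a = self.enc_au(aus)[1][0][-1]
        h_s = self.enc_speech(speech)[1][0][-1]
        h = torch.stack([h_k, h_a, h_s], dim=1)                   # (batch, 3, hidden)
        scores = self.attn_score(torch.tanh(self.attn_proj(h)))   # (batch, 3, 1)
        weights = torch.softmax(scores, dim=1)                    # modality importance
        fused = (weights * h).sum(dim=1)                          # (batch, hidden)
        return self.classifier(fused), weights.squeeze(-1)


# Example: a batch of short "thin-slice" snippets, 8 frames per modality
# (feature dimensions are made up for the example).
model = AdditiveAttentionFusion(kineme_dim=16, au_dim=17, speech_dim=25)
logits, modality_weights = model(torch.randn(4, 8, 16),
                                 torch.randn(4, 8, 17),
                                 torch.randn(4, 8, 25))
print(logits.shape, modality_weights.shape)  # torch.Size([4, 2]) torch.Size([4, 3])
```

The returned modality weights are the kind of quantity the abstract points to for explainability, e.g. how much head motion versus facial expression contributed to a given trait prediction.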
Related papers
- Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript".
arXiv Detail & Related papers (2024-09-13T18:28:12Z) - A Multi-Task, Multi-Modal Approach for Predicting Categorical and
Dimensional Emotions [0.0]
We propose a multi-task, multi-modal system that predicts categorical and dimensional emotions.
Results emphasise the importance of cross-regularisation between the two types of emotions.
arXiv Detail & Related papers (2023-12-31T16:48:03Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - A Dual-Perspective Approach to Evaluating Feature Attribution Methods [43.16453263420591]
We propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness.
Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features.
We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
arXiv Detail & Related papers (2023-08-17T12:41:04Z) - Qualitative Prediction of Multi-Agent Spatial Interactions [5.742409080817885]
We present and benchmark three new approaches to model and predict multi-agent interactions in dense scenes.
The proposed solutions take into account static and dynamic context to predict individual interactions.
They exploit an input- and a temporal-attention mechanism, and are tested on medium and long-term time horizons.
arXiv Detail & Related papers (2023-06-30T18:08:25Z) - Multimodal Fusion Interactions: A Study of Human and Automatic
Quantification [116.55145773123132]
We study how humans annotate two categorizations of multimodal interactions.
We propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition.
arXiv Detail & Related papers (2023-06-07T03:44:50Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories.
This trait vector then multiplicatively modulates the prediction mechanism via a fast-weights scheme in the prediction neural network.
We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improve mindreading ability (a toy sketch of this multiplicative modulation is given after this list).
arXiv Detail & Related papers (2022-04-17T11:21:18Z) - Head Matters: Explainable Human-centered Trait Prediction from Head
Motion Dynamics [15.354601615061814]
We demonstrate the utility of elementary head-motion units termed kinemes for behavioral analytics to predict personality and interview traits.
Transforming head-motion patterns into a sequence of kinemes facilitates discovery of latent temporal signatures characterizing the targeted traits.
arXiv Detail & Related papers (2021-12-15T12:17:59Z) - Interactive Fusion of Multi-level Features for Compositional Activity
Recognition [100.75045558068874]
We present a novel framework that accomplishes this goal by interactive fusion.
We implement the framework in three steps, namely, positional-to-appearance feature extraction, semantic feature interaction, and semantic-to-positional prediction.
We evaluate our approach on two action recognition datasets, Something-Something and Charades.
arXiv Detail & Related papers (2020-12-10T14:17:18Z) - Pedestrian Behavior Prediction via Multitask Learning and Categorical
Interaction Modeling [13.936894582450734]
We propose a multitask learning framework that simultaneously predicts trajectories and actions of pedestrians by relying on multimodal data.
We show that our model achieves state-of-the-art performance and improves trajectory and action prediction by up to 22% and 6% respectively.
arXiv Detail & Related papers (2020-12-06T15:57:11Z)
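The multiplicative fast-weights modulation mentioned in the "Learning Theory of Mind via Dynamic Traits Attribution" entry above can be sketched as follows. This is a hypothetical illustration under assumed module names and sizes, not that paper's actual architecture: a latent trait vector inferred from past trajectories produces per-unit multiplicative gains on the predictor's hidden layer.

```python
# Hypothetical sketch of a fast-weights style modulation: a trait vector inferred
# from past trajectories produces multiplicative gains on the predictor's hidden
# units. All names and sizes are invented for illustration.
import torch
import torch.nn as nn


class TraitModulatedPredictor(nn.Module):
    def __init__(self, obs_dim, trait_dim=8, hidden_dim=32, out_dim=4):
        super().__init__()
        self.trait_encoder = nn.GRU(obs_dim, trait_dim, batch_first=True)
        self.in_proj = nn.Linear(obs_dim, hidden_dim)
        self.fast_scale = nn.Linear(trait_dim, hidden_dim)  # gains from the trait vector
        self.out_proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, past_traj, current_obs):
        _, z = self.trait_encoder(past_traj)            # z: (1, batch, trait_dim)
        gain = torch.sigmoid(self.fast_scale(z[-1]))    # (batch, hidden_dim)
        h = torch.relu(self.in_proj(current_obs))       # (batch, hidden_dim)
        return self.out_proj(gain * h)                  # trait-modulated prediction


model = TraitModulatedPredictor(obs_dim=6)
pred = model(torch.randn(2, 10, 6), torch.randn(2, 6))  # 10-step past trajectory
print(pred.shape)  # torch.Size([2, 4])
```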
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.