AMII: Adaptive Multimodal Inter-personal and Intra-personal Model for
Adapted Behavior Synthesis
- URL: http://arxiv.org/abs/2305.11310v1
- Date: Thu, 18 May 2023 21:22:07 GMT
- Title: AMII: Adaptive Multimodal Inter-personal and Intra-personal Model for
Adapted Behavior Synthesis
- Authors: Jieyeon Woo, Mireille Fares, Catherine Pelachaud, Catherine Achard
- Abstract summary: Socially Interactive Agents (SIAs) are physical or virtual embodied agents that display multimodal behavior similar to that of humans.
We propose AMII, a novel approach to synthesize adaptive facial gestures for SIAs that interact with Users while acting as a speaker or as a listener.
- Score: 6.021787236982659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Socially Interactive Agents (SIAs) are physical or virtual embodied agents
that display multimodal behavior similar to that of humans. Modeling SIAs'
multimodal behavior, such as speech and facial gestures, has always been a
challenging task, given that a SIA can take the role of a speaker or a
listener. In both roles, a SIA must emit behavior appropriately adapted to its
own speech, its previous behaviors (intra-personal), and the User's behaviors
(inter-personal). We propose AMII, a novel approach to synthesize adaptive
facial gestures for SIAs that interact with Users while acting interchangeably
as a speaker or as a listener. AMII is characterized by a modality memory
encoding schema, where a modality corresponds to either speech or facial
gestures, and makes use of attention mechanisms to capture intra-personal and
inter-personal relationships. We validate our approach by conducting objective
evaluations and comparing it with state-of-the-art approaches.
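The abstract gives no code, but the described design, per-modality memories queried by intra-personal and inter-personal attention, can be pictured with a minimal sketch. Everything below (PyTorch, feature dimensions, one attention layer per relationship, concatenation fusion) is our own illustrative assumption, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): per-modality memories are
# attended intra-personally (the agent's own history) and inter-personally
# (the User's history) to drive the agent's next facial-gesture features.
# All shapes and layer choices are assumptions.
import torch
import torch.nn as nn

class AdaptiveGestureSketch(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.intra_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, dim)  # fuse both views, project to gesture space

    def forward(self, agent_memory, user_memory, query):
        # agent_memory, user_memory: (batch, time, dim) encoded speech and
        # facial-gesture histories of the agent and of the User.
        # query: (batch, 1, dim) encoding of the agent's current state.
        intra, _ = self.intra_attn(query, agent_memory, agent_memory)
        inter, _ = self.inter_attn(query, user_memory, user_memory)
        return self.head(torch.cat([intra, inter], dim=-1))

# Dummy usage:
model = AdaptiveGestureSketch()
gesture = model(torch.randn(2, 50, 128), torch.randn(2, 50, 128), torch.randn(2, 1, 128))
```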
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
- Nonverbal Interaction Detection [83.40522919429337]
This work addresses a new challenge of understanding human nonverbal interaction in social contexts.
We contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and corresponding social groups.
Second, we establish a new task NVI-DET for nonverbal interaction detection, which is formalized as identifying triplets in the form <individual, group, interaction> from images.
Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs.
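For concreteness, a detected triplet can be represented with a small structure like the one below; the field names and box convention are our own illustration, not the NVI dataset's actual schema.

```python
# Illustrative only: one way to represent an NVI-DET output triplet.
# Field names and the (x1, y1, x2, y2) box convention are assumptions.
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class NonverbalTriplet:
    individual: Box    # bounding box of the person
    group: Box         # bounding box of their social group
    interaction: str   # nonverbal interaction label, e.g. "pointing"
    score: float = 1.0 # detection confidence

pred = NonverbalTriplet((10, 20, 80, 200), (0, 0, 300, 220), "pointing", 0.87)
```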
arXiv Detail & Related papers (2024-07-11T02:14:06Z)
- Dyadic Interaction Modeling for Social Behavior Generation [6.626277726145613]
We present an effective framework for creating 3D facial motions in dyadic interactions.
The heart of our framework is Dyadic Interaction Modeling (DIM), a pre-training approach.
Experiments demonstrate the superiority of our framework in generating listener motions.
arXiv Detail & Related papers (2024-03-14T03:21:33Z)
- AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations [39.79734528362605]
The Multimodal Attention Network captures cross-modal interactions at various levels of spatial abstraction.
The AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level.
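As a rough picture of what condensing into these two descriptors might look like, the sketch below mean-pools a spatio-temporal feature tensor along each axis; the pooling choice and shapes are our assumptions, not AMuSE's actual aggregation.

```python
# Rough illustration of condensing spatio-temporal features into the two
# descriptor levels the summary mentions. Mean pooling is an assumption.
import torch

def descriptors(features: torch.Tensor):
    # features: (speakers, time, dim) multimodal features for one conversation.
    speaker_level = features.mean(dim=1)    # (speakers, dim): one per speaker
    utterance_level = features.mean(dim=0)  # (time, dim): one per utterance step
    return speaker_level, utterance_level

spk, utt = descriptors(torch.randn(4, 12, 256))
```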
arXiv Detail & Related papers (2024-01-26T19:17:05Z)
- Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences [53.353022588751585]
We present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences.
We introduce three distinct methods to infer human preferences by leveraging different types of interactions.
We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR.
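Personalizing multi-objective rewards generally reduces to weighting per-objective reward terms by an inferred preference vector. The sketch below shows only that combination step, with invented objective names; it is not the framework's actual API.

```python
# Sketch of the reward-combination step behind multi-objective
# personalization: per-objective rewards weighted by an inferred human
# preference vector. Objective names and values are invented.
def combined_reward(rewards: dict[str, float], preferences: dict[str, float]) -> float:
    # preferences are assumed non-negative and summing to 1
    return sum(preferences[k] * rewards[k] for k in rewards)

r = combined_reward(
    rewards={"time_efficiency": 0.8, "path_safety": 0.4, "exploration": 0.1},
    preferences={"time_efficiency": 0.5, "path_safety": 0.4, "exploration": 0.1},
)
```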
arXiv Detail & Related papers (2023-12-14T21:00:56Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Emotion-Oriented Behavior Model Using Deep Learning [0.9176056742068812]
The accuracy of emotion-based behavior predictions is statistically validated using a two-tailed Pearson correlation.
This study is a stepping stone toward multi-faceted artificial-agent interaction based on emotion-oriented behaviors.
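The validation statistic named above is standard; as a minimal sketch, it can be computed with SciPy as follows (the arrays are dummy stand-ins for the study's predicted and observed behavior scores).

```python
# Two-tailed Pearson correlation, as used for the statistical validation;
# the arrays here are dummy stand-ins, not the study's data.
from scipy.stats import pearsonr

predicted = [0.2, 0.5, 0.7, 0.9, 0.3]
observed = [0.25, 0.45, 0.65, 0.85, 0.35]
r, p = pearsonr(predicted, observed)  # p-value is two-sided by default
print(f"r = {r:.3f}, two-tailed p = {p:.4f}")
```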
arXiv Detail & Related papers (2023-10-28T17:27:59Z)
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In Human-Object Interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale, consistent plan for the whole activity and (2) the small-scale child actions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- A Probabilistic Model Of Interaction Dynamics for Dyadic Face-to-Face Settings [1.9544213396776275]
We develop a probabilistic model to capture the interaction dynamics between pairs of participants in a face-to-face setting.
This interaction encoding is then used to influence the generation when predicting one agent's future dynamics.
We show that our model successfully distinguishes between the modes based on their interaction dynamics.
arXiv Detail & Related papers (2022-07-10T23:31:27Z)
- Learning Graph Representation of Person-specific Cognitive Processes from Audio-visual Behaviours for Automatic Personality Recognition [17.428626029689653]
We propose to represent the target subject's person-specific cognition in the form of a person-specific CNN architecture.
Each person-specific CNN architecture is found via Neural Architecture Search (NAS) guided by a novel adaptive loss function.
Experimental results show that the produced graph representations are well associated with target subjects' personality traits.
arXiv Detail & Related papers (2021-10-26T11:04:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.