HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with
Cross-person Memory Transformer
- URL: http://arxiv.org/abs/2305.12369v1
- Date: Sun, 21 May 2023 06:43:35 GMT
- Title: HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with
Cross-person Memory Transformer
- Authors: Yubin Kim, Dong Won Lee, Paul Pu Liang, Sharifa Alghowinem, Cynthia
Breazeal, Hae Won Park
- Abstract summary: The Cross-person Memory Transformer (CPM-T) framework explicitly models affective dynamics.
The framework maintains memory modules to store and update the contexts within the conversation window.
We evaluate the effectiveness and generalizability of our approach on three publicly available datasets for joint engagement, rapport, and human beliefs prediction tasks.
- Score: 38.92436852096451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurately modeling affect dynamics, which refers to the changes and
fluctuations in emotions and affective displays during human conversations, is
crucial for understanding human interactions. By analyzing affect dynamics, we
can gain insights into how people communicate, respond to different situations,
and form relationships. However, modeling affect dynamics is challenging due to
contextual factors, such as the complex and nuanced nature of interpersonal
relationships, the situation, and other factors that influence affective
displays. To address this challenge, we propose a Cross-person Memory
Transformer (CPM-T) framework that explicitly models affective dynamics
(intrapersonal and interpersonal influences) by identifying verbal and
non-verbal cues, and that uses a large language model to leverage pre-trained
knowledge and perform verbal reasoning. The CPM-T framework maintains memory
modules to store and update the contexts within the conversation window,
enabling the model to capture dependencies between earlier and later parts of a
conversation. Additionally, our framework employs cross-modal attention to
effectively align information from multi-modalities and leverage cross-person
attention to align behaviors in multi-party interactions. We evaluate the
effectiveness and generalizability of our approach on three publicly available
datasets for joint engagement, rapport, and human beliefs prediction tasks.
Remarkably, the CPM-T framework outperforms baseline models in average
F1-score by up to 7.3%, 9.3%, and 2.0% on the three tasks, respectively.
Finally, we demonstrate
the importance of each component in the framework via ablation studies with
respect to multimodal temporal behavior.
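As a rough, non-authoritative illustration of the three mechanisms the abstract names (cross-modal attention, cross-person attention, and memory modules updated per conversation window), the following PyTorch sketch wires them together for a dyad. The module name, GRU-based memory write, fusion order, and all dimensions are assumptions made for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn


class CrossPersonMemoryBlock(nn.Module):
    """One fusion step for a dyad: cross-modal alignment within a person,
    cross-person alignment between partners, and a rolling window memory.
    Hypothetical sketch; not the CPM-T reference architecture."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, mem_slots: int = 8):
        super().__init__()
        self.cross_modal = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_person = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_write = nn.GRUCell(d_model, d_model)  # gated memory update
        self.mem_slots = mem_slots

    def fuse_modalities(self, verbal, nonverbal):
        # Cross-modal attention: align non-verbal cues to the verbal stream.
        fused, _ = self.cross_modal(verbal, nonverbal, nonverbal)
        return fused + verbal  # residual connection

    def align_persons(self, person_a, person_b, memory):
        # Cross-person attention: condition A's stream on B's behavior plus
        # the stored context from earlier conversation windows.
        context = torch.cat([person_b, memory], dim=1)
        aligned, _ = self.cross_person(person_a, context, context)
        return aligned + person_a

    def update_memory(self, memory, window_repr):
        # Write a summary of the current window into every memory slot.
        b, s, d = memory.shape
        summary = window_repr.mean(dim=1, keepdim=True).expand(b, s, d)
        new_mem = self.mem_write(summary.reshape(b * s, d), memory.reshape(b * s, d))
        return new_mem.reshape(b, s, d)


# Toy usage over one conversation window (batch of 2 dyads, d_model = 256).
blk = CrossPersonMemoryBlock()
v_a, nv_a = torch.randn(2, 20, 256), torch.randn(2, 30, 256)  # person A streams
v_b, nv_b = torch.randn(2, 20, 256), torch.randn(2, 30, 256)  # person B streams
memory = torch.zeros(2, 8, 256)                               # per-dyad memory slots
fused_a = blk.fuse_modalities(v_a, nv_a)
out_a = blk.align_persons(fused_a, blk.fuse_modalities(v_b, nv_b), memory)
memory = blk.update_memory(memory, out_a)                     # carried to next window
```

Here the verbal stream serves as the attention query so the fused sequence stays aligned with the transcript; the abstract does not specify CPM-T's actual query/key assignment or memory write rule.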
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
- Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations [13.749495524988774]
We systematically investigate the impact of real-life variations on state-of-the-art dialogue summarization models.
We introduce two types of perturbations: utterance-level perturbations that modify individual utterances with errors and language variations, and dialogue-level perturbations that add non-informative exchanges.
We find that both fine-tuned and instruction-tuned models are affected by input variations, with the latter being more susceptible.
arXiv Detail & Related papers (2023-11-15T05:11:43Z)
- Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
- Joint-Relation Transformer for Multi-Person Motion Prediction [79.08243886832601]
We propose the Joint-Relation Transformer to enhance interaction modeling.
Our method achieves a 13.4% improvement in 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvements in 3s MPJPE.
arXiv Detail & Related papers (2023-08-09T09:02:47Z)
- A Probabilistic Model Of Interaction Dynamics for Dyadic Face-to-Face Settings [1.9544213396776275]
We develop a probabilistic model to capture the interaction dynamics between pairs of participants in a face-to-face setting.
This interaction encoding is then used to condition generation when predicting one agent's future dynamics.
We show that our model successfully distinguishes between interaction modes based on their dynamics.
arXiv Detail & Related papers (2022-07-10T23:31:27Z)
- SMEMO: Social Memory for Trajectory Forecasting [34.542209630734234]
We present a neural network built around an end-to-end trainable working memory that acts as external storage.
We show that our method is capable of learning explainable cause-effect relationships between motions of different agents, obtaining state-of-the-art results on trajectory forecasting datasets.
arXiv Detail & Related papers (2022-03-23T14:40:20Z)
- Modeling Intention, Emotion and External World in Dialogue Systems [14.724751780218297]
We propose a RelAtion Interaction Network (RAIN) to jointly model mutual relationships and explicitly integrate historical intention information.
Experiments on the dataset show that our model can take full advantage of the intention, emotion, and action between individuals.
arXiv Detail & Related papers (2022-02-14T04:10:34Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- Interactions in information spread: quantification and interpretation using stochastic block models [3.5450828190071655]
In social networks, users' behavior results from the people they interact with, news in their feed, or trending topics.
Here, we propose a new model, the Interactive Mixed Membership Block Model (IMMSBM), which investigates the role of interactions between entities.
In inference tasks, accounting for interactions changes the predicted probability of an outcome by up to 150% on average relative to non-interactive models (see the sketch below).
arXiv Detail & Related papers (2020-04-09T14:22:10Z)
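To make the role of pairwise interactions concrete, here is a minimal NumPy sketch of a mixed-membership block model in which the outcome distribution depends on the latent blocks of an interacting pair. The names (theta, B_pair, B_solo), dimensions, and random initialization are illustrative assumptions and do not reproduce the IMMSBM's actual parameterization or inference.

```python
import numpy as np

rng = np.random.default_rng(0)
K, O, N = 3, 4, 10                               # latent blocks, outcomes, entities
theta = rng.dirichlet(np.ones(K), size=N)        # mixed memberships per entity
B_pair = rng.dirichlet(np.ones(O), size=(K, K))  # outcome dist. per block pair
B_solo = rng.dirichlet(np.ones(O), size=K)       # non-interactive baseline per block


def p_outcome_interactive(i: int, j: int) -> np.ndarray:
    # P(o | i, j) = sum_{k,l} theta_i[k] * theta_j[l] * B_pair[k, l, o]
    return np.einsum("k,l,klo->o", theta[i], theta[j], B_pair)


def p_outcome_solo(i: int) -> np.ndarray:
    # Baseline that ignores interaction partners entirely.
    return theta[i] @ B_solo


print(p_outcome_interactive(0, 1))
print(p_outcome_solo(0))
```

Comparing the two functions on the same entity shows how conditioning on an interaction partner reshapes the outcome distribution, which is the effect the 150% figure above quantifies.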