Multipar-T: Multiparty-Transformer for Capturing Contingent Behaviors in
Group Conversations
- URL: http://arxiv.org/abs/2304.12204v1
- Date: Wed, 19 Apr 2023 20:23:11 GMT
- Title: Multipar-T: Multiparty-Transformer for Capturing Contingent Behaviors in
Group Conversations
- Authors: Dong Won Lee, Yubin Kim, Rosalind Picard, Cynthia Breazeal, Hae Won
Park
- Abstract summary: We propose the Multiparty-Transformer (Multipar-T), a transformer model for multiparty behavior modeling.
The core component of our proposed approach is the Crossperson Attention, which is specifically designed to detect contingent behavior between pairs of people.
We verify the effectiveness of Multipar-T on a publicly available video-based group engagement detection benchmark.
- Score: 25.305521223925428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As we move closer to real-world AI systems, AI agents must be able to deal
with multiparty (group) conversations. Recognizing and interpreting multiparty
behaviors is challenging, as the system must recognize individual behavioral
cues, deal with the complexity of multiple streams of data from multiple
people, and recognize the subtle contingent social exchanges that take place
amongst group members. To tackle this challenge, we propose the
Multiparty-Transformer (Multipar-T), a transformer model for multiparty
behavior modeling. The core component of our proposed approach is the
Crossperson Attention, which is specifically designed to detect contingent
behavior between pairs of people. We verify the effectiveness of Multipar-T on
a publicly available video-based group engagement detection benchmark, where it
outperforms state-of-the-art approaches in average F-1 scores by 5.2% and
individual class F-1 scores by up to 10.0%. Through qualitative analysis, we
show that our Crossperson Attention module is able to discover contingent
behavior.
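The abstract's core mechanism is easy to sketch: queries come from one person's behavior sequence while keys and values come from a partner's, so high attention weights mark moments where one person's behavior appears contingent on the other's. Below is a minimal illustrative sketch of such cross-person attention; the module name, tensor shapes, and the use of a stock multi-head attention layer are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class CrosspersonAttention(nn.Module):
    """Illustrative sketch: person i's behavior sequence queries person j's,
    so attention weights can highlight contingent (stimulus/response)
    moments. Names and shapes are assumptions, not the authors' code."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, person_i: torch.Tensor, person_j: torch.Tensor) -> torch.Tensor:
        # person_i, person_j: (batch, time, dim) per-person behavior features.
        # Queries come from person i; keys/values from person j, so the output
        # at time t reflects which of j's behaviors i may be responding to.
        out, _ = self.attn(query=person_i, key=person_j, value=person_j)
        return out

# Toy usage: one ordered pair of people from a group.
feats = torch.randn(2, 5, 32, 64)            # (batch, people, time, dim)
cpa = CrosspersonAttention(dim=64)
contingent = cpa(feats[:, 0], feats[:, 1])   # person 0 attends to person 1
print(contingent.shape)                      # torch.Size([2, 32, 64])
```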
Related papers
- DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation [42.87704953679693]
Engagement estimation plays a crucial role in understanding human social behaviors.
We propose a Dialogue-Aware Transformer framework that relies solely on audio-visual input and is language-independent.
Our approach achieves a CCC score of 0.76 on the NoXi Base test set and an average CCC of 0.64 across the NoXi Base, NoXi-Add, and MPIIGI test sets.
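For reference, the CCC (concordance correlation coefficient) quoted above rewards both correlation and agreement in mean and scale. A minimal sketch of the standard formula (not the DAT evaluation code):

```python
import numpy as np

def concordance_ccc(pred, gold) -> float:
    """Concordance correlation coefficient (CCC):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    pred, gold = np.asarray(pred, dtype=float), np.asarray(gold, dtype=float)
    mx, my = pred.mean(), gold.mean()
    cov = ((pred - mx) * (gold - my)).mean()     # population covariance
    return 2 * cov / (pred.var() + gold.var() + (mx - my) ** 2)

# Perfect agreement yields CCC = 1.0:
assert abs(concordance_ccc([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]) - 1.0) < 1e-12
```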
arXiv Detail & Related papers (2024-10-11T02:43:45Z)
- MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models [70.92847554971065]
We introduce MT-Eval, a comprehensive benchmark designed to evaluate multi-turn conversational abilities.
By analyzing human-LLM conversations, we categorize interaction patterns into four types: recollection, expansion, refinement, and follow-up.
Our evaluation of 11 well-known LLMs shows that while closed-source models generally surpass open-source ones, certain open-source models exceed GPT-3.5-Turbo in specific tasks.
arXiv Detail & Related papers (2024-01-30T04:50:28Z)
- AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations [39.79734528362605]
Multimodal Attention Network captures cross-modal interactions at various levels of spatial abstraction.
The AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level.
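One common way to condense a variable-length feature sequence into a single dense descriptor, in the spirit of the speaker- and utterance-level descriptors above, is attention pooling. A generic sketch (the layer sizes and pooling choice are assumptions, not the AMuSE implementation):

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Condense a (batch, time, dim) feature sequence into one dense
    descriptor with learned attention weights. A generic stand-in for the
    condensation step described above, not the AMuSE code."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.score(feats), dim=1)  # (batch, time, 1)
        return (weights * feats).sum(dim=1)                # (batch, dim)

pool = AttentionPool(dim=128)
descriptor = pool(torch.randn(8, 40, 128))  # e.g. one descriptor per speaker
print(descriptor.shape)                     # torch.Size([8, 128])
```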
arXiv Detail & Related papers (2024-01-26T19:17:05Z)
- Multilevel Transformer For Multimodal Emotion Recognition [6.0149102420697025]
We introduce a novel multi-granularity framework, which combines fine-grained representation with pre-trained utterance-level representation.
Inspired by Transformer TTS, we propose a multilevel transformer model to perform fine-grained multimodal emotion recognition.
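A simple way to combine fine-grained frame-level features with a single pre-trained utterance-level embedding is to broadcast the utterance vector over time, concatenate, and encode. The sketch below illustrates this multi-granularity fusion under assumed names and sizes; it is not the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiGranularityFusion(nn.Module):
    """Sketch: tile a pre-trained utterance-level embedding over time,
    concatenate with frame-level features, and encode. Layer sizes and
    fusion-by-concatenation are assumptions, not the paper's design."""

    def __init__(self, frame_dim: int, utt_dim: int, model_dim: int):
        super().__init__()
        self.proj = nn.Linear(frame_dim + utt_dim, model_dim)
        layer = nn.TransformerEncoderLayer(model_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, frames: torch.Tensor, utt: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, frame_dim); utt: (batch, utt_dim)
        utt_tiled = utt.unsqueeze(1).expand(-1, frames.size(1), -1)
        fused = self.proj(torch.cat([frames, utt_tiled], dim=-1))
        return self.encoder(fused)  # (batch, time, model_dim)

fuser = MultiGranularityFusion(frame_dim=40, utt_dim=768, model_dim=128)
print(fuser(torch.randn(2, 60, 40), torch.randn(2, 768)).shape)  # (2, 60, 128)
```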
arXiv Detail & Related papers (2022-10-26T10:31:24Z)
- Rethinking Trajectory Prediction via "Team Game" [118.59480535826094]
We present a novel formulation for multi-agent trajectory prediction, which explicitly introduces the concept of interactive group consensus.
On two multi-agent settings, i.e. team sports and pedestrians, the proposed framework consistently achieves superior performance compared to existing methods.
arXiv Detail & Related papers (2022-10-17T07:16:44Z)
- The Minority Matters: A Diversity-Promoting Collaborative Metric Learning Algorithm [154.47590401735323]
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems.
This paper focuses on a challenging scenario where a user has multiple categories of interests.
We propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML).
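The usual way to serve multiple interest categories in metric learning is to give each user several embedding vectors and score an item by its distance to the closest one. A generic sketch of that multi-vector idea (not the DPCML reference code):

```python
import torch

def multi_interest_distance(user_vecs: torch.Tensor, item_vec: torch.Tensor) -> torch.Tensor:
    """Each user holds C embedding vectors (one per interest category); an
    item is scored by the distance to the *closest* vector, so minority
    interests keep a dedicated representation. A generic sketch of the
    multi-vector idea, not the DPCML reference code."""
    # user_vecs: (batch, C, dim); item_vec: (batch, dim)
    dists = torch.norm(user_vecs - item_vec.unsqueeze(1), dim=-1)  # (batch, C)
    return dists.min(dim=1).values  # smaller distance = better match

users = torch.randn(4, 3, 16)  # 4 users, 3 interest vectors each
items = torch.randn(4, 16)
print(multi_interest_distance(users, items).shape)  # torch.Size([4])
```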
arXiv Detail & Related papers (2022-09-30T08:02:18Z)
- Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition [103.62363658053557]
We propose a Dual-path Actor Interaction (Dual-AI) framework, which flexibly arranges spatial and temporal transformers.
We also introduce a novel Multi-scale Actor Contrastive Loss (MAC-Loss) between two interactive paths of Dual-AI.
Our Dual-AI can boost group activity recognition by fusing distinct discriminative features of different actors.
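A contrastive term between two feature paths can be sketched with a standard InfoNCE loss, pulling together the same actor's features from both paths and pushing apart different actors'. This is an illustration in the spirit of MAC-Loss, with the temperature and shapes as assumptions:

```python
import torch
import torch.nn.functional as F

def path_contrastive_loss(path_a: torch.Tensor, path_b: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrast between two interaction paths: the same actor's
    features from both paths form a positive pair, other actors are negatives.
    An illustration in the spirit of MAC-Loss, not the Dual-AI implementation."""
    a = F.normalize(path_a, dim=-1)  # (actors, dim) from path 1
    b = F.normalize(path_b, dim=-1)  # (actors, dim) from path 2
    logits = a @ b.t() / temperature             # (actors, actors) similarities
    targets = torch.arange(a.size(0))            # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = path_contrastive_loss(torch.randn(12, 256), torch.randn(12, 256))
```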
arXiv Detail & Related papers (2022-04-05T12:17:40Z)
- Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition [63.07844685982738]
This paper presents a new model named as Gated Bidirectional Alignment Network (GBAN), which consists of an attention-based bidirectional alignment network over LSTM hidden states.
We empirically show that the attention-aligned representations outperform the last-hidden-states of LSTM significantly.
The proposed GBAN model outperforms existing state-of-the-art multimodal approaches on the IEMOCAP dataset.
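Attention-based alignment followed by a learned gate can be sketched as below: one modality's hidden states are aligned to the other's time steps, then a sigmoid gate mixes the original and aligned views. Module names, sizes, and the single alignment direction shown are assumptions, not GBAN's exact design:

```python
import torch
import torch.nn as nn

class GatedAlignmentFusion(nn.Module):
    """Sketch: align one modality's LSTM states to the other's time steps via
    attention, then mix the original and aligned views with a learned gate.
    Names, sizes, and the single alignment direction shown are assumptions."""

    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_h: torch.Tensor, audio_h: torch.Tensor) -> torch.Tensor:
        # text_h: (batch, T_text, dim); audio_h: (batch, T_audio, dim)
        aligned, _ = self.attn(text_h, audio_h, audio_h)  # audio aligned to text
        g = torch.sigmoid(self.gate(torch.cat([text_h, aligned], dim=-1)))
        return g * text_h + (1 - g) * aligned  # gated fusion of the two views

fuse = GatedAlignmentFusion(dim=128)
print(fuse(torch.randn(2, 20, 128), torch.randn(2, 50, 128)).shape)  # (2, 20, 128)
```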
arXiv Detail & Related papers (2022-01-17T09:46:59Z)
- Abstractive Sentence Summarization with Guidance of Selective Multimodal Reference [3.505062507621494]
We propose a Multimodal Hierarchical Selective Transformer (mhsf) model that considers reciprocal relationships among modalities.
We evaluate the generality of the proposed mhsf model under both pre-trained + fine-tuning and fresh-training strategies.
arXiv Detail & Related papers (2021-08-11T09:59:34Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture that fits varied tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named the Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the decision process of multi-agent tasks more explainable.
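Tokenizing each observed entity and reading action logits off the matching output token is one way such a transformer policy can handle varying numbers of agents and actions. A minimal sketch of that idea (the head design here is an assumption, not the UPDeT code):

```python
import torch
import torch.nn as nn

class EntityTransformerPolicy(nn.Module):
    """Sketch: each observed entity becomes one token, and an action logit is
    read off each entity's output token, so the policy is decoupled from any
    fixed observation/action size. Illustrative only, not the UPDeT code."""

    def __init__(self, obs_dim: int, model_dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, model_dim)
        layer = nn.TransformerEncoderLayer(model_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(model_dim, 1)  # one logit per entity token

    def forward(self, entity_obs: torch.Tensor) -> torch.Tensor:
        # entity_obs: (batch, num_entities, obs_dim); entity count may vary.
        h = self.encoder(self.embed(entity_obs))
        return self.action_head(h).squeeze(-1)  # (batch, num_entities)

policy = EntityTransformerPolicy(obs_dim=10)
print(policy(torch.randn(2, 7, 10)).shape)  # same module handles 7 entities
print(policy(torch.randn(2, 9, 10)).shape)  # ...or 9, without retraining
```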
arXiv Detail & Related papers (2021-01-20T07:24:24Z)