Nonverbal Interaction Detection
- URL: http://arxiv.org/abs/2407.08133v2
- Date: Sun, 14 Jul 2024 13:33:57 GMT
- Title: Nonverbal Interaction Detection
- Authors: Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang
- Abstract summary: This work addresses a new challenge of understanding human nonverbal interaction in social contexts.
First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated with bounding boxes for humans and their corresponding social groups, along with 22 atomic-level nonverbal behaviors under five broad interaction types.
Second, we establish a new task, NVI-DET, for nonverbal interaction detection, which is formalized as identifying triplets of the form <individual, group, interaction> from images.
Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs.
- Score: 83.40522919429337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the linguistic counterparts, and existing solutions typically examine nonverbal cues in isolation. Our study marks the first systematic effort to enhance the interpretation of multifaceted nonverbal signals. First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and corresponding social groups, along with 22 atomic-level nonverbal behaviors under five broad interaction types. Second, we establish a new task NVI-DET for nonverbal interaction detection, which is formalized as identifying triplets in the form <individual, group, interaction> from images. Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs. Central to the model is a dual multi-scale hypergraph that adeptly addresses individual-to-individual and group-to-group correlations across varying scales, facilitating interactional feature learning and eventually improving interaction prediction. Extensive experiments on NVI show that NVI-DEHR improves various baselines significantly in NVI-DET. It also exhibits leading performance on HOI-DET, confirming its versatility in supporting related tasks and strong generalization ability. We hope that our study will offer the community new avenues to explore nonverbal signals in more depth.
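For concreteness, below is a minimal sketch of the <individual, group, interaction> triplet structure and of one hypergraph propagation step in the spirit of NVI-DEHR's high-order modeling. The field names, the incidence matrix H, and the mean-style normalization are illustrative assumptions (a standard hypergraph convolution), not the paper's actual implementation.

```python
# A minimal sketch of the NVI-DET triplet structure and one hypergraph
# propagation step. Field names, the incidence matrix H, and the mean
# normalization are illustrative assumptions, not NVI-DEHR's actual code.
from typing import NamedTuple, Tuple
import numpy as np

class NVITriplet(NamedTuple):
    individual_box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    group_box: Tuple[float, float, float, float]       # enclosing social group
    interaction: str                                    # one of the 22 atomic behaviors

def hypergraph_step(X: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """X: (n_nodes, d) node features; H: (n_nodes, n_edges) incidence matrix
    with H[v, e] = 1 iff node v belongs to hyperedge e; W: (d, d_out)."""
    De = np.maximum(H.sum(axis=0, keepdims=True), 1)  # hyperedge degrees
    Dv = np.maximum(H.sum(axis=1, keepdims=True), 1)  # node degrees
    edge_feats = (H.T @ X) / De.T        # mean of member-node features
    node_feats = (H @ edge_feats) / Dv   # mean over incident hyperedges
    return node_feats @ W                # learnable projection

# Toy example: 4 people, 2 hyperedges (one per putative social group);
# person 3 belongs to both groups, which a plain pairwise graph cannot
# express with a single relation.
X, W = np.random.randn(4, 8), np.random.randn(8, 8)
H = np.array([[1, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
print(hypergraph_step(X, H, W).shape)  # (4, 8)
t = NVITriplet((10.0, 20.0, 60.0, 180.0), (0.0, 0.0, 220.0, 200.0), "hug")  # hypothetical label
```

A hyperedge can join any number of nodes at once, which is what lets a single edge represent a whole social group rather than only pairwise links.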
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models on both objective metrics and visual quality.
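As a rough illustration of what incorporating both cue types can look like, here is a hedged sketch of late fusion between a visual embedding and a geometric embedding; the two-branch layout, dimensions, and linear head are assumptions, not the paper's architecture.

```python
# A hedged sketch of late fusion between visual and geometric cues.
# The two-branch layout, dimensions, and linear+ReLU head are
# assumptions for illustration, not the paper's architecture.
import numpy as np

def fuse_cues(visual: np.ndarray, geometric: np.ndarray,
              w: np.ndarray, b: np.ndarray) -> np.ndarray:
    joint = np.concatenate([visual, geometric])  # simple late fusion
    return np.maximum(w @ joint + b, 0.0)        # linear head + ReLU

v = np.random.randn(128)  # appearance features from an image encoder
g = np.random.randn(32)   # geometric cues, e.g. depth or surface normals
print(fuse_cues(v, g, np.random.randn(64, 160), np.zeros(64)).shape)  # (64,)
```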
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript".
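The "multimodal transcript" idea can be illustrated with a small sketch that serializes nonverbal cues as inline tags next to the spoken words; the tag format and cue vocabulary below are assumptions for illustration, not the paper's actual scheme.

```python
# A sketch of fusing behavior modalities into a single text
# "multimodal transcript" an LLM can consume. The tag format and the
# cue vocabulary are illustrative assumptions, not the paper's scheme.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    text: str
    gaze: str        # e.g. "at partner", "averted"
    expression: str  # e.g. "smile", "neutral"

def to_multimodal_transcript(turns: list) -> str:
    # Interleave nonverbal cues with the words as inline tags.
    return "\n".join(
        f"{t.speaker} [gaze: {t.gaze}] [face: {t.expression}]: {t.text}"
        for t in turns
    )

turns = [
    Turn("A", "So how was the trip?", "at partner", "smile"),
    Turn("B", "Oh, it was fine, I guess.", "averted", "neutral"),
]
print(to_multimodal_transcript(turns))
```

The resulting string can then be placed in a prompt that asks the LLM for an engagement rating.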
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
- Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations [20.848802791989307]
We introduce three new challenges to model the fine-grained dynamics between multiple people: speaking target identification, pronoun coreference resolution, and mentioned player prediction.
We propose a novel multimodal baseline that leverages densely aligned language-visual representations by synchronizing visual features with their corresponding utterances.
Experiments demonstrate the effectiveness of the proposed approach with densely aligned multimodal representations in modeling fine-grained social interactions.
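A minimal sketch of one plausible reading of "densely aligned": pooling the visual frame features that fall within each utterance's time span, so every utterance gets a synchronized visual vector. Timestamps, shapes, and mean pooling are assumptions here, not the paper's method.

```python
# A sketch of densely aligning visual features with utterances: each
# utterance is paired with the mean of the frame features inside its
# time span. Shapes, timestamps, and mean pooling are assumptions.
import numpy as np

def align_visual_to_utterances(frame_feats, frame_times, utterances):
    """frame_feats: (T, d); frame_times: (T,) seconds;
    utterances: list of (start_sec, end_sec, text) tuples."""
    aligned = []
    for start, end, text in utterances:
        mask = (frame_times >= start) & (frame_times < end)
        visual = (frame_feats[mask].mean(axis=0) if mask.any()
                  else np.zeros(frame_feats.shape[1]))
        aligned.append((text, visual))  # one visual vector per utterance
    return aligned

feats = np.random.randn(100, 16)     # 100 frames of 16-d visual features
times = np.linspace(0.0, 10.0, 100)  # frame timestamps over 10 seconds
pairs = align_visual_to_utterances(
    feats, times, [(0.0, 2.5, "hello"), (2.5, 6.0, "your turn")])
print(pairs[0][0], pairs[0][1].shape)  # hello (16,)
```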
arXiv Detail & Related papers (2024-03-04T14:46:58Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
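As a toy illustration of that disentangled composition, the sketch below builds an interaction representation from separate human, object, and spatial-layout factors, each kept as an inspectable slice; the layout encoding and concatenation scheme are assumptions, not the paper's design.

```python
# A toy illustration of composing an interaction representation from
# disentangled human, object, and spatial-layout factors, as two-stage
# HOI pipelines do. The layout encoding and concatenation scheme are
# assumptions, not the paper's design.
import numpy as np

def compose_interaction(human_feat, object_feat, human_box, object_box):
    # Spatial layout encoded as the offset between the two box centers.
    hx, hy = (human_box[0] + human_box[2]) / 2, (human_box[1] + human_box[3]) / 2
    ox, oy = (object_box[0] + object_box[2]) / 2, (object_box[1] + object_box[3]) / 2
    spatial = np.array([ox - hx, oy - hy])
    # Each factor remains a separate, inspectable slice of the result.
    return np.concatenate([human_feat, object_feat, spatial])

h, o = np.random.randn(8), np.random.randn(8)
feat = compose_interaction(h, o, [0.1, 0.1, 0.4, 0.9], [0.5, 0.4, 0.8, 0.8])
print(feat.shape)  # (18,): 8 human dims + 8 object dims + 2 spatial dims
```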
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most often used nonverbal cue is speaking activity, the most common computational method is support vector machines, and the most studied setting is meetings of 3-4 persons sensed with microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Learning Graph Representation of Person-specific Cognitive Processes from Audio-visual Behaviours for Automatic Personality Recognition [17.428626029689653]
We propose to represent each target subject's person-specific cognition in the form of a person-specific CNN architecture.
Each person-specific CNN architecture is discovered via Neural Architecture Search (NAS) together with a novel adaptive loss function.
Experimental results show that the produced graph representations are well associated with target subjects' personality traits.
arXiv Detail & Related papers (2021-10-26T11:04:23Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework, namely in-GraphNet, that assembles in-Graph models for detecting HOIs.
Our framework is end-to-end trainable and free from costly annotations like human pose.
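A hedged sketch of graph-based interactive reasoning: one round of message passing from candidate object nodes to a human node, followed by a per-object interaction score. The dot-product attention and scoring head are assumptions, not the actual in-Graph formulation.

```python
# A hedged sketch of graph-based interactive reasoning for HOI: one
# round of message passing from object nodes to a human node, then a
# per-object interaction score. The dot-product attention and scoring
# head are assumptions, not the actual in-Graph formulation.
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def interactive_reasoning(human: np.ndarray, objects: np.ndarray) -> np.ndarray:
    attn = softmax(objects @ human)   # edge weights from feature similarity
    updated = human + attn @ objects  # human node aggregates object messages
    return objects @ updated          # one interaction logit per object

h = np.random.randn(16)                # human node features
objs = np.random.randn(5, 16)          # 5 candidate object nodes
print(interactive_reasoning(h, objs))  # shape (5,)
```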
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.