Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey
- URL: http://arxiv.org/abs/2207.10574v2
- Date: Wed, 4 Oct 2023 07:52:19 GMT
- Title: Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey
- Authors: Cigdem Beyan and Alessandro Vinciarelli and Alessio Del Bue
- Abstract summary: We aim to identify the nonverbal cues and computational methodologies that yield effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are, respectively, speaking activity, support vector machines, meetings of 3-4 persons, and a combination of microphones and cameras.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Automated co-located human-human interaction analysis has been addressed by
the use of nonverbal communication as measurable evidence of social and
psychological phenomena. We survey the computing studies (since 2010) detecting
phenomena related to social traits (e.g., leadership, dominance, personality
traits), social roles/relations, and interaction dynamics (e.g., group
cohesion, engagement, rapport). Our goal is to identify the nonverbal cues
and computational methodologies that yield effective performance. This survey
differs from its counterparts by involving the widest spectrum of social
phenomena and interaction settings (free-standing conversations, meetings,
dyads, and crowds). We also present a comprehensive summary of the related
datasets and outline future research directions regarding the
implementation of artificial intelligence, dataset curation, and
privacy-preserving interaction analysis. Some major observations are: the most
often used nonverbal cue, computational method, interaction environment, and
sensing approach are, respectively, speaking activity, support vector machines,
meetings of 3-4 persons, and a combination of microphones and cameras;
multimodal features perform notably better than unimodal ones; deep learning
architectures improved performance overall, but there remain many phenomena
whose detection has never been attempted with deep models. We
also identified several limitations such as the lack of scalable benchmarks,
annotation reliability tests, cross-dataset experiments, and explainability
analysis.
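
To make the survey's headline observation concrete, the following is a minimal sketch of the dominant recipe it reports: per-participant speaking-activity features fed to a support vector machine. The feature set and the synthetic data are hypothetical illustrations, not the pipeline of any specific surveyed paper.

```python
# Minimal sketch of the dominant recipe reported by the survey:
# hand-crafted speaking-activity features classified with an SVM.
# Features and data below are hypothetical, for illustration only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# One row per meeting participant; hypothetical speaking-activity features:
# [total speaking time, number of turns, mean turn length,
#  interruptions made, successful floor grabs]
X = rng.random((120, 5))
# Hypothetical binary label, e.g. emergent leader vs. non-leader.
y = rng.integers(0, 2, size=120)

# Standardize features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

On random data this hovers around chance; the point is only the pipeline shape (features, scaling, SVM, cross-validation) that the survey identifies as the most common configuration.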
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript" (see the sketch after this list).
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
- Nonverbal Interaction Detection
This work addresses a new challenge of understanding human nonverbal interaction in social contexts.
First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and the corresponding social groups.
Second, we establish a new task NVI-DET for nonverbal interaction detection, which is formalized as identifying triplets in the form <individual, group, interaction> from images.
Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs.
arXiv Detail & Related papers (2024-07-11T02:14:06Z)
- Expanding the Role of Affective Phenomena in Multimodal Interaction Research
We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing.
We identify 910 affect-related papers and present our analysis of the role of affective phenomena in these papers.
We find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states.
arXiv Detail & Related papers (2023-05-18T09:08:39Z)
- Automatic Context-Driven Inference of Engagement in HMI: A Survey
This paper presents a survey on engagement inference for human-machine interaction.
It covers interdisciplinary definitions, engagement components and factors, publicly available datasets, ground-truth assessment, and the most commonly used features and methods.
It serves as a guide for the development of future human-machine interaction interfaces with reliable context-aware engagement inference capability.
arXiv Detail & Related papers (2022-09-30T10:46:13Z)
- Bodily Behaviors in Social Interaction: Novel Annotations and State-of-the-Art Evaluation
We present BBSI, the first set of annotations of complex Bodily Behaviors embedded in continuous Social Interactions.
Based on previous work in psychology, we manually annotated 26 hours of spontaneous human behavior.
We adapt the Pyramid Dilated Attention Network (PDAN), a state-of-the-art approach for human action detection.
arXiv Detail & Related papers (2022-07-26T11:24:00Z)
- BOSS: A Benchmark for Human Belief Prediction in Object-context Scenarios
This paper uses the combined knowledge of Theory of Mind (ToM) and Object-Context Relations to investigate methods for enhancing collaboration between humans and autonomous systems.
We propose a novel and challenging multimodal video dataset for assessing the capability of artificial intelligence (AI) systems in predicting human belief states in an object-context scenario.
arXiv Detail & Related papers (2022-06-21T18:29:17Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop TrajNet++, a large-scale interaction-centric benchmark, a significant yet previously missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
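
The LLM-fusion engagement paper above hinges on serializing time-aligned verbal and nonverbal cues into plain text that an LLM can reason over. Below is a hedged sketch of such a "multimodal transcript"; the segment fields, cue vocabulary, and prompt wording are assumptions for illustration, and the paper's actual format may differ.

```python
# Hypothetical sketch of a "multimodal transcript": per-segment verbal and
# nonverbal cues rendered as text for an LLM. Field names and cue labels
# are assumptions for illustration, not the paper's actual schema.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    speaker: str
    words: str     # ASR transcript of the segment
    gaze: str      # hypothetical gaze cue, e.g. "at partner" / "averted"
    face: str      # hypothetical facial cue, e.g. "smile" / "neutral"

def to_multimodal_transcript(segments):
    """Render time-aligned verbal and nonverbal cues as one text block."""
    return "\n".join(
        f'[{s.start:06.1f}-{s.end:06.1f}] {s.speaker}: "{s.words}" '
        f"(gaze: {s.gaze}; face: {s.face})"
        for s in segments
    )

segments = [
    Segment(0.0, 3.2, "A", "So how was the trip?", "at partner", "smile"),
    Segment(3.4, 5.0, "B", "Honestly, exhausting.", "averted", "neutral"),
]
prompt = (
    "Rate participant B's engagement from 1 (disengaged) to 5 (engaged), "
    "given this annotated conversation:\n" + to_multimodal_transcript(segments)
)
print(prompt)  # this prompt would then be sent to an LLM of choice
```

The design choice worth noting is that fusion happens in token space: once every modality is rendered as text, an off-the-shelf LLM can combine them without modality-specific encoders.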