BOSS: A Benchmark for Human Belief Prediction in Object-context
Scenarios
- URL: http://arxiv.org/abs/2206.10665v1
- Date: Tue, 21 Jun 2022 18:29:17 GMT
- Authors: Jiafei Duan, Samson Yu, Nicholas Tan, Li Yi, Cheston Tan
- Abstract summary: This paper uses the combined knowledge of Theory of Mind (ToM) and Object-Context Relations to investigate methods for enhancing collaboration between humans and autonomous systems.
We propose a novel and challenging multimodal video dataset for assessing the capability of artificial intelligence (AI) systems in predicting human belief states in an object-context scenario.
- Score: 14.23697277904244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans with an average level of social cognition can infer the
beliefs of others based solely on nonverbal communication signals (e.g., gaze,
gesture, pose, and contextual information) exhibited during social interactions. This
social cognitive ability to predict human beliefs and intentions is more
important than ever for ensuring safe human-robot interaction and
collaboration. This paper uses the combined knowledge of Theory of Mind (ToM)
and Object-Context Relations to investigate methods for enhancing collaboration
between humans and autonomous systems in environments where verbal
communication is prohibited. We propose a novel and challenging multimodal
video dataset for assessing the capability of artificial intelligence (AI)
systems in predicting human belief states in an object-context scenario. The
proposed dataset provides precise ground-truth labels of human belief states
together with multimodal inputs that replicate the nonverbal communication
cues available to human perception. We further evaluate our dataset with
existing deep learning models and provide new insights into the effects of the
various input modalities and object-context relations on the performance of the
baseline models.
Related papers
- A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation [39.87346821309096]
We present an addressee estimation model that improves on the previous state of the art (SOTA).
We also propose several ways to incorporate explainability and transparency in the aforementioned architecture.
arXiv Detail & Related papers (2024-05-20T13:09:32Z)
- Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task [17.190635800969456]
Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions.
We introduce a hierarchical approach for interpreting user non-verbal cues, like hand gestures, body poses, and facial expressions.
Our evaluation demonstrates the potential of LLMs to interpret non-verbal cues and to combine them with their context-understanding capabilities.
arXiv Detail & Related papers (2024-04-12T12:15:14Z)
- Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [55.65482030032804]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
Our approach infers dynamically evolving relation graphs and hypergraphs to capture the evolution of relations, which the trajectory predictor employs to generate future states.
arXiv Detail & Related papers (2024-01-22T18:58:22Z)
- Social Motion Prediction with Cognitive Hierarchies [19.71780279070757]
We introduce a new benchmark, a novel formulation, and a cognition-inspired framework.
We present Wusi, a 3D multi-person motion dataset under the context of team sports.
We develop a cognitive hierarchy framework to predict strategic human social interactions.
arXiv Detail & Related papers (2023-11-08T14:51:17Z)
- Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue is speaking activity, the most common computational method is support vector machines, and the typical interaction setting is a meeting of 3-4 persons sensed with microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)