Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot
Interaction
- URL: http://arxiv.org/abs/2308.06498v1
- Date: Sat, 12 Aug 2023 08:22:11 GMT
- Title: Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot
Interaction
- Authors: Kaiqi Chen, Jing Yu Lim, Kingsley Kuan, Harold Soh
- Abstract summary: We present a deep world model that enables a robot to perform both perceptual and conceptual perspective-taking.
The key innovation is a multi-modal latent state space model able to generate and augment fictitious observations/emissions.
We tasked our model to predict human observations and beliefs on three partially-observable HRI tasks.
- Score: 16.19711863900126
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Perspective-taking is the ability to perceive or understand a situation or
concept from another individual's point of view, and is crucial in daily human
interactions. Enabling robots to perform perspective-taking remains an unsolved
problem; existing approaches that use deterministic or handcrafted methods are
unable to accurately account for uncertainty in partially-observable settings.
This work proposes to address this limitation via a deep world model that
enables a robot to perform both perceptual and conceptual perspective-taking,
i.e., the robot is able to infer what a human sees and believes. The key
innovation is a decomposed multi-modal latent state space model able to
generate and augment fictitious observations/emissions. Optimizing the ELBO
that arises from this probabilistic graphical model enables the learning of
uncertainty in latent space, which facilitates uncertainty estimation from
high-dimensional observations. We tasked our model to predict human
observations and beliefs on three partially-observable HRI tasks. Experiments
show that our method significantly outperforms existing baselines and is able
to infer the visual observations available to other agents, as well as their
internal beliefs.
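For reference, a generic ELBO for a sequential latent state-space model of the kind the abstract describes (notation assumed here: observations o_t and latent states z_t for t = 1, ..., T; the paper's decomposed multi-modal factorization will differ in its exact terms) is:

\[
\mathcal{L}_{\mathrm{ELBO}}
= \sum_{t=1}^{T} \mathbb{E}_{q(z_t \mid o_{1:t})}\!\left[\log p(o_t \mid z_t)\right]
- \sum_{t=1}^{T} \mathbb{E}_{q(z_{t-1} \mid o_{1:t-1})}\!\left[
\mathrm{KL}\!\left(q(z_t \mid z_{t-1}, o_t) \,\middle\|\, p(z_t \mid z_{t-1})\right)\right]
\]

Maximizing an objective of this form fits both the emission model p(o_t | z_t), which is what would generate fictitious emissions from another agent's viewpoint, and the posterior q, whose spread carries the latent-space uncertainty estimate the abstract refers to.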
Related papers
- Adaptive Motion Generation Using Uncertainty-Driven Foresight Prediction [2.2120851074630177]
Environmental uncertainty has long been a difficult characteristic to handle when performing real-world robot tasks.
This paper extends an existing predictive-learning-based robot control method, which employs foresight prediction using dynamic internal simulation.
The results showed that the proposed model adaptively diverged its motion through interaction with the door, whereas conventional methods failed to diverge stably.
arXiv Detail & Related papers (2024-10-01T15:13:27Z)
- Multimodal Sense-Informed Prediction of 3D Human Motions [16.71099574742631]
This work introduces a novel multi-modal sense-informed motion prediction approach, which conditions high-fidelity generation on two modalities of information.
Gaze is regarded as a signal of human intention; combined with both motion and scene features, it drives a ternary intention-aware attention that supervises the generation.
On two real-world benchmarks, the proposed method achieves state-of-the-art performance in both 3D human pose and trajectory prediction.
arXiv Detail & Related papers (2024-05-05T12:38:10Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
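(A generic sketch of this hypothesize-simulate-act-update loop appears after this list.)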
arXiv Detail & Related papers (2022-10-22T18:39:33Z)
- Robot Learning Theory of Mind through Self-Observation: Exploiting the Intentions-Beliefs Synergy [0.0]
Theory of Mind (TOM) is the ability to attribute beliefs, intentions, or mental states in general to other agents.
We show the synergy between learning to predict low-level mental states, such as intentions and goals, and attributing high-level ones, such as beliefs.
We propose that our architectural approach can be relevant for the design of future adaptive social robots.
arXiv Detail & Related papers (2022-10-17T21:12:39Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when confounding factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- Probabilistic Human Motion Prediction via A Bayesian Neural Network [71.16277790708529]
We propose a probabilistic model for human motion prediction.
Given an observed motion sequence, our model can generate several plausible future motions.
We extensively validate our approach on the large-scale benchmark dataset Human3.6M.
arXiv Detail & Related papers (2021-07-14T09:05:33Z)
- Deep Interpretable Models of Theory of Mind For Human-Agent Teaming [0.7734726150561086]
We develop an interpretable modular neural framework for modeling the intentions of other observed entities.
We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft.
arXiv Detail & Related papers (2021-04-07T06:18:58Z)
- Careful with That! Observation of Human Movements to Estimate Objects Properties [106.925705883949]
We focus on the features of human motor actions that communicate insights about the weight of an object.
Our final goal is to enable a robot to autonomously infer the degree of care required in object handling.
arXiv Detail & Related papers (2021-03-02T08:14:56Z)
- Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs [90.20235972293801]
Aiming to understand how human (false-)belief, a core socio-cognitive ability, would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.
An inference algorithm is derived to fuse the individual parse graphs (pg) from all robots across multiple views into a joint pg, which affords more effective reasoning and overcomes errors originating from a single view.
arXiv Detail & Related papers (2020-04-25T23:02:04Z)
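Since the H-SAUR entry above names its loop explicitly, the following is a minimal, hypothetical sketch of a hypothesize-simulate-act-update cycle as generic Bayesian hypothesis filtering; every name below (propose_action, the simulate/act/score callables) is an illustrative stand-in, not the authors' implementation.

import random

def h_saur_loop(observations, hypotheses, simulate, act, score, n_rounds=10):
    """Maintain a belief over articulation hypotheses and refine it by acting.

    hypotheses: candidate articulation models (e.g. hinge, slider), assumed
                to expose a propose_action(observations) method.
    simulate(h, action): predicted observation if hypothesis h were true.
    act(action): execute the action in the world, return the real outcome.
    score(predicted, observed): likelihood of the outcome under a prediction.
    """
    # Start from a uniform prior over hypotheses.
    belief = {h: 1.0 / len(hypotheses) for h in hypotheses}
    for _ in range(n_rounds):
        # Hypothesize: sample a model in proportion to the current belief.
        model = random.choices(list(belief), weights=list(belief.values()))[0]
        # Choose an exploratory action suggested by the sampled model.
        action = model.propose_action(observations)
        # Simulate: predict the action's outcome under every hypothesis.
        predictions = {h: simulate(h, action) for h in belief}
        # Act: execute the action for real and record what happens.
        outcome = act(action)
        observations.append(outcome)
        # Update: reweight each hypothesis by its predictive accuracy.
        for h in belief:
            belief[h] *= score(predictions[h], outcome)
        total = sum(belief.values()) or 1.0  # guard against all-zero weights
        belief = {h: w / total for h, w in belief.items()}
    # Repeat has finished: return the most probable articulation model.
    return max(belief, key=belief.get)

The particle-filter-like reweighting is one simple way to realize the "update" step; the actual H-SAUR framework couples this with a physics simulator and learned visual priors, per its summary above.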
This list is automatically generated from the titles and abstracts of the papers on this site.