MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations
- URL: http://arxiv.org/abs/2407.07636v2
- Date: Sun, 13 Oct 2024 18:57:40 GMT
- Title: MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations
- Authors: Vignesh Prasad, Alap Kshirsagar, Dorothea Koert, Ruth Stock-Homburg, Jan Peters, Georgia Chalvatzaki,
- Abstract summary: We propose a novel approach for learning a shared latent space representation for Human-Robot Interaction (HRI)
We train a Variational Autoencoder (VAE) to learn robot motions regularized using an informative latent space prior.
We find that our approach of using an informative MDN prior from human observations for a VAE generates more accurate robot motions.
- Score: 19.184155232662995
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Shared dynamics models are important for capturing the complexity and variability inherent in Human-Robot Interaction (HRI). Therefore, learning such shared dynamics models can enhance coordination and adaptability to enable successful reactive interactions with a human partner. In this work, we propose a novel approach for learning a shared latent space representation for HRIs from demonstrations in a Mixture of Experts fashion for reactively generating robot actions from human observations. We train a Variational Autoencoder (VAE) to learn robot motions regularized using an informative latent space prior that captures the multimodality of the human observations via a Mixture Density Network (MDN). We show how our formulation derives from a Gaussian Mixture Regression formulation that is typically used approaches for learning HRI from demonstrations such as using an HMM/GMM for learning a joint distribution over the actions of the human and the robot. We further incorporate an additional regularization to prevent "mode collapse", a common phenomenon when using latent space mixture models with VAEs. We find that our approach of using an informative MDN prior from human observations for a VAE generates more accurate robot motions compared to previous HMM-based or recurrent approaches of learning shared latent representations, which we validate on various HRI datasets involving interactions such as handshakes, fistbumps, waving, and handovers. Further experiments in a real-world human-to-robot handover scenario show the efficacy of our approach for generating successful interactions with four different human interaction partners.
Related papers
- Experimental Evaluation of ROS-Causal in Real-World Human-Robot Spatial Interaction Scenarios [3.8625803348911774]
We present an experimental evaluation of ROS-Causal, a ROS-based framework for causal discovery in human-robot spatial interactions.
We show how causal models can be extracted directly onboard by robots during data collection.
The online causal models generated from the simulation are consistent with those from lab experiments.
arXiv Detail & Related papers (2024-06-07T14:20:30Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [55.65482030032804]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
Our approach infers dynamically evolving relation graphs and hypergraphs to capture the evolution of relations, which the trajectory predictor employs to generate future states.
arXiv Detail & Related papers (2024-01-22T18:58:22Z) - Learning Multimodal Latent Dynamics for Human-Robot Interaction [19.803547418450236]
This article presents a method for learning well-coordinated Human-Robot Interaction (HRI) from Human-Human Interactions (HHI)
We devise a hybrid approach using Hidden Markov Models (HMMs) as the latent space priors for a Variational Autoencoder to model a joint distribution over the interacting agents.
We find that Users perceive our method as more human-like, timely, and accurate and rank our method with a higher degree of preference over other baselines.
arXiv Detail & Related papers (2023-11-27T23:56:59Z) - InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z) - Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z) - MILD: Multimodal Interactive Latent Dynamics for Learning Human-Robot
Interaction [34.978017200500005]
We propose Multimodal Interactive Latent Dynamics (MILD) to address the problem of two-party physical Human-Robot Interactions (HRIs)
We learn the interaction dynamics from demonstrations, using Hidden Semi-Markov Models (HSMMs) to model the joint distribution of the interacting agents in the latent space of a Variational Autoencoder (VAE)
MILD generates more accurate trajectories for the controlled agent (robot) when conditioned on the observed agent's (human) trajectory.
arXiv Detail & Related papers (2022-10-22T11:25:11Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z) - Learning Whole-Body Human-Robot Haptic Interaction in Social Contexts [11.879852629248981]
This paper presents a learning-from-demonstration (LfD) framework for teaching human-robot social interactions that involve whole-body haptic contact over the full robot body.
The performance of existing LfD frameworks suffers in such interactions due to high dimensionality data sparsity.
We show that by leveraging this sparsity, we can reduce the data dimensionality without incurring a significant accuracy penalty, and introduce three strategies for doing so.
arXiv Detail & Related papers (2020-05-26T03:44:09Z) - Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.